python - Taking a full-page screenshot with Selenium Python and chromedriver

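After trying various approaches... I stumbled upon this page, which takes a full-page screenshot with chromedriver, selenium and python.

The original code is here. (I have copied the code from that post below.)

It uses PIL and it works great! However, there is one issue... it captures the fixed header and repeats it throughout the whole page, and it also misses some parts of the page during page changes. Sample URL for the screenshot: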

http://www.w3schools.com/js/default.asp

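How can I avoid the repeated header with this code... or is there a better option that uses only Python? (I don't know Java and do not want to use Java.)

Please see the screenshot of the current result and the sample code below.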

test.py

"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/

It contains the *crucial* correction added in the comments by Jason Coutu.
"""

import sys

from selenium import webdriver
import unittest

import util

class Test(unittest.TestCase):
    """ Demonstration: Get Chrome to generate fullscreen screenshot """

    def setUp(self):
        self.driver = webdriver.Chrome()

    def tearDown(self):
        self.driver.quit()

    def test_fullpage_screenshot(self):
        ''' Generate document-height screenshot '''
        #url = "http://effbot.org/imagingbook/introduction.htm"
        url = "http://www.w3schools.com/js/default.asp"
        self.driver.get(url)
        util.fullpage_screenshot(self.driver, "test.png")


if __name__ == "__main__":
    unittest.main(argv=[sys.argv[0]])

util.py

import os
import time

from PIL import Image

def fullpage_screenshot(driver, file):

        print("Starting chrome full page screenshot workaround ...")

        total_width = driver.execute_script("return document.body.offsetWidth")
        total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
        viewport_width = driver.execute_script("return document.body.clientWidth")
        viewport_height = driver.execute_script("return window.innerHeight")
        print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height,viewport_width,viewport_height))
        rectangles = []

        i = 0
        while i < total_height:
            ii = 0
            top_height = i + viewport_height

            if top_height > total_height:
                top_height = total_height

            while ii < total_width:
                top_width = ii + viewport_width

                if top_width > total_width:
                    top_width = total_width

                print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
                rectangles.append((ii, i, top_width,top_height))

                ii = ii + viewport_width

            i = i + viewport_height

        stitched_image = Image.new('RGB', (total_width, total_height))
        previous = None
        part = 0

        for rectangle in rectangles:
            if previous is not None:
                driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
                print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
                time.sleep(0.2)

            file_name = "part_{0}.png".format(part)
            print("Capturing {0} ...".format(file_name))

            driver.get_screenshot_as_file(file_name)
            screenshot = Image.open(file_name)

            if rectangle[1] + viewport_height > total_height:
                offset = (rectangle[0], total_height - viewport_height)
            else:
                offset = (rectangle[0], rectangle[1])

            print("Adding to stitched image with offset ({0}, {1})".format(offset[0],offset[1]))
            stitched_image.paste(screenshot, offset)

            del screenshot
            os.remove(file_name)
            part = part + 1
            previous = rectangle

        stitched_image.save(file)
        print("Finishing chrome full page screenshot workaround...")
        return True
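
One possible way to tackle the repeated header in the stitching code above is to hide fixed elements once the first tile has been captured, so the header appears only at the top of the stitched image. The following is a minimal sketch, not part of the original script. It assumes the repeated header is positioned with CSS `position: fixed` or `position: sticky`; the helper names `hide_fixed_elements` and `restore_fixed_elements` and the `data-prev-visibility` attribute are hypothetical. The only driver method it relies on is `execute_script`.

```python
# JavaScript that hides every fixed or sticky element and remembers its
# previous inline visibility in a (hypothetical) data attribute.
HIDE_FIXED_JS = """
var hidden = 0;
var nodes = document.querySelectorAll('body *');
for (var i = 0; i < nodes.length; i++) {
    var pos = window.getComputedStyle(nodes[i]).position;
    if (pos === 'fixed' || pos === 'sticky') {
        nodes[i].setAttribute('data-prev-visibility', nodes[i].style.visibility);
        nodes[i].style.visibility = 'hidden';
        hidden++;
    }
}
return hidden;
"""

# JavaScript that undoes the change made by HIDE_FIXED_JS.
RESTORE_FIXED_JS = """
var nodes = document.querySelectorAll('[data-prev-visibility]');
for (var i = 0; i < nodes.length; i++) {
    nodes[i].style.visibility = nodes[i].getAttribute('data-prev-visibility');
    nodes[i].removeAttribute('data-prev-visibility');
}
return nodes.length;
"""


def hide_fixed_elements(driver):
    """Hide fixed/sticky elements and return how many were hidden."""
    return driver.execute_script(HIDE_FIXED_JS)


def restore_fixed_elements(driver):
    """Restore the elements hidden by hide_fixed_elements."""
    return driver.execute_script(RESTORE_FIXED_JS)
```

In `fullpage_screenshot`, `hide_fixed_elements(driver)` would be called right after the first tile has been captured, and `restore_fixed_elements(driver)` after the stitched image has been saved. Whether this removes the repeated header on a given page depends on how that page positions it.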

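Best answer

This answer improves on the earlier answers by am05mhz and Javed Karim.

It assumes headless mode, and that no window-size option was set initially. Before calling this function, make sure the page has loaded fully or at least sufficiently.

It tries to set both the width and the height to the values the page requires. A screenshot of the whole page can sometimes include an unwanted vertical scrollbar. One way to generally avoid the scrollbar is to take a screenshot of the body element instead. After saving the screenshot, it restores the window to its original size; otherwise the size for the next screenshot may not be set correctly.

Ultimately, this technique may still not work perfectly for some pages.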

from selenium import webdriver
from selenium.webdriver.common.by import By

def save_screenshot(driver: webdriver.Chrome, path: str = '/tmp/screenshot.png') -> None:
    # Ref: https://stackoverflow.com/a/52572919/
    original_size = driver.get_window_size()
    # Resize the window so that the whole document fits into the viewport.
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    # The find_element_by_* helpers were removed in Selenium 4; use find_element(By...) instead.
    driver.find_element(By.TAG_NAME, 'body').screenshot(path)  # avoids scrollbar
    # Restore the original size so that later screenshots are not affected.
    driver.set_window_size(original_size['width'], original_size['height'])

If you are using a Python version older than 3.6, remove the type annotations from the function definition.
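
The answer restores the window size after saving the screenshot, but if the screenshot call raises, the restore step is skipped. A possible variation, sketched here as an assumption rather than as part of the original answer, wraps the resize and restore steps in a context manager so the size is restored either way. The name `full_page_window` is hypothetical; the driver methods used (`get_window_size`, `execute_script`, `set_window_size`) are the same ones the answer calls.

```python
from contextlib import contextmanager


@contextmanager
def full_page_window(driver):
    """Temporarily resize the window to the document's full scroll size."""
    original = driver.get_window_size()
    width = driver.execute_script('return document.body.parentNode.scrollWidth')
    height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(width, height)
    try:
        # Hand the chosen size to the caller, e.g. for logging.
        yield width, height
    finally:
        # Runs even if taking the screenshot raises an exception.
        driver.set_window_size(original['width'], original['height'])


# Intended use (needs a live driver, so it is left as a comment):
#
#     with full_page_window(driver):
#         ...  # take the body-element screenshot here
```

The `try`/`finally` only guarantees that the restore call is attempted; it does not change how the screenshot itself is taken.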

A similar question about taking a full-page screenshot with Selenium Python and chromedriver can be found on Stack Overflow: https://stackoverflow.com/questions/41721734/
