在尝试了各种方法之后......我偶然发现了这个页面,用 chromedriver、selenium 和 python 截取了整页截图。
原代码是here . (我复制下面这篇文章中的代码)
它使用 PIL 并且效果很好!但是,有一个问题......它会捕获固定的标题并在整个页面中重复,并且在页面更改期间也会丢失页面的某些部分。截图示例网址:
http://www.w3schools.com/js/default.asp
如何避免使用此代码重复 header ...或者有没有更好的选择只使用 python... (我不知道 java 也不想使用 java)。
请看下面当前结果的截图和示例代码。
test.py
"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/
It contains the *crucial* correction added in the comments by Jason Coutu.
"""
import sys
from selenium import webdriver
import unittest
import util
class Test(unittest.TestCase):
""" Demonstration: Get Chrome to generate fullscreen screenshot """
def setUp(self):
self.driver = webdriver.Chrome()
def tearDown(self):
self.driver.quit()
def test_fullpage_screenshot(self):
''' Generate document-height screenshot '''
#url = "http://effbot.org/imagingbook/introduction.htm"
url = "http://www.w3schools.com/js/default.asp"
self.driver.get(url)
util.fullpage_screenshot(self.driver, "test.png")
if __name__ == "__main__":
unittest.main(argv=[sys.argv[0]])
util.py
import os
import time
from PIL import Image
def fullpage_screenshot(driver, file):
print("Starting chrome full page screenshot workaround ...")
total_width = driver.execute_script("return document.body.offsetWidth")
total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
viewport_width = driver.execute_script("return document.body.clientWidth")
viewport_height = driver.execute_script("return window.innerHeight")
print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height,viewport_width,viewport_height))
rectangles = []
i = 0
while i < total_height:
ii = 0
top_height = i + viewport_height
if top_height > total_height:
top_height = total_height
while ii < total_width:
top_width = ii + viewport_width
if top_width > total_width:
top_width = total_width
print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
rectangles.append((ii, i, top_width,top_height))
ii = ii + viewport_width
i = i + viewport_height
stitched_image = Image.new('RGB', (total_width, total_height))
previous = None
part = 0
for rectangle in rectangles:
if not previous is None:
driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
time.sleep(0.2)
file_name = "part_{0}.png".format(part)
print("Capturing {0} ...".format(file_name))
driver.get_screenshot_as_file(file_name)
screenshot = Image.open(file_name)
if rectangle[1] + viewport_height > total_height:
offset = (rectangle[0], total_height - viewport_height)
else:
offset = (rectangle[0], rectangle[1])
print("Adding to stitched image with offset ({0}, {1})".format(offset[0],offset[1]))
stitched_image.paste(screenshot, offset)
del screenshot
os.remove(file_name)
part = part + 1
previous = rectangle
stitched_image.save(file)
print("Finishing chrome full page screenshot workaround...")
return True
最佳答案
此答案比之前的答案改进了 am05mhz和 Javed Karim .
它采用 headless 模式,并且最初没有设置窗口大小选项。在调用此函数之前,请确保页面已完全加载或充分加载。
它尝试将宽度和高度都设置为必要的值。整个页面的屏幕截图有时会包含一个不必要的垂直滚动条。通常避免滚动条的一种方法是截取 body 元素的屏幕截图。保存屏幕截图后,它会将大小恢复为原来的大小,否则下一个屏幕截图的大小可能设置不正确。
对于某些示例,这种技术最终可能仍然不能很好地工作。
from selenium import webdriver
def save_screenshot(driver: webdriver.Chrome, path: str = '/tmp/screenshot.png') -> None:
# Ref: https://stackoverflow.com/a/52572919/
original_size = driver.get_window_size()
required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
driver.set_window_size(required_width, required_height)
# driver.save_screenshot(path) # has scrollbar
driver.find_element_by_tag_name('body').screenshot(path) # avoids scrollbar
driver.set_window_size(original_size['width'], original_size['height'])
如果使用 Python 3.6 之前的版本,请从函数定义中删除类型注释。
关于python - 使用 Selenium Python 和 chromedriver 截取整页截图,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41721734/