python - Selenium downloads a captcha image different from the one shown in the browser

Tags: python selenium

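I am trying to download a captcha image with Selenium, but the image I download is different from the one displayed in the browser. If I download the image again without changing anything in the browser, I get yet another image.

Any ideas?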

from selenium import webdriver
import urllib.request  # urlretrieve lives in the urllib.request submodule


driver = webdriver.Firefox()
driver.get("http://sistemas.cvm.gov.br/?fundosreg")

# Change frame.
driver.switch_to.frame("Main")


# Download image/captcha.
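# find_element_by_xpath was removed in Selenium 4.3.0; find_element works on older and newer releases.
img = driver.find_element("xpath", ".//*[@id='trRandom3']/td[2]/img")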
src = img.get_attribute('src')
urllib.request.urlretrieve(src, "captcha.jpeg")
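
A likely explanation for the behaviour above, offered as an assumption rather than something confirmed for this particular site: urlretrieve sends a new HTTP request to the image URL, and a captcha endpoint typically generates a new image for every request. The sketch below reproduces that effect with a local stand-in server. RandomCaptchaHandler and fetch_twice are hypothetical names made up for the illustration, and the random bytes only stand in for image data.

```python
import http.client
import http.server
import os
import threading


class RandomCaptchaHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a captcha endpoint: new random bytes per request."""

    def do_GET(self):
        body = os.urandom(16)  # pretend these bytes are a freshly generated image
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(port):
    """Download the stand-in captcha once over a fresh connection."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/captcha")
        return conn.getresponse().read()
    finally:
        conn.close()


def fetch_twice():
    """Request the same URL twice and return both response bodies."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RandomCaptchaHandler)
    port = server.server_address[1]  # port 0 above lets the OS pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch(port), fetch(port)
    finally:
        server.shutdown()
        server.server_close()


first, second = fetch_twice()
print(first != second)  # the two downloads differ although the URL is the same
```

If the real endpoint behaves like this stand-in, any second request for the src URL returns a new captcha, so the image has to be taken from what the browser has already rendered.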

Best answer

You can get the rendered image of the captcha with a piece of JavaScript. This is faster than taking a screenshot and cropping it:

import base64
from selenium import webdriver

driver = webdriver.Firefox()
driver.set_script_timeout(10)

driver.get("http://sistemas.cvm.gov.br/?fundosreg")

driver.switch_to.frame("Main")

# find the captcha element
ele_captcha = driver.find_element_by_xpath("//img[contains(./@src, 'RandomTxt.aspx')]")

# get the captcha as a base64 string
img_captcha_base64 = driver.execute_async_script("""
    var ele = arguments[0], callback = arguments[1];
    ele.addEventListener('load', function fn(){
      ele.removeEventListener('load', fn, false);
      var cnv = document.createElement('canvas');
      cnv.width = this.width; cnv.height = this.height;
      cnv.getContext('2d').drawImage(this, 0, 0);
      // return only the base64 payload that follows the data:image/jpeg;base64, prefix
      callback(cnv.toDataURL('image/jpeg').split(',')[1]);
    }, false);
    ele.dispatchEvent(new Event('load'));
    """, ele_captcha)

# save the captcha to a file
with open(r"captcha.jpg", 'wb') as f:
    f.write(base64.b64decode(img_captcha_base64))
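
The JavaScript above hands back the canvas content with the data URL prefix already removed. If you would rather return the whole data URL and strip the prefix in Python, a minimal sketch could look like this. data_url_to_bytes is a hypothetical helper, not part of Selenium, and the sample string is made-up data rather than a real captcha.

```python
import base64


def data_url_to_bytes(data_url):
    """Decode the payload of a base64 data URL such as canvas.toDataURL() returns."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)


# "data:image/jpeg;base64," is 23 characters long and "data:image/png;base64," is 22,
# so a fixed substring offset only fits one MIME type; splitting at the comma fits both.
sample = "data:image/jpeg;base64," + base64.b64encode(b"fake image bytes").decode("ascii")
print(data_url_to_bytes(sample))  # b'fake image bytes'
```

To use it with the script above, the callback would have to pass the unmodified result of cnv.toDataURL('image/jpeg'), and the file would be written with f.write(data_url_to_bytes(img_captcha_base64)).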

Edit:

Selenium removed the find_element_by_xpath method in version 4.3.0. See the changelog: https://github.com/SeleniumHQ/selenium/blob/a4995e2c096239b42c373f26498a6c9bb4f2b3e7/py/CHANGES

Selenium 4.3.0
* Deprecated find_element_by_* and find_elements_by_* are now removed (#10712)
* Deprecated Opera support has been removed (#10630)
* Fully upgraded from python 2x to 3.7 syntax and features (#10647)
* Added a devtools version fallback mechanism to look for an older version when mismatch occurs (#10749)
* Better support for co-operative multi inheritance by utilising super() throughout
* Improved type hints throughout

The call must be changed from

ele_captcha = driver.find_element_by_xpath("//img[contains(./@src, 'RandomTxt.aspx')]")

to:

ele_captcha = driver.find_element("xpath", "//img[contains(./@src, 'RandomTxt.aspx')]")

The complete working script:

import base64
from selenium import webdriver

driver = webdriver.Firefox()
driver.set_script_timeout(10)

driver.get("http://sistemas.cvm.gov.br/?fundosreg")

driver.switch_to.frame("Main")

# find the captcha element
ele_captcha = driver.find_element("xpath", "//img[contains(./@src, 'RandomTxt.aspx')]")

# get the captcha as a base64 string
img_captcha_base64 = driver.execute_async_script("""
    var ele = arguments[0], callback = arguments[1];
    ele.addEventListener('load', function fn(){
      ele.removeEventListener('load', fn, false);
      var cnv = document.createElement('canvas');
      cnv.width = this.width; cnv.height = this.height;
      cnv.getContext('2d').drawImage(this, 0, 0);
      // return only the base64 payload that follows the data:image/jpeg;base64, prefix
      callback(cnv.toDataURL('image/jpeg').split(',')[1]);
    }, false);
    ele.dispatchEvent(new Event('load'));
    """, ele_captcha)

# save the captcha to a file
with open(r"captcha.jpg", 'wb') as f:
    f.write(base64.b64decode(img_captcha_base64))

Regarding "python - Selenium downloads a captcha image different from the one shown in the browser", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36636255/
