python-3.x - Logging in with a CAPTCHA using Python requests

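I am trying to download the CAPTCHA image, solve it manually, and then submit it in a POST together with the username and password. The response text is just the original login page, so I assume my code is failing. The page I am logging in to is on the dark web, though I don't know whether that is actually relevant. The only explanation I can think of is that submitting the POST generates a new CAPTCHA. I'm hoping someone with a better understanding of HTTP can help.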

from bs4 import BeautifulSoup
import base64
import requests

# Route all traffic through the local Tor SOCKS proxy
# (socks5h lets the proxy resolve .onion host names)
session = requests.Session()
session.proxies = {'http': 'socks5h://127.0.0.1:9150', 'https': 'socks5h://127.0.0.1:9150'}

login_url = "http://waeixxcraed4gw7q.onion/signin"
page = session.get(login_url)
soup = BeautifulSoup(page.text, "lxml")
imgs = soup.find_all('img')

# Save the captcha, which is embedded in the page as a base64 data URI.
# Everything after the first comma is the encoded image.
img_data = imgs[1]['src'].split(',', 1)[1]
with open("olympus_captcha.jpg", "wb") as fh:
    # base64.decodestring was removed in Python 3.9; b64decode replaces it
    fh.write(base64.b64decode(img_data))

# Solve the captcha that has been saved to the hard drive
captcha = input("enter captcha:\n")

# Attempt login (password and username removed)
payload = {"username": username, "password": password, "captcha": captcha}
response = session.post(login_url, data=payload)
print(response.text)
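
Slicing the `src` attribute at a fixed offset such as 23 only works when the prefix is exactly `data:image/jpeg;base64,`; a PNG captcha would have a shorter prefix. As a minimal sketch, the data URI can instead be split at its first comma. The helper name `decode_data_uri` is hypothetical and not part of the original code, and it assumes the image is always base64-encoded.

```python
import base64
from typing import Tuple


def decode_data_uri(src: str) -> Tuple[str, bytes]:
    """Split a data URI like 'data:image/jpeg;base64,...' into (mime_type, raw_bytes)."""
    if not src.startswith("data:"):
        raise ValueError("not a data URI")
    # The header and the payload are separated by the first comma
    header, _, payload = src.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data URIs are handled here")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

In the question's script this could stand in for the slice, for example `mime, raw = decode_data_uri(imgs[1]['src'])` followed by `fh.write(raw)`.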

Best answer

I did it the following way. First, install Tesseract on your system.

import requests
from bs4 import BeautifulSoup as bs
from PIL import Image
import pytesseract

session = requests.session()
url = "Login page link"
r = session.get(url)
soup = bs(r.text, 'html.parser')
img = soup.find('img', id='captcha')
img_SRC = img['src']

# Download the captcha through the same session, so the image
# stays tied to the cookies that the login POST will send
response = session.get(img_SRC, stream=True)
if not response.ok:
    print("captcha download failed:", response.status_code)
with open('captcha.jpg', 'wb') as handle:
    for block in response.iter_content(1024):
        handle.write(block)

# Run OCR on the file that was just saved
pic = Image.open('captcha.jpg')
# Only needed when tesseract is not on the PATH
pytesseract.pytesseract.tesseract_cmd = r'Enter the installation path of the \tesseract.exe'
# OCR output usually ends with whitespace, so strip it
captchaText = pytesseract.image_to_string(pic).strip()

The text extracted from the CAPTCHA can then be used to log in.
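
To use that text, it has to be posted through the same session that fetched the CAPTCHA, along with whatever other fields the login form expects. The sketch below is one possible way to assemble the form data, not something the answer itself describes. It assumes the field names `username`, `password` and `captcha` from the question, and it copies any other `<input>` values (for example a hidden token, shown here under the hypothetical name `csrf_token`) from the login page. `FormFieldCollector` and `build_login_payload` are illustrative names.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the name/value pair of every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are stored as empty strings
            self.fields[name] = attributes.get("value") or ""


def build_login_payload(page_html, username, password, captcha_text):
    """Merge the page's own input values with the credentials and captcha text."""
    collector = FormFieldCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)  # keeps hidden fields such as tokens
    payload.update(
        {"username": username, "password": password, "captcha": captcha_text}
    )
    return payload
```

With the answer's variables this might be used as `session.post(url, data=build_login_payload(r.text, username, password, captchaText))`. Whether the target site actually requires extra hidden fields is an assumption that should be checked against the login form's HTML.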

A similar question about "python-3.x - Logging in with a CAPTCHA using Python requests" was found on Stack Overflow: https://stackoverflow.com/questions/51049591/
