python - How does Cloudflare differentiate Selenium and Requests traffic?

Context

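I'm currently trying to build a small bot using the Selenium and Requests modules in Python.
However, the web page I want to interact with sits behind Cloudflare.
My Python script runs over Tor using the stem module.
My traffic analysis is based on Firefox's "Developer Tools -> Network" panel with Persist Logs enabled.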

My findings so far:

Selenium's Firefox webdriver can often access the web page without going through the "Checking your browser" page (return code 503) or the "captcha" page (return code 403).

A requests session object using the same user agent always ends up at the "captcha" page (return code 403).

If Cloudflare were checking my JavaScript capability, shouldn't my requests module get a 503 back instead?
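To make the two outcomes easier to tell apart in a script, here is a minimal sketch that labels a response by its status code. The mapping (503 for the browser check, 403 for the captcha) is taken only from the observations above and is an assumption, not documented Cloudflare behavior; `classify_status` is a hypothetical helper name.

```python
def classify_status(status_code):
    """Label a response using the status codes observed in this question.

    Assumption: 503 means the "Checking your browser" page and 403 means
    the captcha page. Other sites or configurations may behave differently.
    """
    if status_code == 200:
        return "ok"
    if status_code == 503:
        return "browser check"
    if status_code == 403:
        return "captcha"
    return "other"
```

With the requests session from this question, `classify_status(page.status_code)` would return "captcha" for the 403 case.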

Code example

import requests
from selenium import webdriver

# fp and fOptions are the Firefox profile and options objects, configured elsewhere
driver = webdriver.Firefox(firefox_profile=fp, options=fOptions)
driver.get("https://www.cloudflare.com")   # usually returns code 200 without verifying the browser

session = requests.Session()
# ... applied socks5 proxy for both http and https ... #
# reuse the exact user agent string reported by the Selenium-driven browser
session.headers.update({"user-agent": driver.execute_script("return navigator.userAgent;")})
page = session.get("https://www.cloudflare.com")
print(page.status_code) # prints 403
print(page.text)        # prints the "captcha page" HTML

Both Selenium and the Requests module use the same user agent and IP.
Both issue a GET without any parameters.
How does Cloudflare tell this traffic apart?
Am I missing something?

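I tried transferring the cookies from the webdriver to the requests session to see whether that would get past the check, but had no luck.
Here is the code I used: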

for c in driver.get_cookies():
    session.cookies.set(c['name'], c['value'], domain=c['domain'])

Best answer

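The captcha response depends on the browser fingerprint. It is not just a matter of sending cookies and the user agent.
Copy all the headers from the Network tab in the developer console, and send every key-value pair as a header with the requests library.
Logically, this approach should work.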
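A minimal sketch of that suggestion follows. The header names and values in `BROWSER_HEADERS` are placeholders standing in for whatever your own browser shows in the Network tab, and `build_session` is a hypothetical helper, not part of requests. The sketch only prepares the request so the outgoing headers can be inspected; it does not send anything, and whether matching headers is enough for a given site is not guaranteed.

```python
import requests

# Placeholder values: replace them with the headers copied from the
# Network tab of your own browser for the target request.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Upgrade-Insecure-Requests": "1",
}


def build_session(headers):
    """Return a requests.Session that sends only the given headers."""
    session = requests.Session()
    # Drop the default requests headers (e.g. "python-requests/x.y" as the
    # user agent) so that only the copied browser headers go out.
    session.headers.clear()
    session.headers.update(headers)
    return session


session = build_session(BROWSER_HEADERS)

# Prepare the request without sending it, to inspect what would go out.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.cloudflare.com")
)
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

To actually send the request, call `session.get(...)` on the returned session after applying the same proxy settings as in the question.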

This page is based on a similar question found on Stack Overflow, "python - How does Cloudflare differentiate Selenium and Requests traffic?": https://stackoverflow.com/questions/62084602/
