python - Chrome browser launched through ChromeDriver gets detected

Tags: python selenium google-chrome selenium-webdriver selenium-chromedriver

I am trying to use Selenium with ChromeDriver in Python for the website www.mouser.co.uk. However, it is detected as a bot on the very first request.


Does anyone have an explanation for this? Here is the code I am using:

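from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--start-maximized")
browser = webdriver.Chrome(service=Service('chromedriver.exe'), options=options)  # Selenium 4 API
wait = WebDriverWait(browser, 30)
browser.get('https://www.mouser.co.uk')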

Best answer

I tried to access the URL https://www.mouser.co.uk/ with a few Chrome options, but the browser was indeed detected and redirected to the Pardon Our Interruption page.

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    # Selenium 4: the driver path goes through Service and the options through options=
    # (the older chrome_options= and executable_path= keywords are no longer accepted)
    driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
    driver.get("https://www.mouser.co.uk")
    # Wait up to 30 seconds for the flag link to become clickable, then click it via JavaScript
    myElement = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='1_lnkLeftFlag']")))
    driver.execute_script("arguments[0].click();", myElement)
    

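Now, if you inspect the Pardon Our Interruption page, you will find that the <body> tag contains:

  • the attribute dist-GlobalHeader
  • the attribute dist-PageWrap

This clearly indicates that the website is protected by the bot management service provider Distil Networks, and that navigation by ChromeDriver gets detected and subsequently blocked.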
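The inspection described above can be automated with a simple string check. This is only a minimal sketch: the helper name `find_distil_markers` and the `DISTIL_MARKERS` tuple are made up for illustration, and the marker strings are just the ones observed in this answer, so other Distil-protected sites may expose different ones.

```python
# Marker strings taken from the block page described in this answer.
# They are observations from one site, not an official or complete list.
DISTIL_MARKERS = ("Pardon Our Interruption", "dist-GlobalHeader", "dist-PageWrap")


def find_distil_markers(page_source):
    """Return the markers from DISTIL_MARKERS that occur in the given HTML string."""
    return [marker for marker in DISTIL_MARKERS if marker in page_source]


# Usage, assuming `driver` is the WebDriver instance from the code block above
# (Selenium exposes the current HTML as driver.page_source):
# hits = find_distil_markers(driver.page_source)
# if hits:
#     print("Probably blocked; markers found:", hits)
```

A plain substring test is deliberately crude. It is enough to tell a block page from the real home page, but it does not explain why the browser was flagged.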


Distil

As per the article There Really Is Something About Distil.it...:

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".
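The quote does not say which signals Distil relies on. One standardized signal that any page script can read is the `navigator.webdriver` property, which the W3C WebDriver specification defines as true in a browser under automation. Whether Distil actually uses it is an assumption. The helper below (the name `automation_flag` is hypothetical) is a minimal sketch for checking what a page would see.

```python
def automation_flag(driver):
    """Return the value of navigator.webdriver as seen by scripts on the current page.

    `driver` is any Selenium WebDriver instance, assumed to be created elsewhere.
    """
    return driver.execute_script("return navigator.webdriver")


# Usage, assuming `driver` is the WebDriver instance from the code block above:
# driver.get("https://www.mouser.co.uk")
# print(automation_flag(driver))
```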


Reference

You can find a couple of detailed discussions in:

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/52832413/
