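I am trying to use Selenium ChromeDriver in Python on the website www.mouser.co.uk. However, it is detected as a bot on the very first attempt.
Does anyone have an explanation for this? Here is the code I am using: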
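from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--start-maximized")
browser = webdriver.Chrome(service=Service('chromedriver.exe'), options=options)
wait = WebDriverWait(browser, 30)
browser.get('https://www.mouser.co.uk')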
Best answer
I tried to access the URL https://www.mouser.co.uk/ with a few ChromeOptions set, but the session was indeed detected and redirected to the Pardon Our Interruption page.
Code block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
driver.get("https://www.mouser.co.uk")
# Wait for the country-flag link to become clickable, then click it via JavaScript
myElement = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='1_lnkLeftFlag']")))
driver.execute_script("arguments[0].click();", myElement)
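To tell programmatically whether a run landed on the block page instead of the real site, one option is to look at the page title right after driver.get(...). A minimal sketch, assuming the interruption page's <title> contains the phrase "Pardon Our Interruption" (an assumption based on the page name, not verified here); is_interruption_page is a hypothetical helper that only relies on the title attribute a Selenium WebDriver exposes.

```python
# Hypothetical helper: flag the "Pardon Our Interruption" block page.
# Assumption: the block page's <title> contains this phrase.
BLOCK_PAGE_MARKER = "pardon our interruption"


def is_interruption_page(driver) -> bool:
    """Return True if the current page title looks like the block page.

    `driver` is any object with a `title` attribute, e.g. a Selenium
    WebDriver instance after driver.get(...) has returned.
    """
    title = driver.title or ""  # guard against a missing/None title
    return BLOCK_PAGE_MARKER in title.lower()
```

Usage: call is_interruption_page(driver) right after driver.get("https://www.mouser.co.uk") and stop early if it returns True, rather than waiting 30 seconds for an element that will never appear.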
Now, if you inspect the Pardon Our Interruption page, you will find that the <body> tag contains:
- the class value dist-GlobalHeader
- the class value dist-PageWrap
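This indicates that the website is protected by the bot management service provider Distil Networks, and that navigation driven by ChromeDriver is detected and subsequently blocked.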
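The same observation can be checked in code. The sketch below uses only the standard library's html.parser to read the class attribute of the first <body> tag and reports whether both dist-GlobalHeader and dist-PageWrap are present. With Selenium, the HTML string could come from driver.page_source. The function name and the "both classes must be present" rule are illustrative choices, not something Distil documents.

```python
from html.parser import HTMLParser

# Class names observed on the <body> tag of the block page.
DISTIL_BODY_CLASSES = {"dist-GlobalHeader", "dist-PageWrap"}


class _BodyClassParser(HTMLParser):
    """Collects the class names of the first <body> tag it sees."""

    def __init__(self):
        super().__init__()
        self.body_classes = None  # stays None if no <body> tag is found

    def handle_starttag(self, tag, attrs):
        if tag == "body" and self.body_classes is None:
            for name, value in attrs:
                if name == "class" and value:
                    self.body_classes = set(value.split())
                    return
            self.body_classes = set()  # <body> without a class attribute


def has_distil_body_classes(html: str) -> bool:
    """True if the <body> tag carries both Distil-style class names."""
    parser = _BodyClassParser()
    parser.feed(html)
    parser.close()
    return parser.body_classes is not None and DISTIL_BODY_CLASSES <= parser.body_classes
```

Usage: has_distil_body_classes(driver.page_source) after the page has loaded.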
Distil
According to the article There Really Is Something About Distil.it...:
Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.
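Distil's actual detection logic is proprietary and not described in the article. Purely as a toy illustration of what "identifying patterns peculiar to scrapers" can mean, the sketch below flags a sequence of request timestamps whose gaps are almost perfectly regular (as a script with a fixed time.sleep would produce), using the coefficient of variation of the intervals. The function name and the 0.1 threshold are arbitrary assumptions, not Distil's method.

```python
from statistics import mean, pstdev


def timing_looks_scripted(timestamps, threshold=0.1):
    """Toy heuristic: True if request gaps are suspiciously regular.

    timestamps: request times in seconds, in ascending order.
    threshold: maximum coefficient of variation (stdev / mean of the gaps)
               still considered machine-like; 0.1 is an arbitrary choice.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps (two intervals)")
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(gaps)
    if avg == 0:
        # Every request arrived at the same instant: a burst, not a human.
        return True
    return pstdev(gaps) / avg < threshold
```

Real bot-management products combine many signals; a single timing statistic like this is easy to evade and would also misfire on some legitimate traffic.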
Further,
"One pattern with Selenium was automating the theft of Web content," Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious."
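One widely known signal that a browser is driven by Selenium, independent of any particular vendor, is the navigator.webdriver property, which the W3C WebDriver specification sets to true while the browser is under WebDriver control. The article does not say whether Distil relies on it; the sketch below only shows how to read that flag from the automated session itself through execute_script. The helper name is hypothetical.

```python
def webdriver_flag_exposed(driver) -> bool:
    """Return True if page scripts can see navigator.webdriver === true.

    `driver` is a Selenium WebDriver (or any object with execute_script).
    Any script served by the website can read the same property.
    """
    # The strict comparison turns undefined/false into a plain False.
    return bool(driver.execute_script("return navigator.webdriver === true;"))
```

Usage: after driver.get("https://www.mouser.co.uk"), print(webdriver_flag_exposed(driver)) typically prints True for a Chrome instance launched through ChromeDriver, which illustrates how visible the automation is to the page.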
References
You can find some detailed discussions in:
Regarding python - Chrome browser initiated through ChromeDriver gets detected, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52832413/