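I wrote the following code to log in to a website. So far it only fetches the web page and accepts the cookies, but when I try to log in by clicking the login button, the page hangs and the login page never loads.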
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Accept consent cookies
def accept_cookies(browser):
    try:
        browser.find_element(By.XPATH, '//*[@id="gdpr-banner-accept"]').click()
    except NoSuchElementException:
        print('Cookies already accepted')

# Webpage parameters
base_site = "https://www.ebay-kleinanzeigen.de/"

# Set up the remote-controlled browser (Selenium 4 style: Service + options)
firefox_options = webdriver.FirefoxOptions()
# firefox_options.add_argument("--headless")
browser = webdriver.Firefox(service=Service('/home/Webdriver/bin/geckodriver'), options=firefox_options)
browser.get(base_site)
accept_cookies(browser)

# Click the login pop-up (the second element whose text contains 'Einloggen')
browser.find_elements(By.XPATH, "//*[contains(text(), 'Einloggen')]")[1].click()
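Note: there are two login buttons (one in the pop-up, one on the page itself). I have tried both, with the same result.
I have done similar things on other websites without any problem, so I am curious why it does not work here.
Any ideas why this happens, or how to fix it?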
Best answer
I modified your code slightly, adding a few optional arguments, and on execution I got the following result:
Code block:
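from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# `driver` is never created in the flattened snippet; a Chrome session is assumed
# here because the analysis below refers to ChromeDriver.
driver = webdriver.Chrome()
driver.get("https://www.ebay-kleinanzeigen.de/")
# Wait up to 20 s for the GDPR consent button to be clickable, then accept cookies
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='gdpr-banner-accept']"))).click()
# Wait up to 20 s for the 'Einloggen' (log in) link to be clickable, then click it
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(), 'Einloggen')]"))).click()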
Observation: my observation was similar to yours. The page hangs and the login page never loads.
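The `WebDriverWait(driver, 20).until(...)` calls in the code above suggest the hang is not simply a matter of clicking before the element is ready. As a rough, dependency-free illustration of what such an explicit wait does, the sketch below polls a condition until it returns a truthy value or a timeout expires. `wait_until` and `WaitTimeout` are hypothetical names, not Selenium API; the real `WebDriverWait` passes the driver to the condition, polls every 0.5 s by default, and by default ignores `NoSuchElementException`, for which `LookupError` stands in here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 20.0,
               poll_frequency: float = 0.5,
               ignored: tuple = (LookupError,)) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` expires.

    Exceptions listed in `ignored` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # e.g. the element is not in the DOM yet; poll again
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Usage: a fake "element" that only becomes available on the third poll.
attempts = {"n": 0}


def login_link_clickable():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("not rendered yet")
    return "login-link"


print(wait_until(login_link_clickable, timeout=5, poll_frequency=0.01))  # login-link
```

If the condition keeps failing, the wait ends in a timeout exception rather than a hang, which is why an indefinitely hanging page points to something other than element timing.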
Deep dive
Inspecting the DOM Tree of the web page, you will find several <script> and <link> tags that refer to JavaScript containing the keyword dist. For example:
- <script type="text/javascript" async="" src="/static/js/lib/node_modules/@ebayk/prebid/dist/prebid.10o55zon5xxyi.js"></script>
- window.BelenConf.prebidFileSrc = '/static/js/lib/node_modules/@ebayk/prebid/dist/prebid.10o55zon5xxyi.js';
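The DOM inspection step described above can be sketched with Python's standard-library `html.parser`: collect the `src` of every `<script>` and the `href` of every `<link>`, then keep the URLs containing a keyword. This is a minimal sketch under assumptions: `AssetCollector` and `assets_mentioning` are hypothetical names, the `site.css` link in the sample is invented for illustration, and only tag attributes are scanned, so inline script text like the `window.BelenConf` assignment is not matched. A plain substring match on `dist` is also only a heuristic, since many front-end builds emit files into a `dist/` folder.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect src URLs of <script> tags and href URLs of <link> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script" and attr_map.get("src"):
            self.urls.append(attr_map["src"])
        elif tag == "link" and attr_map.get("href"):
            self.urls.append(attr_map["href"])


def assets_mentioning(html: str, keyword: str = "dist") -> list:
    """Return asset URLs from `html` that contain `keyword`."""
    parser = AssetCollector()
    parser.feed(html)
    return [url for url in parser.urls if keyword in url]


sample = (
    '<script type="text/javascript" async="" '
    'src="/static/js/lib/node_modules/@ebayk/prebid/dist/prebid.10o55zon5xxyi.js"></script>'
    '<link rel="stylesheet" href="/static/css/site.css">'
)
print(assets_mentioning(sample))
# ['/static/js/lib/node_modules/@ebayk/prebid/dist/prebid.10o55zon5xxyi.js']
```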
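This suggests that the website is protected by the bot management service provider Distil Networks, and that navigation driven by ChromeDriver is detected and subsequently blocked.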
Distil
According to the article There Really Is Something About Distil.it...:
Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.
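Distil's actual detection logic is proprietary and is not described here; the following is only a toy model of the one mechanism the quote does spell out: a behavioral profile flagged on one customer's site is added to a blacklist shared by every customer. The class names, the `fingerprint` strings, and the "too many distinct pages in one session" rule are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class SharedBlacklist:
    """Toy model: one blacklist of behavioral profiles shared by all protected sites."""
    profiles: set = field(default_factory=set)


@dataclass
class ProtectedSite:
    name: str
    blacklist: SharedBlacklist
    max_distinct_pages: int = 5  # hypothetical threshold for "scraper-like" crawling

    def handle(self, fingerprint: str, paths: list) -> bool:
        """Return True if the client is served, False if it is blocked."""
        if fingerprint in self.blacklist.profiles:
            return False  # already flagged on some other customer's site
        if len(set(paths)) > self.max_distinct_pages:
            # Scraper-like pattern seen here: publish the profile to all customers.
            self.blacklist.profiles.add(fingerprint)
            return False
        return True


shared = SharedBlacklist()
site_a = ProtectedSite("site-a", shared)
site_b = ProtectedSite("site-b", shared)

bot = "selenium-like-fingerprint"
print(site_a.handle(bot, [f"/listing/{i}" for i in range(50)]))  # False: flagged on site A
print(site_b.handle(bot, ["/"]))  # False: blocked on site B via the shared profile
print(site_b.handle("ordinary-browser", ["/", "/login"]))  # True
```

The point of the shared set is that site B never sees the scraping behavior itself; it blocks the client purely because site A reported it.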
Furthermore, "One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".
References
You can find some detailed discussions in:
Source: python - Ebay website hangs after clicking login button - Selenium Python, a similar question on Stack Overflow: https://stackoverflow.com/questions/65658258/