python - My scraper throws an error instead of continuing

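I've written a scraper in Python, using Selenium, to collect some information from a site. The problem is that after collecting one lead, the scraper throws the error "element is not attached to the page document" instead of moving on to the next one.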

Here is what the code below is supposed to do:

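The for loop iterates over 20 names, and the scraper should click each one.
After a name is clicked, the scraper waits for the document on the new page to become available.
In the top-right corner of that page there is a "show more" button; clicking it expands some hidden information. (The scraper stays on this second page; a new piece of information is simply revealed.)
As soon as that information appears, the scraper collects it successfully.
It should then return to the starting page where the loop began and move on to the next name. However, instead of clicking the next name, it throws the error shown below (on the link.click() line).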

I tried to get rid of the stale element error by adding wait.until(EC.staleness_of(item)), but it didn't help.

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# driver (a WebDriver) and wait (a WebDriverWait) are assumed to be set up already
for link in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"div.presence-entity__image"))):
    link.click()  # error thrown here
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"button[data-control-name='contact_see_more']"))).click()
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".pv-contact-info__ci-container a[href^='mailto:']")))
    print(item.get_attribute("href"))
    driver.execute_script("window.history.go(-1)")
    wait.until(EC.staleness_of(item))

The error I'm getting:

line 194, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

I've tried to describe what is happening as clearly as I can. Any help would be greatly appreciated.
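The exception most likely arises because the list of div.presence-entity__image elements is located only once, before the first click. Opening a profile and then going back makes the browser render the list again, so every WebElement in the original list is detached from the current page. wait.until(EC.staleness_of(item)) only waits for the profile page's item to disappear; it does not refresh the link references. A common workaround that keeps the click-and-go-back flow is to remember only how many entries there are and re-locate the list at the start of every pass. The sketch below assumes that approach; visit_each, find_all, scrape_one and go_back are hypothetical names, and the function relies on duck typing so it does not import Selenium itself.

```python
from typing import Any, Callable, Sequence


def visit_each(
    find_all: Callable[[], Sequence[Any]],
    scrape_one: Callable[[int], None],
    go_back: Callable[[], None],
) -> int:
    """Click every element that find_all() returns, one per pass.

    The list is looked up again at the start of each pass, so no element
    reference obtained before a navigation is ever reused.
    Returns the number of entries actually visited.
    """
    visited = 0
    total = len(find_all())
    for i in range(total):
        fresh = find_all()  # new references taken from the current DOM
        if i >= len(fresh):
            break  # the list got shorter after navigating back
        fresh[i].click()  # open the i-th entry
        scrape_one(i)  # e.g. click "see more" and read the mailto link
        go_back()  # return to the list page
        visited += 1
    return visited


# Hypothetical wiring with the names from the question (not executed here):
#   find_all = lambda: wait.until(EC.presence_of_all_elements_located(
#       (By.CSS_SELECTOR, "div.presence-entity__image")))
#   go_back = driver.back
```

Because find_all goes through wait.until in the wiring above, each pass also waits for the list to reappear after going back.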

Best answer

Instead of clicking each link inside the loop, collect all the links first and then navigate to each of them in turn:

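from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Read the href strings up front: plain strings cannot go stale the way WebElements can
links = [link.get_attribute('href') for link in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"a.mn-person-info__picture.ember-view")))]
for link in links:
    driver.get(link)  # load each profile directly instead of clicking and going back
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"button[data-control-name='contact_see_more']"))).click()
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".pv-contact-info__ci-container a[href^='mailto:']")))
    print(item.get_attribute("href"))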

Note that to get all the links, you may need to scroll down the "Connections" page so that more connections are loaded via XHR.
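A minimal sketch of that scrolling step, assuming the page appends more connections when scrolled to the bottom and that document.body.scrollHeight grows as they arrive: scroll, pause, and stop once the height no longer changes (or after max_rounds). scroll_to_bottom, pause and max_rounds are illustrative names, not part of Selenium; the only driver method used is execute_script, and the fixed pause is a simple heuristic rather than a guarantee that the XHR has finished.

```python
import time


def scroll_to_bottom(driver, pause: float = 1.0, max_rounds: int = 20) -> int:
    """Scroll until document.body.scrollHeight stops growing.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the XHR time to append more entries
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, assume we reached the end
        last_height = new_height
    return rounds
```

Calling scroll_to_bottom(driver) once before building the links list should let the selector see every loaded connection.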

For the question "python - My scraper throws an error instead of continuing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48592144/
