python - Selenium webdriver returns an empty list from find_elements_by_X

Tags: python selenium selenium-webdriver web-scraping dynamic

My goal is to get a list of the names of all new projects published on https://www.prusaprinters.org/prints within the 24 hours of a given day.

From some reading, I understand that I should use Selenium, because the site I am scraping is dynamic (it loads more objects as the user scrolls).

The problem is that I only ever get an empty list from webdriver.find_elements_by_*, whichever of the suffixes listed at https://selenium-python.readthedocs.io/locating-elements.html I use.

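On the site, when I inspect the element whose title I want, I see "class = name" and "class = clamp-two-lines" (see screenshot), but I can't seem to get back a list of all the elements on the page that have either the name class or the clamp-two-lines class.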

[Screenshot: prusaprinters inspect element]

Here is my code so far (the commented-out lines are failed attempts):

from timeit import default_timer as timer
start_time = timer()
print("Script Started")

import bs4, selenium, smtplib, time
from bs4 import BeautifulSoup 
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(r'D:\PortableApps\Python Peripherals\chromedriver.exe')

url = 'https://www.prusaprinters.org/prints'
driver.get(url)
# foo = driver.find_elements_by_name('name')
# foo = driver.find_elements_by_xpath('name')
# foo = driver.find_elements_by_class_name('name')
# foo = driver.find_elements_by_tag_name('name')
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id*=name]')]
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[class*=name]')]
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id*=clamp-two-lines]')]
# foo = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="printListOuter"]//ul[@class="clamp-two-lines"]/li')))
print(foo)
driver.quit()

print("Time to run: " + str(round(timer() - start_time,4)) + "s")

My research:

  1. Selenium only returns an empty list
  2. Selenium find_elements_by_css_selector returns an empty list
  3. Web Scraping Python (BeautifulSoup,Requests)
  4. Get HTML Source of WebElement in Selenium WebDriver using Python
  5. How to get Inspect Element code in Selenium WebDriver
  6. Web Scraping Python (BeautifulSoup,Requests)
  7. https://chrisalbon.com/python/web_scraping/monitor_a_website/
  8. https://www.codementor.io/@gergelykovcs/how-and-why-i-built-a-simple-web-scrapig-script-to-notify-us-about-our-favourite-food-fcrhuhn45
  9. https://www.tutorialspoint.com/python_web_scraping/python_web_scraping_dynamic_websites.htm

Best answer

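To get the text, wait until the elements are visible; the page renders them after the initial load, so an immediate lookup can come back empty. The CSS selector for the titles is #printListOuter h3: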

titles = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '#printListOuter h3')))

for title in titles:
    print(title.text)
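
Put together with the browser setup from the question, the wait above might look like the following sketch. It assumes Selenium 4 or later, where the find_elements_by_* helpers from the question are no longer available (they were removed in 4.3) and lookups go through By locators instead. fetch_titles and clean_titles are hypothetical helper names introduced here for illustration, and the #printListOuter h3 selector is taken from this answer, so it may no longer match the site's current markup.

```python
from typing import Iterable, List


def clean_titles(raw_titles: Iterable[str]) -> List[str]:
    """Strip whitespace, drop empty strings, and remove duplicates in order."""
    seen = set()
    cleaned = []
    for raw in raw_titles:
        title = raw.strip()
        if title and title not in seen:
            seen.add(title)
            cleaned.append(title)
    return cleaned


def fetch_titles(
    url: str,
    selector: str = "#printListOuter h3",  # selector from the answer; may be outdated
    timeout: float = 10,
) -> List[str]:
    """Open url, wait for the title elements to become visible, return their text."""
    # Imported lazily so clean_titles can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes a usable chromedriver can be found
    try:
        driver.get(url)
        # Poll until every element matching the selector is visible.
        elements = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        return clean_titles(element.text for element in elements)
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Calling fetch_titles('https://www.prusaprinters.org/prints') would return the visible titles as a list of strings. If nothing matching the selector becomes visible within the timeout, the wait raises a TimeoutException rather than returning an empty list.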

A shorter version:

wait = WebDriverWait(driver, 10)
titles = [title.text for title in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '#printListOuter h3')))]
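
The question also mentions that the page loads more items as the user scrolls. The snippets above only see what is rendered at the time of the wait, so one possible approach is to scroll and re-count until the number of titles stops growing. The sketch below is an assumption-laden illustration, not something from the original answer: scroll_until_stable and load_all_titles are hypothetical names, the fixed pause is a crude stand-in for a proper wait, and whether scrolling to the bottom actually triggers loading on this site is not verified here.

```python
import time
from typing import Callable, List


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_once: Callable[[], None],
    max_rounds: int = 20,
    pause: float = 1.0,
) -> int:
    """Scroll repeatedly until the item count stops growing; return the final count."""
    seen = count_items()
    for _ in range(max_rounds):
        scroll_once()
        time.sleep(pause)  # crude pause so lazily loaded items can render
        current = count_items()
        if current <= seen:  # nothing new appeared, assume the end was reached
            break
        seen = current
    return seen


def load_all_titles(driver, selector: str = "#printListOuter h3") -> List[str]:
    """Wire the loop above to a live Selenium driver (hypothetical helper)."""
    # Imported lazily so scroll_until_stable works without Selenium installed.
    from selenium.webdriver.common.by import By

    scroll_until_stable(
        count_items=lambda: len(driver.find_elements(By.CSS_SELECTOR, selector)),
        scroll_once=lambda: driver.execute_script(
            "window.scrollTo(0, document.body.scrollHeight);"
        ),
    )
    return [element.text for element in driver.find_elements(By.CSS_SELECTOR, selector)]
```

Passing the counting and scrolling steps in as callables keeps the stopping logic separate from the browser, so it can be checked with a fake page before being pointed at a real driver. The max_rounds cap is there so that a page with an endless feed does not keep the script running indefinitely.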

Regarding python - Selenium webdriver returns an empty list from find_elements_by_X, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59868524/
