python - Learning to scrape with Selenium and Python

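I am learning to scrape with Selenium, but I am having trouble connecting to this site: 'http://www.festo.com/cat/it_it/products_VUVG_S?CurrentPartNo=8043720'

The site's content does not load.

I would like to understand how to connect to this site so that I can request its images and data.

My code is simple because I am still learning. I have looked for a way to make the connection, without success: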

import time

from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile

ff_profile = FirefoxProfile()
ff_profile.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36")
driver = webdriver.Firefox(firefox_profile = ff_profile)
driver.get('http://www.festo.com/cat/it_it/products_VUVG_S?CurrentPartNo=8043720')
time.sleep(5)
campo_busca = driver.find_elements_by_id('of132')
print(campo_busca)

Best answer

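Since the desired element is inside an <iframe>, to extract the src attribute of that element you have to:

Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the visibility_of_element_located() of the desired element.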
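
To make the idea behind "inducing" a wait more concrete: an explicit wait repeatedly evaluates a condition until it returns a truthy value or a timeout expires. The following is a minimal, Selenium-free sketch of that polling loop. The name `wait_until` and its parameters are hypothetical and exist only for illustration; this is not Selenium's own implementation.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires.

    Hypothetical helper that illustrates the polling idea behind an
    explicit wait; it is not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Hand back whatever the condition produced (e.g. an element).
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Sleep briefly so the loop does not spin at full speed.
        time.sleep(poll_interval)
```

In Selenium the same role is played by `WebDriverWait(driver, 20).until(...)`, where the condition is one of the `expected_conditions` helpers and a `TimeoutException` is raised if it is never satisfied.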

You can use the following Locator Strategies:

driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get('http://www.festo.com/cat/it_it/products_VUVG_S?CurrentPartNo=8043720')
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@id='CamosIFId' and @name='CamosIF']")))
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//img[@id='of132']"))).get_attribute("src"))

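However, as @google mentioned in one of the comments, the browsing experience seems to be better with ChromeDriver/Chrome, in which case you can use the following solution: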

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get('http://www.festo.com/cat/it_it/products_VUVG_S?CurrentPartNo=8043720')
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#CamosIFId[name='CamosIF']")))
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "img#of132"))).get_attribute("src"))

Note: You have to add the following imports:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console output:

https://www.festo.com/cfp/camosHtml/i?SIG=0020e295a546f45d9acb6844231fd8ff31ca817a_64_64.png
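
Since the printed src is an ordinary URL, one possible next step toward requesting the image itself is to fetch it with the standard library. The sketch below makes several assumptions: `filename_from_src` and `download_image` are hypothetical helper names, and it has not been verified that the server delivers this URL outside the browser session (it may expect cookies or headers).

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def filename_from_src(src, default="image.png"):
    """Derive a local file name from an image URL (hypothetical helper)."""
    parsed = urlparse(src)
    # The URL printed above carries a file-like token in its SIG query parameter.
    sig = parse_qs(parsed.query).get("SIG")
    if sig:
        return sig[0]
    # Otherwise use the last path segment, or a default name if there is none.
    return Path(parsed.path).name or default


def download_image(src, dest_dir="."):
    """Fetch the image bytes and save them under dest_dir; returns the saved path."""
    target = Path(dest_dir) / filename_from_src(src)
    with urlopen(src, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Passing the value returned by `get_attribute("src")` to `download_image` would save the file in the current directory, provided the request is accepted by the server.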

Here you can find a relevant discussion on Ways to deal with #document under iframe
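
For reference, the steps of this answer can be combined into a single function. This is only a sketch under stated assumptions: it targets Selenium 4.6 or later, where the driver binary is located automatically and no executable_path is passed, and the function name `fetch_image_src` is hypothetical. The locators are the ones used in the answer.

```python
def fetch_image_src(url, timeout=20):
    """Return the src of img#of132 inside the CamosIF iframe (sketch only)."""
    # Imported inside the function so that defining it does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    # Assumption: Selenium 4.6+ resolves chromedriver itself, so no path is given.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Step 1: wait for the iframe and switch into it.
        wait.until(EC.frame_to_be_available_and_switch_to_it(
            (By.CSS_SELECTOR, "iframe#CamosIFId[name='CamosIF']")))
        # Step 2: wait until the image is visible, then read its src attribute.
        image = wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, "img#of132")))
        src = image.get_attribute("src")
        # Return to the top-level document in case further lookups follow.
        driver.switch_to.default_content()
        return src
    finally:
        # Always close the browser, even if one of the waits times out.
        driver.quit()
```

Calling `fetch_image_src` with the URL from the question would open Chrome, perform both waits, and return the image URL. The try/finally block ensures the browser is closed even when a wait raises a timeout.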

Regarding python - Learning to scrape with Selenium and Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59824139/
