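javascript - Scraping web page content with Python/selenium

Tags: javascript python-3.x selenium web-scraping selenium-chromedriver

I'm trying to scrape the contents of a table. I believe the table is rendered with JavaScript, so I'm using the selenium package with Python 3. For this kind of task, I've seen others locate the table's XPath and scrape its contents, but I'm not sure how to identify the correct XPath.

How can I extract the table's contents? If I use XPath, how do I identify the correct XPath for the table or its contents by inspecting the page source?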

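from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome('path/to/chromedriver.exe')
url = 'https://ultrasignup.com/results_event.aspx?did=6727'
driver.get(url)

# Now I need to get the table's contents. I might do something like this:
table = driver.find_element(By.XPATH, 'my_xpath')  # 'my_xpath' is a placeholder; returns one element, not a list
table_html = table.get_attribute('outerHTML')  # outerHTML keeps the <table> tag; innerHTML would hold only its children
df = pd.read_html(StringIO(table_html))[0]
print(df)
driver.quit()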

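Best answer

I don't think scraping is necessary, because the site has an API.

If you visit this link, you'll see well-formatted data from the table you mentioned: https://ultrasignup.com/service/events.svc/results/6727/json

Sample code: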

import json, requests

url = 'https://ultrasignup.com/service/events.svc/results/6727/json'

response = requests.get(url)

# Get all people from the table
people = [x for x in response.json()] 

# Print first person's information
print(people[0])
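
Once the JSON is parsed, the records can be flattened into a table. The following is only a minimal sketch using the standard library: `records_to_csv` is a hypothetical helper, and the `sample` list stands in for `response.json()`, since the real field names returned by the endpoint are not shown here and may differ.

```python
import csv
import io


def records_to_csv(records):
    """Render a list of JSON objects (dicts) as CSV text.

    Columns are ordered by the first appearance of each key.
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    # restval fills in an empty cell when a record lacks a column
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample standing in for response.json(); the field names
# are assumptions, not the endpoint's actual schema.
sample = [
    {"place": 1, "name": "Runner A", "time": "3:59:10"},
    {"place": 2, "name": "Runner B", "time": "4:05:42", "age": 34},
]
print(records_to_csv(sample))
```

If pandas is already installed, passing the same list of dicts to `pandas.DataFrame(records)` should give a comparable table without the manual column handling.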

Hope this helps!

Regarding "javascript - Scraping web page content with Python/selenium", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56726544/
