python - Scraping a table in Python returns an empty table

I need to scrape a table from a website using the BeautifulSoup library in Python. The URL is https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html

When I run this code, I get an empty table:

import requests
from bs4 import BeautifulSoup
#
vaacineProgressResponse = requests.get("https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html")
vaacineProgressContent = BeautifulSoup(vaacineProgressResponse.content, 'html.parser')
vaacineProgressContentTable = vaacineProgressContent.find_all('table', class_="g-summary-table  svelte-2wimac")
if vaacineProgressContentTable is not None and len(vaacineProgressContentTable) > 0:
    vaacineProgressContentTable = vaacineProgressContentTable[0]
#
print ('the table =', vaacineProgressContentTable)

Output:

the table = []

Process finished with exit code 0

The screenshot below shows the table in the web page (left) and the relevant part of the element inspector (right):

Best answer

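It's quite simple: the class string you are searching for contains an extra space (there are two spaces between g-summary-table and svelte-2wimac).

If you change the class to g-summary-table svelte-2wimac, with a single space, the tag should be returned correctly.

The following code should work: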

import requests
from bs4 import BeautifulSoup

# Download the page and parse the returned HTML
response = requests.get("https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html")
soup = BeautifulSoup(response.content, 'html.parser')
# Class string with exactly one space between the two class names
table = soup.find_all('table', class_="g-summary-table svelte-2wimac")
print(table)

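I have done similar scraping on the New York Times interactive pages, and whitespace can be very tricky. If you add an extra space or leave one out, you get an empty result.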
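To see the whitespace effect without hitting the network, here is a minimal sketch that runs the same searches against a small inline HTML string. The html snippet is made up for illustration and is not the real page markup; only the class names come from the question. It also shows two alternatives that, as far as I know, avoid the problem: searching for a single class name, or using a CSS selector via select().

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one table with the two classes.
html = '<table class="g-summary-table svelte-2wimac"><tr><td>1</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# Two spaces between the class names: no match.
two_spaces = soup.find_all("table", class_="g-summary-table  svelte-2wimac")

# One space, same order as in the markup: match.
one_space = soup.find_all("table", class_="g-summary-table svelte-2wimac")

# A single class name matches regardless of the other classes on the tag.
by_single_class = soup.find_all("table", class_="g-summary-table")

# A CSS selector also matches on one class and ignores spacing entirely.
by_selector = soup.select("table.g-summary-table")

print(len(two_spaces), len(one_space), len(by_single_class), len(by_selector))
```

Matching on one class name is usually the more robust choice here, since generated suffixes such as svelte-2wimac may change when the site is rebuilt.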

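If you can't find a tag, I suggest first printing the whole document with print(soup.prettify()) and then locating the tag you plan to scrape. Make sure you copy the exact class name text from what BeautifulSoup prints.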
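When the prettified document is too long to read comfortably, one option is to list only the class attribute of each table. The sketch below assumes the html.parser backend; table_class_names is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def table_class_names(html):
    """Return the class attribute of every <table>, joined with single spaces."""
    soup = BeautifulSoup(html, "html.parser")
    # BeautifulSoup stores "class" as a list of names; tables without a
    # class attribute yield an empty string.
    return [" ".join(table.get("class", [])) for table in soup.find_all("table")]


# Made-up sample markup for illustration.
sample = '<table class="a  b"></table><table></table><table class="c"></table>'
print(table_class_names(sample))
```

With the real page you could pass response.text to the helper and copy the class string from its output into find_all.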

Regarding "python - Scraping a table in Python returns an empty table", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/67144542/
