Python Beautiful Soup cannot find a specific table

Tags: python html python-3.x pandas beautifulsoup

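I'm having trouble scraping basketball-reference.com. I'm trying to access the "Team Per Game Stats" table, but I can't seem to target the correct div/table. My goal is to capture the table and load it into a pandas DataFrame.

I've tried using soup.find and soup.find_all to locate all the tables, but when I search the results I don't see the ID of the table I'm looking for. See below.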

x = soup.find("table", id="team-stats-per_game")

import csv, time, sys, math
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen  # urlopen must be imported by name to be called directly


# NBA season
year = 2019

# URL of the page we will scrape (the year is substituted into the placeholder)
url = "https://www.basketball-reference.com/leagues/NBA_{}.html#all_team-stats-base".format(year)

# Fetch the Basketball Reference page and parse it
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')

# Look up the table by its id
x = soup.find("table", id="team-stats-per_game")
print(x)


Result:

None

I expected the output to list the table elements, specifically the tr and th tags, so I could target them and bring them into a pandas df.

Best answer

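As Jarett mentioned above, BeautifulSoup cannot parse your tag. In this case, that's because it is commented out in the page source, so the parser treats it as comment text rather than as elements. While the following is admittedly an amateurish approach, it works for your data.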

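# Assumption: `html` is a requests response (its .text is used below), not the urlopen result
html = requests.get(url)
table_src = html.text.split('<div class="overthrow table_container" id="div_team-stats-per_game">')[1].split('</table>')[0] + '</table>'

table = BeautifulSoup(table_src, 'lxml')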
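
A somewhat sturdier alternative to splitting the raw string is to look inside the HTML comments themselves, since bs4 exposes them as `Comment` nodes. The sketch below assumes the table is wrapped in a comment, as described above. `SAMPLE_HTML`, `find_commented_table`, and `table_rows` are hypothetical names made up for illustration, the sample markup and its numbers are invented, and `html.parser` is used only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample page: the table is wrapped in an HTML comment,
# mimicking the situation described in the answer. The data is made up.
SAMPLE_HTML = """
<html><body>
<div id="all_team-stats-per_game">
<!--
<div class="overthrow table_container" id="div_team-stats-per_game">
<table id="team-stats-per_game">
<thead><tr><th>Team</th><th>G</th><th>PTS</th></tr></thead>
<tbody>
<tr><td>Team A</td><td>82</td><td>110.5</td></tr>
<tr><td>Team B</td><td>82</td><td>108.2</td></tr>
</tbody>
</table>
</div>
-->
</div>
</body></html>
"""


def find_commented_table(html_text, table_id):
    """Return the <table> with the given id, also searching inside HTML comments."""
    soup = BeautifulSoup(html_text, "html.parser")
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    # Comment contents are kept as plain strings and never parsed into tags,
    # so re-parse each comment that mentions the id.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if table_id in comment:
            table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
            if table is not None:
                return table
    return None


def table_rows(table):
    """Flatten a <table> tag into a list of rows of cell text."""
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


# A plain find() does not see the commented-out table.
direct = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table", id="team-stats-per_game")
print(direct)

# Searching the comments recovers it.
table = find_commented_table(SAMPLE_HTML, "team-stats-per_game")
rows = table_rows(table)
print(rows)
```

On the real page you would pass `requests.get(url).text` instead of `SAMPLE_HTML`, and the rows could then go into `pd.DataFrame(rows[1:], columns=rows[0])`, assuming the first row holds a single header line. This sketch does not check whether the live page still comments out this table.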

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/57032340/
