Python - Scraping with BeautifulSoup does not show all rows

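I am new to BeautifulSoup. I am trying to scrape the "Season Stats" table from the ESPN Fantasy Basketball Standings page, but not all rows are returned. After some research I thought the problem might be html.parser, so I switched to lxml and got the same result. I would appreciate it if someone could tell me how to get all the team names.

My code: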

from bs4 import BeautifulSoup
from urllib.request import urlopen

soup = BeautifulSoup(urlopen("http://games.espn.com/fba/standings?leagueId=20960&seasonId=2017"),'html.parser')
tableStats = soup.find("table", {"class" : "tableBody"})
for row in tableStats.findAll('tr')[2:]:
    col = row.findAll('td')

    try:
        name = col[0].a.string.strip()
        print(name)
    except Exception as e:
        print(str(e))

Output (as you can see, only a few team names are shown):

勒图克灰熊队佩顿·乌鸦天鹫凡尔赛金熊巴尔的摩科托的穆雷特拾荒者 XO 斑鱼

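Best answer

You seem to have picked the wrong table. Instead of calling find() for the <table> tag, which returns only the first match, you can call findAll() and look through the results for the table that contains the full standings. I also noticed that the stats table has its own id, statsTable. Searching by this id rather than by class is a better idea, because an id is unique within the HTML document.

See the comments in the following code for further guidance: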

from bs4 import BeautifulSoup
import requests
# Note, I'm using requests here as it's a superior library
text = requests.get("http://games.espn.com/fba/standings?leagueId=20960&seasonId=2017").text
soup = BeautifulSoup(text,'html.parser')
# searching by id, always a better option when available
tableStats = soup.find("table", {"id" : "statsTable"})
for row in tableStats.findAll('tr')[3:]:
    col = row.findAll('td')
    try:
        # get_text() returns the cell's text with all HTML tags stripped
        name = col[1].get_text()
        print(name)
    except Exception as e:
        print(str(e))
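
To make the difference between find(), findAll() and an id lookup concrete, here is a minimal sketch that runs offline. The markup in SAMPLE_HTML, the id standingsTable, the team names, and the helper team_names() are all made up for illustration; the real ESPN page is larger and may be structured differently. It uses find_all(), the current spelling of findAll().

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real ESPN page differs.
SAMPLE_HTML = """
<table class="tableBody" id="standingsTable">
  <tr><td><a href="#">Team A</a></td></tr>
  <tr><td><a href="#">Team B</a></td></tr>
</table>
<table class="tableBody" id="statsTable">
  <tr><td>1</td><td><a href="#">Team A</a></td></tr>
  <tr><td>2</td><td><a href="#">Team B</a></td></tr>
  <tr><td>3</td><td><a href="#">Team C</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first element that matches, here the first table.
first_by_class = soup.find("table", {"class": "tableBody"})

# find_all() returns every match, so the tables can be inspected one by one.
all_by_class = soup.find_all("table", {"class": "tableBody"})

# An id selects the intended table directly.
stats_table = soup.find("table", id="statsTable")


def team_names(table):
    """Return the text of the second cell of each row that has one."""
    names = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) > 1:
            names.append(cells[1].get_text(strip=True))
    return names


print(first_by_class["id"])     # standingsTable
print(len(all_by_class))        # 2
print(team_names(stats_table))  # ['Team A', 'Team B', 'Team C']
```

Checking the number of cells before indexing avoids the IndexError that the try/except in the code above would otherwise catch on header or spacer rows.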

This post on "Python - Scraping with BeautifulSoup does not show all rows" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/41202102/
