Python Beautiful Soup: scraping a specific number from a web page

Tags: python html web-scraping beautifulsoup html-parsing

On this page, each team's final score (a number) has the same class name, class="finalScore".

When I fetch the away team's final score (top), the code retrieves the number without any problem. If... favLastGM = 'A'

When I try to fetch the home team's final score (bottom), the code raises an error. If... favLastGM = 'H'

Here is my code:

import pickle
import math
import urllib2
from lxml import etree
from bs4 import BeautifulSoup
from urllib import urlopen

#Last Two Game info Home [H] or Away [A]
favLastGM = 'A' #Higher week number 2

#Game Info (Favorite) Last Game Played - CBS Sports (Change Every Week)
favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'
favPrevGMInfoHtml = urlopen(favPrevGMInfoUrl).read()
favPrevGMInfoSoup = BeautifulSoup(favPrevGMInfoHtml)
if favLastGM == 'A': #This Gives Final Score of Away Team - Away Score
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })
elif favLastGM == 'H':
    favScore = favPrevGMInfoSoup.find_all("td", { "class" : "finalScore" })[1]
else:
    print("***************************************************")
    print("NOT A VALID ENTRY - favLastGM  !")
    print("***************************************************")


print ("Enter: Total Points Allowed from Favored Team Defense for last game played: "),
print favScore[0].text

This is the error I get when favLastGM = 'H':

Traceback (most recent call last):
  File "C:/Users/jcmcdonald/Desktop/FinalScoreTest.py", line 26, in <module>
    print favScore[0].text
  File "C:\Python27\lib\site-packages\bs4\element.py", line 905, in __getitem__
    return self.attrs[key]
KeyError: 0

Best answer

There are only two elements with class="finalScore": the first is the away team's score and the second is the home team's score:

>>> from urllib import urlopen
>>> from bs4 import BeautifulSoup
>>> 
>>> favPrevGMInfoUrl = 'http://www.cbssports.com/nfl/gametracker/boxscore/NFL_20140914_NE@MIN'
>>> 
>>> favPrevGMInfoSoup = BeautifulSoup(urlopen(favPrevGMInfoUrl))
>>> score = [item.get_text() for item in favPrevGMInfoSoup.find_all("td", {"class": "finalScore"})]
>>> score
[u'30', u'7']
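
The KeyError in the question follows from this: in the 'H' branch, find_all(...)[1] already returns a single Tag, so the later favScore[0] is treated as an attribute lookup on that tag instead of a list index. Below is a minimal sketch of one way to avoid that, by keeping the full result list and choosing the index from the side. It follows the question's layout (away on top, home on the bottom). The inline HTML is a made-up stand-in for the real page, the teamName class is invented for illustration, and final_score is a hypothetical helper name. It is written for Python 3 with the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the box score markup; the real page is more complex.
SAMPLE_HTML = """
<table>
  <tr><td class="teamName">NE</td><td class="finalScore">30</td></tr>
  <tr><td class="teamName">MIN</td><td class="finalScore">7</td></tr>
</table>
"""


def final_score(soup, side):
    """Return the score text for side 'A' (away, top) or 'H' (home, bottom)."""
    index = {"A": 0, "H": 1}.get(side)
    if index is None:
        raise ValueError("side must be 'A' or 'H', got %r" % (side,))
    # Always keep the whole list, then index it once.
    cells = soup.find_all("td", {"class": "finalScore"})
    if len(cells) < 2:
        raise LookupError("expected two finalScore cells, found %d" % len(cells))
    return cells[index].get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
away_score = final_score(soup, "A")
home_score = final_score(soup, "H")
print(away_score, home_score)
```

Because both branches go through the same indexing step, the value returned is always a string, and an invalid side fails early with a clear message instead of leaving the variable undefined.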

FYI, you can use a CSS selector instead of .find_all("td", {"class": "finalScore"}): .select("td.finalScore").
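
As a rough illustration of that equivalence, the sketch below runs both lookups against a small inline HTML string (an assumed stand-in for the real page, not its actual markup) and compares the results. It targets Python 3 and the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Assumed, simplified markup with two finalScore cells.
html = (
    '<table>'
    '<tr><td class="finalScore">30</td></tr>'
    '<tr><td class="finalScore">7</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, "html.parser")

# Attribute-dictionary lookup, as in the answer above.
by_find_all = [td.get_text() for td in soup.find_all("td", {"class": "finalScore"})]

# CSS selector lookup: "td.finalScore" matches <td> elements with that class.
by_select = [td.get_text() for td in soup.select("td.finalScore")]

print(by_find_all == by_select)
```

Both calls return the matching cells in document order, so either can be indexed the same way.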

For more on this topic, see the similar question on Stack Overflow: https://stackoverflow.com/questions/30810934/
