python - Iterating over a Beautiful Soup result set

Tags: python html python-2.7 beautifulsoup html-parsing

I am trying to scrape data from a website ( http://sports.yahoo.com/nfl/players/8800/ ), and I am using urllib2 and BeautifulSoup to do it. My code currently looks like this:

import re
import urllib2

from bs4 import BeautifulSoup

site = 'http://sports.yahoo.com/nfl/players/8800/'
response = urllib2.urlopen(site)
html = response.read()
soup = BeautifulSoup(html)
rushing = []
passing = []
receiving = []

# here is where my problem arises
for elem in soup.find_all('th', text=re.compile('2008')):
    passing = elem.parent.find_all('td', class_=re.compile('10'))
    rushing = elem.parent.find_all('td', class_=re.compile('20'))
    receiving = elem.parent.find_all('td', class_=re.compile('30'))

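The soup.find_all('th', text=re.compile('2008')) call matches three elements on this page, and all three show up when I print that result on its own. However, the for loop seems to run only once. How can I make sure the loop runs three times?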

Best answer

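As far as I understand, the loop does run three times, but each pass rebinds passing, rushing, and receiving to a new result, so only the values from the last match survive. You need to extend() the lists you defined before the loop instead: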

rushing = []
passing = []
receiving = []

for elem in soup.find_all('th', text=re.compile('2008')):
    passing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('10'))])
    rushing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('20'))])
    receiving.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('30'))])

print passing
print rushing
print receiving

Output:

[u'3']
[u'19', u'58', u'14.5', u'3.1', u'0']
[u'2', u'17', u'4.3', u'8.5', u'11', u'6.5', u'0']
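
The difference between assigning and extending can be reproduced without fetching the page. The following is only a minimal sketch: the nested list `rows` and the two helper functions are hypothetical stand-ins for the three matched rows and their `find_all` results, not part of the original code.

```python
# Hypothetical stand-in for the three rows matched by find_all(); each inner
# list plays the role of one elem.parent.find_all('td', ...) result.
rows = [['3'], ['19', '58'], ['2', '17', '4.3']]


def collect_by_assignment(rows):
    """Rebind the name on every pass, as the question's loop does."""
    cells = []
    iterations = 0
    for row in rows:
        cells = row  # replaces the previous value instead of adding to it
        iterations += 1
    return iterations, cells


def collect_by_extend(rows):
    """Append every row's items to the list created before the loop."""
    cells = []
    iterations = 0
    for row in rows:
        cells.extend(row)  # keeps what earlier passes collected
        iterations += 1
    return iterations, cells


print(collect_by_assignment(rows))  # (3, ['2', '17', '4.3'])
print(collect_by_extend(rows))      # (3, ['3', '19', '58', '2', '17', '4.3'])
```

Both loops run three times. Only the assignment version ends up with the data from the last row alone, which can look as though the loop ran once.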

For more on "python - Iterating over a Beautiful Soup result set", see the similar question on Stack Overflow: https://stackoverflow.com/questions/28390597/
