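I am trying to write code that uses Beautiful Soup to extract the numbers from a URL and then sum them, but I keep getting the following error:
Expected string or buffer
I think it is related to the regular expression, but I can't pinpoint the problem.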
import re
import urllib
from BeautifulSoup import *
htm1 = urllib.urlopen('https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/comments_42.html').read()
soup = BeautifulSoup(htm1)
tags = soup('span')
for tag in tags:
    y = re.findall ('([0-9]+)',tag.txt)
print sum(y)
Best answer
I recommend bs4 rather than BeautifulSoup (the old version). You also need to change this line:
y = re.findall ('([0-9]+)',tag)
to this:
y = re.findall ('([0-9]+)',tag.text)
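The change matters because re.findall accepts only a string (or bytes) as its second argument, so passing the tag object itself raises a TypeError (worded "expected string or buffer" in Python 2). Here is a minimal sketch of that behavior, written for Python 3. The FakeTag class is a hypothetical stand-in for a Beautiful Soup tag so the snippet runs without any third-party package, and its sample text is made up.

```python
import re


class FakeTag:
    """Hypothetical stand-in for a parsed <span> tag; not part of Beautiful Soup."""

    def __init__(self, text):
        self.text = text


tag = FakeTag("97")  # made-up sample value

# Passing the object itself fails, because re.findall needs a string.
try:
    re.findall('([0-9]+)', tag)
    raised = False
except TypeError:
    raised = True

# Passing the tag's text works and returns a list of strings.
y = re.findall('([0-9]+)', tag.text)
print(raised, y)
```

Note that y holds strings, not integers, so each match still has to go through int() before it can be added up.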
See if this gets you further:
total = 0  # initialize the running sum (named total so the built-in sum() is not shadowed)
for tag in tags:
    y = re.findall('([0-9]+)', tag.text)  # get the text from the tag
    print(y[0])  # y is a list, print the first element of the list
    total += int(y[0])  # convert it to an integer and add it to the sum
print('the sum is: {}'.format(total))
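The loop above assumes every span contains at least one number, since y[0] raises an IndexError on an empty list. Below is a slightly more defensive sketch for Python 3. It assumes the tag texts have already been collected into a list of strings (for example with [tag.text for tag in tags]). The function name sum_numbers and the sample values are hypothetical.

```python
import re


def sum_numbers(texts):
    """Add up every run of digits found in an iterable of strings."""
    total = 0
    for text in texts:
        # findall returns a list of digit strings; it is empty when there are none
        for number in re.findall(r'[0-9]+', text):
            total += int(number)
    return total


sample_texts = ["97", "no digits here", "3 and 4"]  # made-up values
print('the sum is: {}'.format(sum_numbers(sample_texts)))
```

Unlike the loop above, this counts every number in a span rather than only the first, which may or may not be what you want for your data.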
Regarding python - "Expected string or buffer" error when using Beautiful Soup, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/33929317/