python - Error when using BeautifulSoup in Python: ValueError: invalid literal for int() with base 10: 'xBB'

Tags: python beautifulsoup

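The following code runs fine on my machine, but when it is run on another machine it throws an error at the line

soup = BeautifulSoup(html)

It parses the list of active NBA players on Yahoo Sports and stores their names and positions in a text file.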

from bs4 import BeautifulSoup
import urllib2

'''
scraping the labeled data from yahoo sports
'''
def scrape(filename):
    base_url = "http://sports.yahoo.com/nba/players?type=position&c=NBA&pos="
    positions = ['G', 'F', 'C']
    players = 0

    with open(filename, 'w') as names:
        for p in positions:
            html = urllib2.urlopen(base_url + p).read()
            soup = BeautifulSoup(html) #throws the error!
            table = soup.find_all('table')[9]
            cells = table.find_all('td')

            for i in xrange(4, len(cells) - 1, 3):
                names.write(cells[i].find('a').string + '\t' + p + '\n')
                players += 1

    print "...success! %r players downloaded." % players

The error it throws is:

Traceback (most recent call last):
  File "run_me.py", line 9, in <module>
    scrapenames.scrape('namelist.txt')
  File "/Users/brapse/Downloads/bball/scrapenames.py", line 15, in scrape
    soup = BeautifulSoup(html)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/site-packages/bs4/__init__.py", line 100, in __init__
    self._feed()
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/site-packages/bs4/__init__.py", line 113, in _feed
    self.builder.feed(self.markup)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/site-packages/bs4/builder/_htmlparser.py", line 46, in feed
    super(HTMLParserTreeBuilder, self).feed(markup)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/HTMLParser.py", line 171, in goahead
    self.handle_charref(name)
  File "/usr/local/Cellar/python/2.6.5/lib/python2.6/site-packages/bs4/builder/_htmlparser.py", line 58, in handle_charref
    self.handle_data(unichr(int(name)))
ValueError: invalid literal for int() with base 10: 'xBB'

Best answer

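I believe this is a bug in the BS4 HTMLParser builder code: it crashes on the &#xBB; character reference (which stands for ») because it assumes the number is written in decimal. I suggest updating BeautifulSoup on that machine.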
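
To illustrate the failure mode described above, here is a minimal sketch. It is not the actual Beautiful Soup patch, and the helper names `decode_charref` and `hex_charrefs_to_decimal` are hypothetical. The first helper shows how a numeric character reference can be decoded when it may be either decimal or hexadecimal. The second is a possible stopgap, assuming the markup is available as a text string, that rewrites hexadecimal references as decimal ones before the page is parsed. The sketch is written for Python 3; on Python 2, as used in the question, `unichr` would take the place of `chr`.

```python
import re


def decode_charref(name):
    """Decode the body of a numeric character reference such as 'xBB' or '187'."""
    if name[:1] in ("x", "X"):
        # A leading 'x' marks a hexadecimal reference: &#xBB;
        code_point = int(name[1:], 16)
    else:
        # Otherwise the reference is decimal: &#187;
        code_point = int(name, 10)
    return chr(code_point)


# What the failing frame in the traceback effectively does: int('xBB') in base 10.
try:
    int("xBB")
except ValueError as exc:
    error_message = str(exc)

# Matches hexadecimal references such as &#xBB; or &#XBB;
HEX_CHARREF = re.compile(r"&#[xX]([0-9a-fA-F]+);")


def hex_charrefs_to_decimal(markup):
    """Rewrite hexadecimal references as decimal ones, e.g. &#xBB; becomes &#187;."""
    return HEX_CHARREF.sub(lambda m: "&#%d;" % int(m.group(1), 16), markup)
```

If upgrading is not possible on the affected machine, passing the output of `hex_charrefs_to_decimal(html)` to `BeautifulSoup` might avoid the crash, since only decimal references would then reach the buggy code path. Updating the library remains the cleaner fix.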

Original question on Stack Overflow: https://stackoverflow.com/questions/11417718/
