python - BeautifulSoup4 throws an error in Python 3.x

I'm trying to build a web scraper and want to use BeautifulSoup for it. I installed BeautifulSoup 4.3.2 because the website says it is compatible with Python 3.x. I used

pip install beautifulsoup4

to install it. But when I run

from bs4 import BeautifulSoup
import requests

url = input("Enter a URL (start with www): ")

link = "http://" + url

data = requests.get(link).content

soup = BeautifulSoup(data)

for link in soup.find_all('a'):
    print(link.get('href'))

I get this error:

Traceback (most recent call last):
  File "/Users/user/Desktop/project.py", line 1, in <module>
    from bs4 import BeautifulSoup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/Library/Frameworks/Python.framework/Versions/3.1/lib/python3.1/site-packages/bs4/builder/__init__.py", line 308, in <module>
    from .. import _htmlparser
ImportError: cannot import name _htmlparser
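The paths in the traceback show the script running under a framework build of Python 3.1 (`.../Versions/3.1/...`). Before reinstalling anything, it can help to confirm which interpreter is actually executing the script and exactly what `import bs4` raises under it. The sketch below only gathers that information and assumes nothing about the root cause; the helper name `check_bs4_import` is hypothetical.

```python
import importlib
import sys


def check_bs4_import():
    """Report the running interpreter and whether bs4 imports cleanly.

    Hypothetical diagnostic helper: it only attempts the import and
    records the outcome; it does not fix anything.
    """
    report = {
        "python_version": sys.version_info[:3],
        "executable": sys.executable,
        "bs4_ok": False,
        "bs4_version": None,
        "error": None,
    }
    try:
        bs4 = importlib.import_module("bs4")
    except ImportError as exc:
        # A broken install shows up here, e.g. "cannot import name _htmlparser".
        report["error"] = repr(exc)
    else:
        report["bs4_ok"] = True
        report["bs4_version"] = getattr(bs4, "__version__", None)
    return report


if __name__ == "__main__":
    for key, value in check_bs4_import().items():
        print(key, "=", value)
```

Running it with the same command used for project.py makes sure the report describes the same interpreter that produced the traceback.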

Best answer

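I just installed Python 3.x on my machine and tested it with the latest BS4 download. It didn't work. However, a fix can be found here: https://github.com/il-vladislav/BeautifulSoup4 (thanks to GitHub user Il Vladislav, whoever you are).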

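Download the zip file, use it to overwrite the bs4 folder in your BeautifulSoup download, and then reinstall with python setup.py install. It works on my side now; as you can see in the screenshot below, the obvious error did appear before it fully worked.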
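After python setup.py install, an older bs4 copy elsewhere on `sys.path` could still be the one that gets imported. A rough way to check is to list every directory on the search path that contains a `bs4` package; the first hit is normally the copy a plain `import bs4` loads. This is a simplified sketch of the import system, not a full model: it ignores zip files or zipped eggs, namespace packages, and custom import hooks, and the helper name `find_package_dirs` is hypothetical.

```python
import os
import sys


def find_package_dirs(name, search_path=None):
    """Return each directory on the search path holding a regular package
    called `name`, in search order, without importing it.

    Approximation only: zip files, namespace packages and import hooks
    are ignored.
    """
    if search_path is None:
        search_path = sys.path
    found = []
    for entry in search_path:
        # An empty string on sys.path stands for the current directory.
        base = entry or os.getcwd()
        candidate = os.path.abspath(os.path.join(base, name))
        init_file = os.path.join(candidate, "__init__.py")
        if os.path.isfile(init_file) and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    dirs = find_package_dirs("bs4")
    if not dirs:
        print("no bs4 package directory found on sys.path")
    for position, path in enumerate(dirs):
        label = "(normally imported)" if position == 0 else "(shadowed)"
        print(label, path)
```

Because nothing is imported, this still works while `import bs4` is failing; if more than one path is printed, the first one is the copy worth inspecting or replacing.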

Code:

from bs4 import BeautifulSoup
import requests

url = input("Enter a URL (start with www): ")
link = "http://" + url
data = requests.get(link).content
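# Naming the parser explicitly stops bs4 from guessing one (newer releases warn when it is omitted)
soup = BeautifulSoup(data, "html.parser")

for link in soup.find_all('a'):
    print(link.get('href'))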
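bs4's built-in "html.parser" tree builder is a wrapper around the standard library's `html.parser` module, so the same href extraction can also be sketched with the standard library alone. This is a swapped-in technique for comparison, not the fix described in this answer; the class `LinkCollector` and function `extract_hrefs` are hypothetical names, and unlike bs4 this builds no parse tree and does no encoding detection.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and passes
        # attrs as a list of (name, value) pairs.
        if tag == "a":
            # Like link.get('href'): None when the tag has no href.
            self.hrefs.append(dict(attrs).get("href"))


def extract_hrefs(html_text):
    """Return the href of every <a> tag in html_text (a str)."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    sample = '<p><a href="/about">About</a> <a name="top">Top</a></p>'
    for href in extract_hrefs(sample):
        print(href)
```

`HTMLParser.feed` expects text, so with requests you would pass `requests.get(link).text` rather than the `.content` bytes used above.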

Screenshot: [image not available]

I found a related SO thread here, which indicates that BS4 is still not fully compatible with Python 3.x (even two years on).

A similar question, "python - BeautifulSoup4 throws an error in Python 3.x", was found on Stack Overflow: https://stackoverflow.com/questions/22182634/
