python - Link scraping error

Tags: python web-scraping beautifulsoup

import requests
from bs4 import BeautifulSoup

url = "https://www.cnn.com/"

response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

links = []

for link in soup(response).find_all("a", href=True):
    links.append(link["href"])

for link in links:
    print(links)

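AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

I'm not sure why I'm getting this error. I'm trying to scrape all the hrefs/links from this website.

Best answer

There is no need to call soup(response); call find_all directly on soup. The soup object was already built from the response text in the line soup = BeautifulSoup(response.text, "html.parser"), so passing response to it again is redundant.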
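A minimal sketch of where the error comes from, assuming a small hypothetical HTML string in place of the live page. In Beautiful Soup, calling the soup object like a function is shorthand for find_all, so it returns a ResultSet (a list of tags), and a ResultSet has no find_all method of its own.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of a live page.
html = '<p><a href="/world">World</a> <a href="/politics">Politics</a></p>'
soup = BeautifulSoup(html, "html.parser")

# Calling the soup object is shorthand for soup.find_all(...).
result = soup("a")
print(type(result).__name__)
print(len(result))

# A ResultSet is a list of tags, so chaining find_all onto it fails.
error_message = None
try:
    result.find_all("a", href=True)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

In the question, soup(response) does the same thing with the response object as the search argument, which is why the chained find_all call raises AttributeError.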

# Replace this:
for link in soup(response).find_all("a", href=True):
    links.append(link["href"])

# With this:
for link in soup.find_all("a", href=True):
    links.append(link["href"])
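Putting the fix together, here is a hedged sketch of the whole link extraction. The helper name extract_links and the sample markup are assumptions made for illustration, not part of the original question. It also prints each link rather than the whole list, since the question's final loop calls print(links) on every iteration.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href value of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # Call find_all on the soup object itself, not on soup(...).
    return [a["href"] for a in soup.find_all("a", href=True)]


# Hypothetical sample markup standing in for response.text.
sample_html = """
<html><body>
  <a href="https://www.cnn.com/world">World</a>
  <a href="/politics">Politics</a>
  <a name="top">anchor without href</a>
</body></html>
"""

links = extract_links(sample_html)
for link in links:
    print(link)  # print each link, not the whole list
```

With the question's code, the same helper would be called as extract_links(response.text) after response = requests.get(url).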

A similar question to "python - Link scraping error" can be found on Stack Overflow: https://stackoverflow.com/questions/74426805/
