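python - BeautifulSoup: findAll after findAll?

I'm new to Python and mainly need it to pull information from websites. Here I'm trying to get the short headlines from the bottom of a page, but I can't quite get at them.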

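from bs4 import BeautifulSoup  # the package is named bs4, not bfs4
import requests

url = "http://some-website"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# findAll returns a list-like ResultSet of every <ul class="list">
nachrichten = soup.findAll('ul', {'class':'list'})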

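Now I need a second findAll to get all the links (the a tags) out of the variable "nachrichten", but how do I do that?

Best answer

If you want all the links in a single list, use a CSS selector with select: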

anchors = soup.select('ul.list a')
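As a minimal sketch of what that selector returns, the snippet below runs it against a small inline document. The HTML, the link texts, and the `titles` name are made up for illustration and stand in for the real page, which the question does not show.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<ul class="list">
  <li><a href="/a">First</a></li>
  <li><a href="/b">Second</a></li>
</ul>
<ul class="other">
  <li><a href="/x">Ignored</a></li>
</ul>
<ul class="list">
  <li><a href="/c">Third</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# "ul.list a" matches every <a> nested anywhere inside a <ul class="list">.
anchors = soup.select("ul.list a")

# Collect the visible text of each link, in document order.
titles = [a.get_text(strip=True) for a in anchors]
print(titles)  # ['First', 'Second', 'Third']
```

The result is one flat list, so which ul each link came from is lost.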

If you want a separate list for each ul:

anchors = [ul.find_all('a') for ul in soup.find_all('ul', {'class':'list'})]

Also, if you want the hrefs, you can make sure you only match anchors that have an href attribute and extract it:
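The reason a loop is needed is that find_all (or findAll) returns a list-like ResultSet, which has no find_all of its own; the second search has to be called on each ul element. A small sketch, again using made-up HTML and the hypothetical name `links_per_list`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<ul class="list">
  <li><a href="/a">First</a></li>
  <li><a href="/b">Second</a></li>
</ul>
<ul class="list">
  <li><a href="/c">Third</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# First search: every <ul class="list"> (a ResultSet, i.e. a list of tags).
nachrichten = soup.find_all("ul", {"class": "list"})

# Second search: call find_all on each <ul> tag, not on the ResultSet itself.
links_per_list = [
    [a.get_text(strip=True) for a in ul.find_all("a")]
    for ul in nachrichten
]
print(links_per_list)  # [['First', 'Second'], ['Third']]
```

Each inner list corresponds to one ul, so the grouping on the page is preserved.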

hrefs = [a["href"] for a in soup.select('ul.list a[href]')]

With find_all, pass href=True, i.e. ul.find_all('a', href=True).
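To compare the two approaches side by side, here is a sketch on made-up HTML that includes one anchor without an href. The names `hrefs_select` and `hrefs_find_all` are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the middle anchor deliberately has no href.
html = """
<ul class="list">
  <li><a href="/a">First</a></li>
  <li><a name="no-link">No href here</a></li>
  <li><a href="/b">Second</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: only <a> tags that carry an href attribute.
hrefs_select = [a["href"] for a in soup.select("ul.list a[href]")]

# Same filter with find_all: href=True keeps tags that have the attribute.
hrefs_find_all = [
    a["href"]
    for ul in soup.find_all("ul", class_="list")
    for a in ul.find_all("a", href=True)
]

print(hrefs_select)    # ['/a', '/b']
print(hrefs_find_all)  # ['/a', '/b']
```

Filtering on the attribute first means a["href"] cannot raise a KeyError for anchors that lack it.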

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/39478865/
