python - Using Beautiful Soup to parse the href in a given HTML structure

I have the following HTML structure:

<li class="g">
 <div class="vsc">    
  <div class="alpha"></div>
  <div class="beta"></div>
  <h3 class="r">
   <a href="http://www.stackoverflow.com"></a>
  </h3>
 </div>
</li>

The HTML structure above repeats many times. What is the simplest way to extract all of the links (such as stackoverflow.com) from it using BeautifulSoup and Python?

Best answer

BeautifulSoup 4 offers a convenient way to do this using CSS selectors:

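from bs4 import BeautifulSoup

# html is assumed to hold the markup shown in the question
soup = BeautifulSoup(html, "html.parser")
# Collect the href of every <a> nested inside an <h3 class="r">
print([a["href"] for a in soup.select("h3.r a")])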

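This also has the advantage of constraining the selection by context: it selects only those anchor nodes that are descendants of an h3 node with class r.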
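
To illustrate the contextual constraint, here is a minimal sketch. The sample markup is an assumption made for this example: the question's structure repeated twice (the example.com and example.org URLs are placeholders, not from the original), plus one unrelated link outside any h3 of class r.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two repetitions of the question's structure,
# followed by one anchor that is NOT inside an <h3 class="r">.
html = """
<ul>
 <li class="g">
  <div class="vsc">
   <h3 class="r"><a href="http://www.stackoverflow.com"></a></h3>
  </div>
 </li>
 <li class="g">
  <div class="vsc">
   <h3 class="r"><a href="http://www.example.com"></a></h3>
  </div>
 </li>
</ul>
<p><a href="http://www.example.org/unrelated"></a></p>
"""

soup = BeautifulSoup(html, "html.parser")

# Every anchor in the document that carries an href attribute.
all_links = [a["href"] for a in soup.find_all("a", href=True)]

# Only anchors that are descendants of an <h3 class="r">.
result_links = [a["href"] for a in soup.select("h3.r a")]

print(all_links)
print(result_links)
```

With this sample, all_links should contain three URLs while result_links contains only the two inside h3.r, so the unrelated link is filtered out by the selector alone.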

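Dropping the constraint, or choosing whichever constraint best fits your needs, is just a matter of adapting the selector; see the CSS selector docs for details.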
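
As a rough sketch of how the selector might be adapted, the snippet below compares a few variants. The markup is a hypothetical extension of the question's structure (the nested span, the relative link, and the anchor without an href are assumptions added for this example), and hrefs is a helper name introduced only here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the question's structure with extra anchors added
# so that the selectors below give different results.
html = """
<li class="g">
 <div class="vsc">
  <h3 class="r">
   <a href="http://www.stackoverflow.com"></a>
   <span><a href="/relative/path"></a></span>
   <a name="no-href"></a>
  </h3>
 </div>
</li>
"""

soup = BeautifulSoup(html, "html.parser")


def hrefs(selector):
    """Return the href (or None if absent) of every node matched by selector."""
    return [a.get("href") for a in soup.select(selector)]


# No constraint: every anchor, including the one without an href.
loose = hrefs("a")

# Descendants of h3.r that actually have an href attribute.
descendants = hrefs("h3.r a[href]")

# Tighter: only direct children of h3.r, itself inside li.g.
direct = hrefs("li.g h3.r > a[href]")

# Only hrefs that start with "http" (skips the relative link).
absolute = hrefs('h3.r a[href^="http"]')

for name, value in [("loose", loose), ("descendants", descendants),
                    ("direct", direct), ("absolute", absolute)]:
    print(name, value)
```

Using a.get("href") instead of a["href"] avoids a KeyError when a selector matches an anchor that has no href attribute.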

This page is based on a similar question about using Beautiful Soup to parse the href in a given HTML structure, found on Stack Overflow: https://stackoverflow.com/questions/13191550/