python - How do I get the <a href> that appears after a specific <h2>?

Tags: python html html-parsing beautifulsoup

This is the layout of the page:

<h2>Featured Ads</h2>
<a href=""></a>

<h2>Ads</h2>
<a href=""></a>

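There is nothing in the class attributes that I can use to tell the regular ads apart from the featured ones. What is an efficient way to return only the <a href> that appears after <h2>Ads</h2>?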

Update:

This is the final code:

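# Locate the "Ads" heading (string= replaces the deprecated text= argument),
# then collect the <article> siblings that follow it.
h2 = soup.find("h2", string="Ads")
articles = h2.find_next_siblings("article")

for article in articles:
    for div in article.find_all("div", {"class": "address"}):
        for link in div.find_all("a", href=True):
            print(link["href"])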

Update 2: had to refactor...

# Same lookup, chained; takes the first link found in each article.
articles = soup.find("h2", string="Ads").find_next_siblings("article")
for article in articles:
    ad_url = article.find("a", href=True)["href"]
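
The refactored loop above raises a TypeError if an article has no link, and it keeps only the last ad_url. A minimal sketch of a guarded variant that collects every URL follows. The sample markup and the helper name ad_urls are hypothetical; the <article> and div.address structure is inferred from the updates, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the updates suggest each ad sits in an <article>.
SAMPLE_HTML = """
<h2>Featured Ads</h2>
<article><div class="address"><a href="/featured/1">F</a></div></article>
<h2>Ads</h2>
<article><div class="address"><a href="/ads/1">A</a></div></article>
<article><div class="address">no link here</div></article>
<article><div class="address"><a href="/ads/2">B</a></div></article>
"""


def ad_urls(soup):
    """Return the first href of each <article> sibling after <h2>Ads</h2>."""
    h2 = soup.find("h2", string="Ads")
    if h2 is None:
        # The heading is missing, so there is nothing to collect.
        return []
    urls = []
    for article in h2.find_next_siblings("article"):
        link = article.find("a", href=True)
        if link is not None:  # skip articles that carry no link
            urls.append(link["href"])
    return urls


urls = ad_urls(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(urls)
```

Note that find_next_siblings returns every following <article> sibling, so this relies on the regular ads being the last section on the page, as in the layout shown in the question.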

Best answer

Find the h2 element, then find the next a sibling:

h2 = soup.find("h2", string="Ads")  # string= replaces the deprecated text=
a = h2.find_next_sibling("a")
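
To make the answer easy to try out, here is a self-contained sketch run against the layout from the question. The href values in the sample markup are made up for illustration, and the None checks are an added precaution, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the layout in the question.
SAMPLE_HTML = """
<h2>Featured Ads</h2>
<a href="/featured/1">Featured ad</a>

<h2>Ads</h2>
<a href="/ads/1">Regular ad</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# string="Ads" matches the heading text exactly, so "Featured Ads" is skipped.
h2 = soup.find("h2", string="Ads")

# find_next_sibling ignores the whitespace text node between the tags and
# returns the first following <a> sibling that has an href attribute.
a = h2.find_next_sibling("a", href=True) if h2 is not None else None
ad_url = a["href"] if a is not None else None
print(ad_url)
```

If several links follow the heading, find_next_siblings("a") returns all of them as a list instead of only the first.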

Regarding "python - How do I get the <a href> that appears after a specific <h2>?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33336881/
