python - BeautifulSoup: print a div based on the content of the preceding tag

Tags: python web-scraping beautifulsoup html-parsing coordinates

I want to select the content of an element based on the tag that precedes it:

<h4>Models &amp; Products</h4>
    <div class="profile-area">...</div>

<h4>Production Capacity (year)</h4>
    <div class="profile-area">...</div>

How can I get the value of a "profile-area" div based on the content of the preceding tag?

Here is my code:

import requests
from bs4 import BeautifulSoup
import csv
import re

html_doc = """
<html>
<body>
  <div class="col-md-6">
    <iframe class="factory_detail_google_map" frameborder="0" src=
    "https://www.google.com/maps/embed/v1/search?q=3.037787%2C101.38189&amp;key=AIzaSyCMDADp9QHYbQ8OBGl8puAOv-16W8ziz7Y"
    allowfullscreen=""></iframe>
  </div>

  <div class="col-md-12">
    <h4>Models &amp; Products</h4>

    <div class="profile-area">
      Large Buses, Trucks, Trailer-heads
    </div>

    <h4>Production Capacity (year)</h4>

    <div class="profile-area">
      Vehicle 700 units /year
    </div>

    <h4>Output</h4>

    <div class="profile-area">
      Vehicle 356 units ( 2016 )
    </div>

    <div class="profile-area">
      Vehicle 477 units ( 2015 )
    </div>

    <div class="profile-area">
      Vehicle 760 units ( 2014 )
    </div>

    <div class="profile-area">
      Vehicle 647 units ( 2013 )
    </div>
  </div>
</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'lxml')

#link=soup.iframe.get('src')
#print(link.split("%2C"))

for item in soup.select("div.profile-area"):
    print(item.text)

As you can see, I am also trying to split the Google Maps link into coordinates, but I can probably work that part out myself.
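For reference, here is a minimal sketch of that coordinate split. It assumes the `q` parameter of the iframe URL always holds "latitude,longitude". The helper name `extract_coordinates` and the `PLACEHOLDER` key are illustrative and not part of the original code. Parsing the query string with `urllib.parse` avoids splitting on the raw `%2C` escape by hand.

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

# Hypothetical trimmed-down copy of the iframe; the API key is a placeholder.
iframe_html = (
    '<iframe class="factory_detail_google_map" '
    'src="https://www.google.com/maps/embed/v1/search'
    '?q=3.037787%2C101.38189&amp;key=PLACEHOLDER"></iframe>'
)


def extract_coordinates(markup):
    """Return (latitude, longitude) from the map iframe, or None if absent."""
    soup = BeautifulSoup(markup, "html.parser")
    iframe = soup.find("iframe", class_="factory_detail_google_map")
    if iframe is None or not iframe.get("src"):
        return None
    # parse_qs decodes percent-escapes, so "%2C" becomes ",".
    query = parse_qs(urlparse(iframe["src"]).query)
    values = query.get("q")
    if not values:
        return None
    # Assumption: q has the form "lat,lon".
    lat_text, lon_text = values[0].split(",", 1)
    return float(lat_text), float(lon_text)


coords = extract_coordinates(iframe_html)
print(coords)
```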

Thanks for your help!

Best answer

Use .find_previous_sibling() to explicitly look up the nearest preceding h4 tag:

for item in soup.select("div.profile-area"):
    prev_h4 = item.find_previous_sibling('h4').text
    if 'Capacity' in prev_h4:
        print(item.text)

Output:

Vehicle 700 units /year
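
If every section is needed rather than just one, the same lookup can collect the divs under their headings. The following is a sketch under stated assumptions: `sample_html` is a shortened copy of the question's markup, `group_by_heading` is a hypothetical helper name, and `html.parser` is used only so the example runs without lxml installed.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Hypothetical shortened copy of the markup from the question.
sample_html = """
<div class="col-md-12">
  <h4>Models &amp; Products</h4>
  <div class="profile-area">Large Buses, Trucks, Trailer-heads</div>
  <h4>Production Capacity (year)</h4>
  <div class="profile-area">Vehicle 700 units /year</div>
  <h4>Output</h4>
  <div class="profile-area">Vehicle 356 units ( 2016 )</div>
  <div class="profile-area">Vehicle 477 units ( 2015 )</div>
</div>
"""


def group_by_heading(markup):
    """Map each h4 heading text to the profile-area texts that follow it."""
    soup = BeautifulSoup(markup, "html.parser")
    sections = defaultdict(list)
    for item in soup.select("div.profile-area"):
        # Nearest earlier h4 that shares the same parent as this div.
        heading = item.find_previous_sibling("h4")
        if heading is None:
            continue  # a div with no heading before it is skipped
        sections[heading.get_text(strip=True)].append(item.get_text(strip=True))
    return dict(sections)


sections = group_by_heading(sample_html)
print(sections["Production Capacity (year)"])
print(sections["Output"])
```

Because find_previous_sibling() returns the closest earlier h4, consecutive divs under "Output" all land in the same list.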

A similar question, "python - BeautifulSoup: print a div based on the content of the preceding tag", can be found on Stack Overflow: https://stackoverflow.com/questions/50589151/
