python - Get text from a div using beautifulsoup4

Tags: python html web-scraping beautifulsoup

I want to extract only the place name from the following HTML using Python and bs4.

<div class="results-list" id="theaterlist">
 <table>
  <tr class="trspacer">
   <td>
    <a href="theater.aspx?id=4000642">
     <h2 class="placename">
      Hyde Park
      <span class="boldelement">
      Richmond Avenue 56 ls61bz
      </span>
     </h2>
    </a>

I am using the following code, but I also get the address.

mydivs = soup.find("div", {"id": "theaterlist"})
lis = mydivs.select('a[href*="theater.aspx"]')
for x in lis:
    theater = x.find('h2', class_='placename')
    print(theater.text)  # prints the place name and the address

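Any help would be appreciated.

Best answer

To get only the element's own text (and not the text of its child elements), you can use .find(text=True). It returns the first text node found inside the element, which here is the place name: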

data = """
<div class="results-list" id="theaterlist">
 <table>
  <tr class="trspacer">
   <td>
    <a href="theater.aspx?id=4000642">
     <h2 class="placename">
      Hyde Park
      <span class="boldelement">
      Richmond Avenue 56 ls61bz
      </span>
     </h2>
    </a>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')
print(soup.find('h2').find(text=True).strip())

Prints:

Hyde Park
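
The answer above handles a single h2. As a minimal sketch of how the same idea might be applied to every theater in the list, the block below loops over all h2.placename elements and joins only their direct text nodes. The function name place_names, the closing tags added to the sample markup, and the use of the built-in html.parser are assumptions for illustration, not part of the original answer. It uses string=, which is the newer name for the text= argument in Beautiful Soup 4.4.0 and later.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, with closing tags added for illustration.
html = """
<div class="results-list" id="theaterlist">
 <table>
  <tr class="trspacer">
   <td>
    <a href="theater.aspx?id=4000642">
     <h2 class="placename">
      Hyde Park
      <span class="boldelement">
      Richmond Avenue 56 ls61bz
      </span>
     </h2>
    </a>
   </td>
  </tr>
 </table>
</div>
"""


def place_names(markup):
    """Return the own text of every <h2 class="placename">, skipping child tags."""
    soup = BeautifulSoup(markup, "html.parser")
    container = soup.find("div", id="theaterlist")
    names = []
    for h2 in container.find_all("h2", class_="placename"):
        # recursive=False restricts the search to direct children of <h2>,
        # so text inside the nested <span> (the address) is not collected.
        own_strings = h2.find_all(string=True, recursive=False)
        name = "".join(own_strings).strip()
        if name:
            names.append(name)
    return names


print(place_names(html))
```

Joining all direct text nodes, rather than taking only the first one, is meant to keep the result stable if the place name happens to follow the span instead of preceding it.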

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/51389843/
