python - BeautifulSoup find_all() does not find all requested elements

Tags: python python-2.7 beautifulsoup

I ran into some odd behavior in BeautifulSoup, shown in the example below.

import re
from bs4 import BeautifulSoup
html = """<p style='color: red;'>This has a <b>color</b> of red. Because it likes the color red</p>
<p class='blue'>This paragraph has a color of blue.</p>
<p>This paragraph does not have a color.</p>"""
soup = BeautifulSoup(html, 'html.parser')
pattern = re.compile('color', flags=re.UNICODE | re.IGNORECASE)
paras = soup.find_all('p', string=pattern)
print(len(paras)) # expected to find 3 paragraphs with word "color" in it
  2
print(paras[0].prettify())
  <p class="blue">
    This paragraph has a color of blue.
  </p>

print(paras[1].prettify())
  <p>
    This paragraph does not have a color.
  </p>

As you can see, for some reason the first paragraph, <p style='color: red;'>This has a <b>color</b> of red. Because it likes the color red</p>, is not picked up by find_all(...), and I cannot figure out why.

Best answer

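The string argument is compared against each tag's .string, so it only matches tags that contain text alone and no nested tags. If you print .string for the first p tag, it returns None, because that tag contains a <b> child in addition to its text.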

Or, to explain it better, the documentation says:

If a tag has only one child, and that child is a NavigableString, the child is made available as .string

If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None
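To make the quoted rule concrete, here is a minimal sketch that prints .string for each of the three paragraphs from the question. The variable name strings is only illustrative. The paragraph that mixes text with a <b> child has no .string, which is why the string=pattern filter skips it and the count comes out as 2.

```python
from bs4 import BeautifulSoup

html = """<p style='color: red;'>This has a <b>color</b> of red. Because it likes the color red</p>
<p class='blue'>This paragraph has a color of blue.</p>
<p>This paragraph does not have a color.</p>"""
soup = BeautifulSoup(html, 'html.parser')

# Collect .string for every paragraph; it is None when a tag has more than one child.
strings = [p.string for p in soup.find_all('p')]
for s in strings:
    print(s)
# None
# This paragraph has a color of blue.
# This paragraph does not have a color.
```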

One way around this is to pass find_all() a function (here a lambda) that checks the tag's full text instead:

from bs4 import BeautifulSoup

html = """<p style='color: red;'>This has a <b>color</b> of red. Because it likes the color red</p>
<p class='blue'>This paragraph has a color of blue.</p>
<p>This paragraph does not have a color.</p>"""
soup = BeautifulSoup(html, 'html.parser')

first_p = soup.find('p')
print(first_p)
# <p style="color: red;">This has a <b>color</b> of red. Because it likes the color red</p>
print(first_p.string)
# None
print(first_p.text)
# This has a color of red. Because it likes the color red

paras = soup.find_all(lambda tag: tag.name == 'p' and 'color' in tag.text.lower())
print(paras)
# [<p style="color: red;">This has a <b>color</b> of red. Because it likes the color red</p>, <p class="blue">This paragraph has a color of blue.</p>, <p>This paragraph does not have a color.</p>]
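If you would rather keep the compiled regular expression from the question than a plain substring test, the same idea can be written as a named function. This is only a sketch: has_matching_text is a hypothetical helper name, and the fourth paragraph is added here purely to show that non-matching paragraphs are still excluded.

```python
import re
from bs4 import BeautifulSoup

html = """<p style='color: red;'>This has a <b>color</b> of red. Because it likes the color red</p>
<p class='blue'>This paragraph has a color of blue.</p>
<p>This paragraph does not have a color.</p>
<p>Nothing to match here.</p>"""
soup = BeautifulSoup(html, 'html.parser')
pattern = re.compile('color', flags=re.IGNORECASE)

def has_matching_text(tag):
    # Hypothetical helper: match the pattern against the tag's combined text,
    # including text inside nested tags such as <b>.
    return tag.name == 'p' and pattern.search(tag.get_text()) is not None

paras = soup.find_all(has_matching_text)
print(len(paras))
# 3
```

Because get_text() only returns text content, the word "color" inside the style attribute of the first paragraph plays no part in the match.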

This question and answer about "python - BeautifulSoup find_all() does not find all requested elements" come from Stack Overflow: https://stackoverflow.com/questions/49338402/
