python - How do I select everything except one HTML element using beautifulsoup4?

Tags: python html-parsing beautifulsoup

Example:

import bs4

html = '''
<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to 
best design and develop Android apps with security in mind. The book explores 
techniques that developers can use to build additional layers of security into 
their apps beyond the security controls provided by Android itself.             
<p class="scroll-down">∨ <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> ∨</p></div>
'''
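# Name the parser explicitly so the result does not depend on which parsers are installed
soup = bs4.BeautifulSoup(html, 'html.parser')

How can I get the following from soup (as a BeautifulSoup object)?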

<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to 
best design and develop Android apps with security in mind. The book explores 
techniques that developers can use to build additional layers of security into 
their apps beyond the security controls provided by Android itself.             
</div>

Best answer

You can locate the element you do not want by searching for it:

soup.find('p', class_='scroll-down')

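I used the class to narrow the search, but since there are no other p elements in this document, that is somewhat redundant here.

If you need to remove that tag, first find it as shown above, then call .extract() to remove it from the document: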
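As a minimal sketch of that point, the snippet below runs both lookups against a shortened, hypothetical version of the question's HTML (the markup and the variable names `by_class` and `by_name` are illustrative assumptions, not part of the original answer). With only one p element present, both searches should return the same tag.

```python
import bs4

# Shortened stand-in for the question's HTML (illustrative only)
html = '''
<div class="short-description std">
<em>Android Apps Security</em> provides guiding principles.
<p class="scroll-down"><a href="#main-desc">Full Description</a></p></div>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

# Search restricted by class, as in the answer
by_class = soup.find('p', class_='scroll-down')

# Search by tag name only; enough here because there is a single <p>
by_name = soup.find('p')

# Both searches locate the same element in the parse tree
same_tag = by_class is by_name
```

The class filter starts to matter once the document contains more than one p element, since find() returns only the first match.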

>>> soup.find('p', class_='scroll-down').extract()
<p class="scroll-down">∨ <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> ∨</p>
>>> print(soup)

<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to 
best design and develop Android apps with security in mind. The book explores 
techniques that developers can use to build additional layers of security into 
their apps beyond the security controls provided by Android itself.             
</div>

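Two things to note: the removed tag is returned by the .extract() method, so you can save it for later use. The tag is also removed from the document completely; if you still need it in the document, you will have to re-add it yourself later.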
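A small sketch of both points, under stated assumptions: the HTML is a shortened, hypothetical stand-in for the question's markup, and re-adding the tag with append() on the parent div is one possible choice that the original answer does not prescribe.

```python
import bs4

# Shortened stand-in for the question's HTML (illustrative only)
html = ('<div class="short-description std">Some description text. '
        '<p class="scroll-down">Full Description</p></div>')
soup = bs4.BeautifulSoup(html, 'html.parser')

# extract() detaches the tag from the tree and returns it
removed = soup.find('p', class_='scroll-down').extract()

# The document no longer contains the <p>, and the tag has no parent
found_after_extract = soup.find('p', class_='scroll-down')
parent_after_extract = removed.parent

# Re-adding is a manual step; here the tag goes back to the end of the <div>
soup.div.append(removed)
found_after_append = soup.find('p', class_='scroll-down')
```

After the append() call, the same tag object is part of the tree again, so later searches find it as before.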

Alternatively, you can use the .decompose() method, which removes the tag from the document entirely without returning a reference to it. The tag is then gone for good.

This Q&A is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/13493579/
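For comparison, a minimal sketch of .decompose() on the same kind of input (the HTML is again a shortened, hypothetical stand-in, and the variable names are illustrative):

```python
import bs4

# Shortened stand-in for the question's HTML (illustrative only)
html = ('<div class="short-description std">Some description text. '
        '<p class="scroll-down">Full Description</p></div>')
soup = bs4.BeautifulSoup(html, 'html.parser')

# decompose() destroys the tag in place and returns None
result = soup.find('p', class_='scroll-down').decompose()

# Nothing is left to search for, and the text of the <p> is gone too
remaining = soup.find('p', class_='scroll-down')
remaining_text = soup.div.get_text()
```

Use .extract() instead when the removed tag is still needed afterwards, since .decompose() leaves nothing to reuse.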
