Example:
import bs4
html = '''
<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to
best design and develop Android apps with security in mind. The book explores
techniques that developers can use to build additional layers of security into
their apps beyond the security controls provided by Android itself.
<p class="scroll-down">∨ <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> ∨</p></div>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')
How can I get the following from soup (as a BeautifulSoup object)?
<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to
best design and develop Android apps with security in mind. The book explores
techniques that developers can use to build additional layers of security into
their apps beyond the security controls provided by Android itself.
</div>
Best answer
Just search for it:
soup.find('p', class_='scroll-down')
I used the class to narrow the search, but since there are no other p elements in this document, that is somewhat redundant here.
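As a minimal sketch of the point above (the shortened HTML string and the 'no-such-class' name are made up for illustration), searching with and without the class filter returns the same tag when only one p element exists. It also shows that find() returns None when nothing matches, which is worth checking before calling a method on the result:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the HTML in the question (an assumption, not the original markup).
html = ('<div class="short-description std">Some text. '
        '<p class="scroll-down">Full Description</p></div>')
soup = BeautifulSoup(html, 'html.parser')

by_class = soup.find('p', class_='scroll-down')  # filtered by tag name and class
by_name = soup.find('p')                         # filtered by tag name only
same_tag = by_class is by_name                   # both point at the one <p> in the tree

# find() returns None when there is no match.
missing = soup.find('p', class_='no-such-class')

print(same_tag)
print(missing)
```

If the document may contain several p elements, keeping the class filter avoids picking up the wrong one.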
If you need to remove that tag, first locate it with the method above, then call .extract() to remove it from the document:
>>> soup.find('p', class_='scroll-down').extract()
<p class="scroll-down"> <a href="#main-desc" onclick="Effect.ScrollTo(
'main-desc', { duration:'0.2'}); return false;">Full Description</a> </p>
>>> print(soup)
<div class="short-description std ">
<em>Android Apps Security</em> provides guiding principles for how to
best design and develop Android apps with security in mind. The book explores
techniques that developers can use to build additional layers of security into
their apps beyond the security controls provided by Android itself.
</div>
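Two things to note: the removed tag is returned by the .extract() method, so you can save it for later use. The tag is also removed from the document entirely, so if you still need it in the document you will have to re-add it manually later.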
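To illustrate both points, here is a rough sketch that keeps the tag returned by .extract() and later puts it back with .append(). The shortened HTML and the variable names are assumptions for the example, and appending to the div is only one possible way to re-add the tag:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the HTML in the question (an assumption, not the original markup).
html = ('<div class="short-description std">Some text. '
        '<p class="scroll-down">Full Description</p></div>')
soup = BeautifulSoup(html, 'html.parser')

# extract() detaches the tag from the tree and returns it.
removed = soup.find('p', class_='scroll-down').extract()
after_removal = str(soup)   # the <p> is no longer part of the document

# The saved tag can be re-attached manually, here as the last child of the div.
soup.div.append(removed)
after_restore = str(soup)

print(after_removal)
print(after_restore)
```

Note that .extract() modifies soup in place; if the original document is still needed unchanged, parsing the HTML a second time is the simplest way to keep a copy.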
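Alternatively, you can use the .decompose() method, which removes the tag from the document entirely without returning a reference to it. The tag is then gone for good.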
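For comparison, a small sketch of the .decompose() variant, again using a shortened, made-up HTML string. It shows that nothing usable is returned and that the tag can no longer be found afterwards:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the HTML in the question (an assumption, not the original markup).
html = ('<div class="short-description std">Some text. '
        '<p class="scroll-down">Full Description</p></div>')
soup = BeautifulSoup(html, 'html.parser')

tag = soup.find('p', class_='scroll-down')
if tag is not None:            # guard: find() returns None when nothing matches
    result = tag.decompose()   # destroys the tag and its contents; returns None

still_there = soup.find('p', class_='scroll-down')

print(result)       # None
print(still_there)  # None
print(soup)
```

Use .extract() when the removed tag is needed again, and .decompose() when it can simply be discarded.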
Regarding "python - How to select everything except a certain HTML element using beautifulsoup4?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13493579/