python - How to remove tags with a specific class?

I am using BeautifulSoup (Python 3.x) to parse an HTML page. I am trying to get the data from the <p> tags, for which I wrote:

import requests
from bs4 import BeautifulSoup

def getBody(url):
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # Join the text of every <p> tag on the page
    Con = "".join([p.text for p in soup.find_all("p")])
    #print(Con)
    return Con

But when I do this, I also get the text from the HTML tag below. How can I remove it?

<p class="notice">Comments are closed for this article.</p>
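The problem can be reproduced offline with a minimal sketch that parses a string instead of a fetched page. The sample HTML below is hypothetical and only stands in for requests.get(url).content; it shows that find_all("p") matches every <p> tag regardless of its class, so the notice text ends up in the joined result.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for requests.get(url).content.
html = """
<p>First paragraph.</p>
<p class="notice">Comments are closed for this article.</p>
<p>Second paragraph.</p>
"""

soup = BeautifulSoup(html, "html.parser")
# find_all("p") matches every <p> tag regardless of its class attribute,
# so the notice paragraph is included in the joined text.
con = "".join(p.text for p in soup.find_all("p"))
print(con)
```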

Best answer

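You can remove the tags with decompose() or extract(). decompose() removes a tag from the tree and destroys it; extract() removes it and returns it, in case you still need it.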

>>> from bs4 import BeautifulSoup
>>> html = '''
... <p>text</p>
... <p class="notice">Comments are closed for this article.</p>
... <p>text</p>
... <p class="notice">Comments are closed for this article.</p>
... <p>text</p>'''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> for tag in soup.find_all('p', class_='notice'):
...     tag.decompose()
...
>>> soup

<p>text</p>

<p>text</p>

<p>text</p>
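Applied to the question's code, a minimal sketch might remove the notice paragraphs before joining the text. The helper names get_body_text and split_notices are hypothetical; getBody(url) could call get_body_text(html_page.content) in place of its current join. The second helper shows the extract() variant, which keeps the removed tags.

```python
from bs4 import BeautifulSoup


def get_body_text(html):
    """Hypothetical helper: join the text of every <p> except class="notice"."""
    soup = BeautifulSoup(html, "html.parser")
    # decompose() removes each notice paragraph from the tree and destroys it,
    # so the find_all("p") call below no longer sees it.
    for tag in soup.find_all("p", class_="notice"):
        tag.decompose()
    return "".join(p.text for p in soup.find_all("p"))


def split_notices(html):
    """Hypothetical helper: the extract() variant returns the removed tags."""
    soup = BeautifulSoup(html, "html.parser")
    # extract() detaches each tag from the tree and returns it.
    removed = [tag.extract() for tag in soup.find_all("p", class_="notice")]
    return soup, removed


html = '<p>one</p><p class="notice">Comments are closed for this article.</p><p>two</p>'
print(get_body_text(html))
soup, removed = split_notices(html)
print(removed[0].text)
```

Use decompose() when the notice text is simply unwanted; extract() is the better fit if the removed tags should be inspected or logged later.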

Regarding "python - How to remove tags with a specific class?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50125972/
