I am parsing an HTML page with BeautifulSoup (Python 3.x). I am trying to get the data from the <p> tags, for which I wrote:
import requests
from bs4 import BeautifulSoup

def getBody(url):
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    Con = "".join([p.text for p in soup.find_all("p")])
    #print(Con)
    return Con
But when I do this, I also get the text from the HTML tag below. How can I remove it?
<p class="notice">Comments are closed for this article.</p>
Best answer
You can remove the tags with decompose() or extract().
>>> from bs4 import BeautifulSoup
>>> html = '''
... <p>text</p>
... <p class="notice">Comments are closed for this article.</p>
... <p>text</p>
... <p class="notice">Comments are closed for this article.</p>
... <p>text</p>'''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> for tag in soup.find_all('p', class_='notice'):
...     tag.decompose()
...
>>> soup
<p>text</p>
<p>text</p>
<p>text</p>
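Applied to the question's code, the same idea might look like the following sketch. It assumes the unwanted paragraphs always carry class="notice". `get_body_text` and `pop_notices` are hypothetical helper names, and `get_body_text` takes raw HTML instead of a URL so it can run offline; passing `requests.get(url).content` to it would mirror `getBody`. The second helper shows the `extract()` variant, which returns the removed tags instead of destroying them.

```python
from bs4 import BeautifulSoup


def get_body_text(html):
    """Join the text of every <p>, skipping <p class="notice"> paragraphs.

    Hypothetical helper: takes raw HTML rather than a URL so it runs offline;
    call it with requests.get(url).content to behave like getBody().
    """
    soup = BeautifulSoup(html, "html.parser")
    # decompose() removes each notice tag from the tree and destroys it.
    # find_all() returns a list, so removing tags while looping is safe.
    for tag in soup.find_all("p", class_="notice"):
        tag.decompose()
    return "".join(p.text for p in soup.find_all("p"))


def pop_notices(html):
    """Hypothetical variant using extract(): each removed tag is returned,
    so it can still be inspected after being taken out of the tree."""
    soup = BeautifulSoup(html, "html.parser")
    removed = [tag.extract() for tag in soup.find_all("p", class_="notice")]
    return soup, removed


if __name__ == "__main__":
    sample = (
        "<p>text</p>"
        '<p class="notice">Comments are closed for this article.</p>'
        "<p>more text</p>"
    )
    print(get_body_text(sample))  # prints: textmore text
    soup, removed = pop_notices(sample)
    print([t.text for t in removed])  # the extracted notice text
```

Use decompose() when the removed tags are no longer needed; extract() is the choice when you want to keep or log what was taken out.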
Regarding "python - How do I remove tags with a specific class?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50125972/