Using Python 3 and BeautifulSoup 4, I would like to extract text from an HTML page that is identified only by the comment directly above it. An example:
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
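I have found various ways to extract either the text or the comments of a page, but none that does what I am after. Any help would be greatly appreciated.
Best answer
You can simply iterate over all of the comments, check whether each one is among the entries you need, and then display the text of the element that follows it, as shown below: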
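from bs4 import BeautifulSoup, Comment

html = """
<html>
<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
</html>
"""

soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package

# Visit every comment node in the document
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    # Comment is a str subclass, so it compares directly against plain strings
    if comment in ['UNIQUE COMMENT', 'SECOND UNIQUE COMMENT']:
        # next_element is the text node that follows the comment
        print(comment.next_element.strip())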
This displays the following:
I would like to get this text
I would also like to find this text
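The loop above assumes that every matching comment is immediately followed by a text node and that the comment text contains no surrounding whitespace. A slightly more defensive variant might look like the sketch below. It is only an illustration, not part of the original answer: the function name text_after_comments, the wanted parameter, the sample markup, and the EMPTY comment are all hypothetical. It also swaps in the built-in 'html.parser' so that lxml is not required, and it uses next_sibling instead of next_element so that a tag following the comment can be detected and skipped.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def text_after_comments(html, wanted):
    """Map each wanted comment label to the text node directly after it."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    found = {}
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        label = comment.strip()  # tolerate "<!-- UNIQUE COMMENT -->"
        if label not in wanted:
            continue
        following = comment.next_sibling
        # Keep plain text only; skip the case where a tag or comment follows
        if isinstance(following, NavigableString) and not isinstance(following, Comment):
            found[label] = following.strip()
    return found


# Hypothetical sample: the first comment has padding, the last precedes a tag
sample = """
<body>
<p>p tag text</p>
<!-- UNIQUE COMMENT -->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
<!--EMPTY--><p>not a text node</p>
</body>
"""

result = text_after_comments(
    sample, {"UNIQUE COMMENT", "SECOND UNIQUE COMMENT", "EMPTY"}
)
for label, text in result.items():
    print(label, "->", text)
```

Returning a dictionary keyed by comment label makes it easy to tell which piece of text belongs to which comment. A comment followed by a tag rather than text, such as EMPTY here, is simply left out of the result instead of raising an error.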
Regarding python - extracting text between HTML comments with BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34673851/