python - Get all text inside a tag in lxml

Tags: python parsing lxml

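I'd like to write a code snippet in lxml that grabs all of the text inside the <content> tag, including the nested markup tags, in all three instances below. I've tried tostring(getchildren()), but that misses the text in between the tags. I didn't have much luck searching the API for a relevant function. Could you help me out?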

<!--1-->
<content>
<div>Text inside tag</div>
</content>
#should return "<div>Text inside tag</div>"

<!--2-->
<content>
Text with no tag
</content>
#should return "Text with no tag"


<!--3-->
<content>
Text outside tag <div>Text inside tag</div>
</content>
#should return "Text outside tag <div>Text inside tag</div>"

Best answer

Just use the node.itertext() method, as in:

 ''.join(node.itertext())

Regarding python - Get all text inside a tag in lxml, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4624062/
