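python - lxml "invalid predicate" error when using text()

Tags: python html lxml

I am using lxml for HTML screen scraping and need to select an element by its text(), similar to what is done in another question with pure XML, but whatever I try I get an invalid predicate error. I have reduced the problem to the following example: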

import lxml.html
sample_html = "<div><h2>test string</h2><h2>other string</h2></div>"
sample_tree = lxml.html.fromstring(sample_html)
sample_tree.findall('.//h2[text()="test string"]')

Although this should be valid, I keep getting the error:

  File "<string>", line unknown
SyntaxError: invalid predicate

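Any hints on how to correctly get lxml to select elements by text() when parsing HTML?

Accepted answer

The expression itself is valid XPath, but findall() only supports the limited ElementPath subset, which does not accept text() predicates. You have to use the .xpath() method instead: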

sample_tree.xpath('.//h2[text()="test string"]')
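
To make the difference concrete, here is a minimal sketch that runs both calls on the sample from the question. The variable names `findall_error`, `matches` and `match_texts` are only illustrative, and the exact error raised by findall() may depend on the installed lxml version, so the sketch records it rather than relying on it.

```python
import lxml.html

sample_html = "<div><h2>test string</h2><h2>other string</h2></div>"
sample_tree = lxml.html.fromstring(sample_html)

# findall() goes through ElementPath; record the error instead of crashing.
findall_error = None
try:
    sample_tree.findall('.//h2[text()="test string"]')
except SyntaxError as exc:
    findall_error = str(exc)

# xpath() evaluates the same expression with the full XPath 1.0 engine.
matches = sample_tree.xpath('.//h2[text()="test string"]')
match_texts = [el.text for el in matches]

print(findall_error)
print(match_texts)
```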

Note that you can also use . in place of text() in this case:

.//h2[. = "test string"]
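
The two forms are interchangeable here only because each h2 contains a single text node. As a rough illustration (the nested markup below is made up for this sketch, not taken from the question), text() compares individual text-node children, while . compares the element's whole string value, including text inside child elements:

```python
import lxml.html

# Hypothetical markup: the heading text is split by a nested <b> element.
nested_tree = lxml.html.fromstring("<div><h2>test <b>string</b></h2></div>")

# text() only sees the direct text node "test ", so nothing matches.
by_text_node = nested_tree.xpath('.//h2[text()="test string"]')

# "." is the concatenated text of h2 and its descendants: "test string".
by_string_value = nested_tree.xpath('.//h2[. = "test string"]')

print(len(by_text_node), len(by_string_value))
```

Which form to prefer depends on whether nested markup inside the target element should count toward the match.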

This answer comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/43958197/