python - How do I match the contents of an element in XPath (lxml)?

I want to parse HTML with lxml using XPath expressions. My problem is matching the contents of a tag:

For example, given the

<a href="http://something">Example</a>

element, I can match the href attribute using

.//a[@href='http://something']

but given the expression

.//a[.='Example']

or even

.//a[contains(.,'Example')]

lxml throws an "invalid node predicate" exception.

What am I doing wrong?

EDIT:

Example code:

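from io import StringIO  # Python 3; the Python 2 original used cStringIO

from lxml import etree

html = '<a href="http://something">Example</a>'
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)

# find() only understands ElementPath, so this line raises the SyntaxError
print(tree.find(".//a[text()='Example']").tag)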

The expected output is "a". Instead I get "SyntaxError: invalid node predicate".

Best answer

I would try:

.//a[text()='Example']

with the xpath() method:

tree.xpath(".//a[text()='Example']")[0].tag
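
As a self-contained sketch of the call above, assuming Python 3 with lxml installed (the variable names `exact` and `partial` are only illustrative, and `io.StringIO` stands in for the question's `cStringIO`), the same document can be queried through xpath() with both the text() comparison and contains():

```python
from io import StringIO

from lxml import etree

html = '<a href="http://something">Example</a>'
tree = etree.parse(StringIO(html), etree.HTMLParser())

# xpath() evaluates full XPath 1.0, so text() and contains() are allowed.
exact = tree.xpath(".//a[text()='Example']")
partial = tree.xpath(".//a[contains(., 'Exam')]")

# xpath() returns a list of matching elements, so index it to get one.
print(exact[0].tag)
print(partial[0].get("href"))
```

Note that xpath() returns a list rather than a single element, so an empty result shows up as an IndexError on `[0]` instead of the `None` that find() would return.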

If you want to use iterfind(), findall(), find(), or findtext(), keep in mind that advanced features such as value comparison and functions are not available in ElementPath.

lxml.etree supports the simple path syntax of the find, findall and findtext methods on ElementTree and Element, as known from the original ElementTree library (ElementPath). As an lxml specific extension, these classes also provide an xpath() method that supports expressions in the complete XPath syntax, as well as custom extension functions.
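
To make the split between the two APIs concrete, here is a rough sketch under a few assumptions: the two-link `<div>` document and the names `by_attr`, `by_text`, and `find_error` are made up for illustration, and which predicates find() accepts can vary between lxml versions, so the rejected call is wrapped in try/except rather than assumed to fail.

```python
from lxml import etree

# Hypothetical sample document with two links.
root = etree.fromstring(
    '<div>'
    '<a href="http://something">Example</a>'
    '<a href="http://other">Other</a>'
    '</div>'
)

# ElementPath (find/findall/findtext) handles simple attribute predicates.
by_attr = root.find(".//a[@href='http://something']")

# XPath functions such as text() are outside ElementPath; find() is
# expected to reject them with a SyntaxError, as in the question.
try:
    root.find(".//a[text()='Example']")
    find_error = None
except SyntaxError as exc:
    find_error = exc

# The same expression is accepted by xpath(), which returns a list.
by_text = root.xpath(".//a[text()='Example']")

print(by_attr.tag, find_error, by_text[0].get("href"))
```

If only simple tag or attribute lookups are needed, find() is enough; anything involving functions or text comparison is safer to send through xpath().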

For "python - How do I match the contents of an element in XPath (lxml)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2637760/
