lxml supports Unicode element names well, since they are valid according to the XML specification. However, using Unicode in an XPath expression raises an error:
>>> import lxml.etree
>>> e = lxml.etree.fromstring('<?xml version="1.0" encoding="UTF-8"?><элемент>текст</элемент>'.encode('utf-8'))
>>> e.xpath('/элемент/text()')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lxml.etree.pyx", line 1509, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:50702)
File "xpath.pxi", line 318, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:145954)
File "xpath.pxi", line 238, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:144962)
File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:144817)
lxml.etree.XPathEvalError: Invalid expression
Is this a limitation of lxml? I could not find it in the documentation, but perhaps I missed it.
Can someone explain the reason behind this?
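Update: the problem reproduces only when the second character of the XPath is a Cyrillic letter. These forms work:
paths that begin with //, e.g. //элемент
paths whose first letter is an English letter, e.g. //qлемент
/./элемент in place of /элемент (the two are equivalent)
Moreover, this appears to be a libxml2 issue, not just an lxml issue: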
$ xmlstarlet sel -t -v "/элемент/text()" test.xml
Invalid expression: /элемент/text()
compilation error: element with-param
XSLT-with-param: Failed to compile select expression '/элемент/text()'
$ xmlstarlet sel -t -v "/./элемент/text()" test.xml
текст
I gave up on this problem and now use /./ to write absolute XPaths for tags with Cyrillic names.
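The /./ workaround above can be applied mechanically. The following is a minimal sketch, assuming the failure depends only on the second character of the expression being non-ASCII, as observed in the update. The helper name absolute_xpath_workaround is hypothetical and is not part of lxml or libxml2.

```python
def absolute_xpath_workaround(xpath: str) -> str:
    """Rewrite '/name/...' as '/./name/...' when the name starts with a
    non-ASCII character; return every other expression unchanged.

    Hypothetical helper for illustration only (not an lxml API).
    """
    needs_fix = (
        len(xpath) > 1
        and xpath[0] == '/'
        and ord(xpath[1]) > 127  # second character is non-ASCII, e.g. Cyrillic
    )
    # '/./name' selects the same nodes as '/name', so the rewrite is safe.
    return '/.' + xpath if needs_fix else xpath


print(absolute_xpath_workaround('/элемент/text()'))  # /./элемент/text()
print(absolute_xpath_workaround('//элемент'))        # //элемент (unchanged)
```

With the element e from the session above, the call would be e.xpath(absolute_xpath_workaround('/элемент/text()')). On a libxml2 build that does not show the problem, the rewritten expression still selects the same nodes.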
Best answer
If you mean to refer to the root node, your XPath is missing a /:
>>> e.xpath('//элемент/text()')
['текст']
Or use two dots, .., if you mean to refer to the parent of the current node:
>>> e.xpath('../элемент/text()')
['текст']
Source: the Stack Overflow question python - "lxml.etree.XPathEvalError: Invalid expression" with Unicode element names: https://stackoverflow.com/questions/29689078/