I'm trying to select an element like this one using XPath contains() on its text.
<p><strong>Полное наименование</strong></p>
Instead, I get this error.
In [4]: response.xpath("//p[contains(text(),'Полное')]").extract()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-7e122465e645> in <module>()
----> 1 response.xpath("//p[contains(text(),'Полное')]").extract()
c:\python27\lib\site-packages\scrapy\http\response\text.pyc in xpath(self, query, **kwargs)
117
118 def xpath(self, query, **kwargs):
--> 119 return self.selector.xpath(query, **kwargs)
120
121 def css(self, query):
c:\python27\lib\site-packages\parsel\selector.pyc in xpath(self, query, namespaces, **kwargs)
226 result = xpathev(query, namespaces=nsp,
227 smart_strings=self._lxml_smart_strings,
--> 228 **kwargs)
229 except etree.XPathError as exc:
230 msg = u"XPath error: %s in %s" % (exc, query)
src\lxml\etree.pyx in lxml.etree._Element.xpath()
src\lxml\xpath.pxi in lxml.etree.XPathElementEvaluator.__call__()
src\lxml\apihelpers.pxi in lxml.etree._utf8()
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
This is my XPath:
response.xpath("//p[contains(text(),'Полное')]").extract()
'Полное' is the Russian text I'm searching for.
How can I fix this error?
Best answer
Prefix the expression string with u to make it a unicode string:
response.xpath(u"//p[contains(text(),'Полное')]").extract()
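A likely explanation, assuming Python 2 as shown in the traceback: a plain '...' literal there is a byte string, and lxml appears to accept byte strings only when they are pure ASCII, so the Cyrillic text triggers the ValueError. If the expression arrives as bytes from somewhere else (a file, a config value), a small helper can decode it before it reaches response.xpath. This is only a sketch; the helper name to_text and the UTF-8 default are assumptions for illustration, not part of Scrapy or parsel:

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="utf-8"):
    """Return value as unicode text, decoding it first if it is a byte string.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value

# Simulate an expression that was read as raw UTF-8 bytes.
raw_query = u"//p[contains(text(),'Полное')]".encode("utf-8")

# Decode it so the XPath engine receives unicode text, not bytes.
query = to_text(raw_query)
```

The decoded query can then be passed as response.xpath(query).extract(). On Python 2, adding from __future__ import unicode_literals at the top of the module is another common way to make plain string literals unicode.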
Regarding xpath - Passing Cyrillic in XPath contains returns XML ValueError (Scrapy, Python 2), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51864582/