python - Scraper giving blank output

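I'm using a selector in my Python script to get text from the HTML elements given below. I tried .text to get the Shop here cheap string out of the elements, but it doesn't work at all. However, when I use .text_content(), it works as expected.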

My question is:

What's wrong with .text? Why can't it get the text out of the element?

HTML elements:

<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>

What I've tried:

from lxml import html

# `element` is a string holding the HTML shown above
tree = html.fromstring(element)
for data in tree.cssselect(".Price__container"):
    print(data.text)  # prints only whitespace, not "Shop here cheap"

By the way, I don't want to stick with .text_content(), which is why I'm looking for an answer that gets the text using .text. Thanks in advance.

Best answer

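I think the root cause of the confusion is that lxml uses a .text and .tail concept to represent the content of nodes, which avoids the need for a special "text" node entity. Quoting the documentation: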

The two properties .text and .tail are enough to represent any text content in an XML document. This way, the ElementTree API does not require any special text nodes in addition to the Element class, that tend to get in the way fairly often (as you might know from classic DOM APIs).

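In your case, Shop here cheap is the tail of the <span class="ProductPrice_original">$6.70</span> element and is therefore not included in the .text value of the parent node. The parent's .text holds only what comes before its first child, which here is just whitespace.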
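To make the .text / .tail split concrete, here is a minimal sketch that prints both attributes for the parent and each of its children. The variable names `markup` and `div` are illustrative and not from the question; the markup is the snippet from the question, repeated so the example runs on its own.

```python
from lxml import html

# The markup from the question, reproduced so the snippet is self-contained.
markup = """<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>"""

# With a single root element, fromstring returns that element (the div).
div = html.fromstring(markup)

# .text holds only what precedes the first child: here, just whitespace.
print(repr(div.text))

# Each child keeps the text that follows its closing tag in .tail.
for child in div:
    print(child.get("class"), repr(child.text), repr(child.tail))
```

The last line printed should show the wanted string sitting in the .tail of the second span, surrounded by whitespace, which is why the parent's .text looks blank.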

Aside from other methods, such as .text_content(), you can reach the tail by getting all the top-level text nodes non-recursively:

print(''.join(data.xpath("./text()")).strip())

Or, get the last top-level text node:

print(data.xpath("./text()[last()]")[0].strip())
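If you would rather avoid XPath, the same text should also be reachable through the attributes themselves. The following is a sketch under the assumption that the wanted string always follows the last child element; `markup`, `last_child`, `trailing`, and `pieces` are illustrative names, not part of the question.

```python
from lxml import html

# The markup from the question, reproduced so the snippet is self-contained.
markup = """<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>"""

data = html.fromstring(markup)

# Tail of the last child element. `or ""` guards against a tail of None;
# data[-1] raises IndexError if the element has no children at all.
last_child = data[-1]
trailing = (last_child.tail or "").strip()
print(trailing)

# The parent's own text plus every child's tail: the element's direct text,
# excluding anything nested inside the children.
pieces = [data.text or ""] + [child.tail or "" for child in data]
print("".join(pieces).strip())
```

The first form mirrors the `[last()]` query, and the second mirrors joining every `./text()` result.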

Regarding "python - Scraper giving blank output", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46894569/
