python - Unable to parse HTML with the lxml XPath parser

Tags: python xpath lxml

I am trying to parse the reviews on this page: http://www.amazon.co.uk/product-reviews/B00143ZBHY

using the following approach:

Code

# html: a variable that contains the exact HTML of the page above
from lxml import etree
tree = etree.HTML(html)
r = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div[9]/text()[4]")
print len(r)
print r[0].tag

Output

0
Traceback (most recent call last):
  File "c.py", line 37, in <module>
    print r[0].tag
IndexError: list index out of range
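A note on the code above: the IndexError only says that the XPath matched nothing, so r is an empty list. Even with a match, r[0].tag would fail, because an XPath ending in text() returns strings (lxml "smart strings", which subclass str), not elements. A minimal sketch in Python 3, assuming a small hypothetical HTML snippet in place of the real Amazon page:

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page (not the real Amazon markup).
html = "<html><body><div id='review'>first<br/>second</div></body></html>"
tree = etree.HTML(html)

# An XPath ending in text() yields strings, not elements.
texts = tree.xpath("//div[@id='review']/text()")
print(texts)                      # the two text nodes: 'first' and 'second'
print(isinstance(texts[0], str))  # True
print(hasattr(texts[0], "tag"))   # False: strings have no .tag

# Selecting the element itself returns something that has a .tag.
divs = tree.xpath("//div[@id='review']")
if divs:  # check for an empty result instead of indexing blindly
    print(divs[0].tag)            # div
```

Dropping text() from the path (or printing r[0] directly, as the answer below does) avoids asking a string for a .tag.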

P.S.: When I use the same XPath in the XPath Checker add-on for Firefox, it works without any problem. But here I get no result. Please help!

Best answer

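Try removing /tbody from the XPath: there is no <tbody> inside #productReviews in the HTML the server actually sends. Firefox inserts a <tbody> into the DOM when it parses a table that lacks one, which is why the path works in XPath Checker; lxml builds its tree from the raw markup and does not add it.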

import urllib2
html = urllib2.urlopen("http://www.amazon.co.uk/product-reviews/B00143ZBHY").read()
from lxml import etree
tree = etree.HTML(html)
r = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div[9]/text()[4]")
print r[0]

Output:

bought this as replacement for the original cover which came with my greenhouse and which ripped in the wind.  so far this seems a good replacement although for some reason it seems slightly too small for my greenhouse so that i cant zip both sides of the front at the same time.  seems sturdier and thicker than the cover i had before so hoping it lasts a bit longer!
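To check the effect without fetching the live page, here is a minimal sketch in Python 3, assuming a small hypothetical table written without <tbody> as a stand-in for the Amazon markup. The path with /tbody finds nothing, the same path without it matches, and using a // (descendant) step before tr matches in either case, which makes a copied browser path less fragile.

```python
from lxml import etree

# Hypothetical stand-in for the page: a table written without <tbody>.
html = """
<html><body>
<table id="productReviews">
  <tr><td><div>review text</div></td></tr>
</table>
</body></html>
"""
tree = etree.HTML(html)

# lxml keeps the tree as written, so no tbody element exists.
print(tree.xpath("//tbody"))  # []

# Path as copied from a browser tool (includes tbody): no match.
with_tbody = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div/text()")
# Same path without tbody: matches.
without_tbody = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div/text()")
# '//' before tr matches whether or not a tbody is present.
either = tree.xpath(".//*[@id='productReviews']//tr/td[1]/div/text()")

print(with_tbody)     # []
print(without_tbody)  # one match: 'review text'
print(either)         # one match: 'review text'
```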

Original question on Stack Overflow: https://stackoverflow.com/questions/11458902/
