python - Get the values of 2 attributes from an XPath node for anchor tags

Tags: python html xml xpath lxml

I extracted the following with the help of XPath:

In [206]: list = tree.xpath('/html/body/div[@id="gs_top"]/div[@id="gs_bdy"]/div[@id="gs_ccl"]/div[@id="gsc_ccl"]/div[@class="gsc_1usr gs_scl"]/div[@class="gsc_1usr_text"]/h3[@class="gsc_1usr_name"]/a')

In [208]: for item in list:
    print(etree.tostring(item, pretty_print=True))
   .....:
<a href="/citations?user=lMkTx0EAAAAJ&amp;hl=en&amp;oe=ASCII">Jason Weston</a>
<a href="/citations?user=RhFhIIgAAAAJ&amp;hl=en&amp;oe=ASCII">Pierre Baldi</a>
<a href="/citations?user=9DXQi8gAAAAJ&amp;hl=en&amp;oe=ASCII">Yair Weiss</a>
<a href="/citations?user=J8YyZugAAAAJ&amp;hl=en&amp;oe=ASCII">Peter Belhumeur</a>
<a href="/citations?user=ORr4XJYAAAAJ&amp;hl=en&amp;oe=ASCII">Serge Belongie</a>

Now I can extract the href by appending /@href, and I can extract the text with text(). But how can I get both at once, as shown in the answer here: How to select two attributes from the same node with one expression in XPath?

Best answer

Simply call .xpath("@href|text()") on each element, like this:

for item in list:
    href, text = item.xpath("@href|text()")
    print(href, text)
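
The two-name unpacking above only works when each anchor yields exactly two results: one href attribute followed by one text node. An anchor without an href, or one whose text is split by a nested tag, would raise a ValueError. As a hedged alternative, here is a minimal sketch that reads the attribute with .get() and the text with lxml.html's .text_content(). The helper name extract_links and the sample markup (including the anchor without an href and the nested <b> tag) are assumptions made up for illustration, not part of the original question.

```python
from lxml.html import fromstring

# Hypothetical sample markup: the second anchor has no href and the
# third one wraps part of its text in a nested tag.
data = """
<body>
    <a href="/citations?user=lMkTx0EAAAAJ&amp;hl=en&amp;oe=ASCII">Jason Weston</a>
    <a>No Link</a>
    <a href="/citations?user=RhFhIIgAAAAJ&amp;hl=en&amp;oe=ASCII">Pierre <b>Baldi</b></a>
</body>
"""


def extract_links(tree):
    """Return a list of (href, text) tuples, one per <a> element."""
    pairs = []
    for anchor in tree.xpath("//a"):
        href = anchor.get("href")             # None when the attribute is missing
        text = anchor.text_content().strip()  # joins text from nested tags too
        pairs.append((href, text))
    return pairs


pairs = extract_links(fromstring(data))
for href, text in pairs:
    print(href, text)
```

With this shape, a missing href shows up as None instead of breaking the loop, so the caller can decide whether to skip or keep such anchors.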

Demo:

>>> from lxml.html import fromstring
>>> 
>>> data = """
... <body>
...     <a href="/citations?user=lMkTx0EAAAAJ&amp;hl=en&amp;oe=ASCII">Jason Weston</a>
...     <a href="/citations?user=RhFhIIgAAAAJ&amp;hl=en&amp;oe=ASCII">Pierre Baldi</a>
...     <a href="/citations?user=9DXQi8gAAAAJ&amp;hl=en&amp;oe=ASCII">Yair Weiss</a>
...     <a href="/citations?user=J8YyZugAAAAJ&amp;hl=en&amp;oe=ASCII">Peter Belhumeur</a>
...     <a href="/citations?user=ORr4XJYAAAAJ&amp;hl=en&amp;oe=ASCII">Serge Belongie</a>
... </body>
... """
>>> 
>>> tree = fromstring(data)
>>> 
>>> for item in tree.xpath("//a"):
...     print(item.xpath("@href|text()"))
... 
['/citations?user=lMkTx0EAAAAJ&hl=en&oe=ASCII', 'Jason Weston']
['/citations?user=RhFhIIgAAAAJ&hl=en&oe=ASCII', 'Pierre Baldi']
['/citations?user=9DXQi8gAAAAJ&hl=en&oe=ASCII', 'Yair Weiss']
['/citations?user=J8YyZugAAAAJ&hl=en&oe=ASCII', 'Peter Belhumeur']
['/citations?user=ORr4XJYAAAAJ&hl=en&oe=ASCII', 'Serge Belongie']
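
If a single expression over the whole tree is preferred, in the spirit of the linked question, the union can also be applied to every anchor at once. The sketch below assumes that every <a> has exactly one href and exactly one text node, so the flat result alternates href, text, href, text in document order; if that assumption does not hold, the pairing silently goes wrong, which is why the per-element loop is usually safer. The variable names flat and pairs are illustrative only.

```python
from lxml.html import fromstring

data = """
<body>
    <a href="/citations?user=lMkTx0EAAAAJ&amp;hl=en&amp;oe=ASCII">Jason Weston</a>
    <a href="/citations?user=RhFhIIgAAAAJ&amp;hl=en&amp;oe=ASCII">Pierre Baldi</a>
</body>
"""

tree = fromstring(data)

# One XPath union over all anchors; the result is a flat list of
# attribute values and text nodes.
flat = tree.xpath("//a/@href | //a/text()")

# Assumption: exactly one href and one text node per anchor, so even
# positions hold hrefs and odd positions hold the matching text.
pairs = list(zip(flat[0::2], flat[1::2]))

for href, text in pairs:
    print(href, text)
```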

Regarding "python - Get the values of 2 attributes from an XPath node for anchor tags", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34468751/
