Hi, is there any way to get all the parent elements of a tag using requests-HTML?
For example:
<!DOCTYPE html>
<html lang="en">
<body id="two">
<h1 class="text-primary">hello there</h1>
<p>one two tree<b>four</b>five</p>
</body>
</html>
For the b tag, I want all of its parents: [html, body, p]
For the h1 tag, I want: [html, body]
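The expected output is simply the list of tags that are still open when the target tag starts. The sketch below illustrates that idea using only Python's standard-library `html.parser`, not requests-HTML. The hypothetical `AncestorFinder` class keeps a stack of open tags and copies it each time the target tag appears. It assumes reasonably well-formed markup with explicit end tags. HTML's implied end tags (for example an unclosed `<p>` or `<li>`) are not modeled, so a real HTML parser is more robust on messy pages.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AncestorFinder(HTMLParser):
    """Hypothetical helper: record the open-tag chain (root first) at each `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []    # currently open tags, outermost first
        self.results = []  # one ancestor list per occurrence of `target`

    def handle_starttag(self, tag, attrs):
        if tag == self.target:
            # Every tag still open at this point is an ancestor of the target.
            self.results.append(list(self.stack))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating a few unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def parent_tags(html, target):
    """Return a root-first ancestor list for every `target` tag in `html`."""
    finder = AncestorFinder(target)
    finder.feed(html)
    finder.close()
    return finder.results
```

With the example document above, `parent_tags(doc, "b")` returns `[['html', 'body', 'p']]` and `parent_tags(doc, "h1")` returns `[['html', 'body']]`, one list per matching tag.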
Best answer
With the excellent lxml:
from lxml import etree
html = """<!DOCTYPE html>
<html lang="en">
<body id="two">
<h1 class="text-primary">hello there</h1>
<p>one two tree<b>four</b>five</p>
</body>
</html> """
tree = etree.HTML(html)
# Find the first <b> element
b_elt = tree.xpath('//b')[0]
print(b_elt.text)
# -> "four"
# Walk up the ancestors of this <b> element (nearest parent first)
ancestors_tags = [elt.tag for elt in b_elt.iterancestors()]
print(ancestors_tags)
# -> ['p', 'body', 'html']
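`iterancestors()` yields the nearest parent first, while the question asks for root-first order (`[html, body, p]`). The following is a small generalization using the same lxml calls (`etree.HTML`, `iter`, `iterancestors`). It reverses each chain and handles every matching tag, not just the first. The helper name `ancestor_tags` is hypothetical.

```python
from lxml import etree


def ancestor_tags(html, tag):
    """Hypothetical helper: root-first ancestor tag names for every <tag> element."""
    root = etree.HTML(html)
    result = []
    for elt in root.iter(tag):
        # iterancestors() goes from the parent up to <html>, so reverse it.
        result.append([a.tag for a in elt.iterancestors()][::-1])
    return result
```

On the question's document this gives `[['html', 'body', 'p']]` for `"b"` and `[['html', 'body']]` for `"h1"`. If you want to stay inside requests-HTML, its `Element` objects appear to keep the underlying lxml node in an `element` attribute. This is an assumption worth checking against your installed version. If it holds, calling `iterancestors()` on that node should give the same walk.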
Regarding "python - Get parent elements of a tag using python requests-HTML", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55124725/