python - Div text, but exclude the text of some tags

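I am trying to get all the text from a div, but I want to exclude the text inside certain tags: for example everything inside <header><h2>some text</h2></header>, and possibly the text of <footer> as well.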

I already have something like this:

tree = <some html> 
XpathArticleSummary = "string(div)"
divs = tree.xpath(XpathArticleSummary)

What I want is something like this:

XpathArticleSummary = "string(div[not(header|footer)])"

But of course that doesn't work :)

Is there a way to do this kind of exclusion?

Best answer

Since you are using lxml, this XPath should work:

div//text()[not(parent::footer or parent::header)]

It should give you a list of text nodes.
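
As a rough sketch of how this could be used from lxml (the sample markup and the `join_text` helper are made up for illustration and are not part of the original answer): `parent::` tests only the immediate parent of each text node, so text nested deeper, such as the `<h2>` inside `<header>` from the question, is not filtered out. Assuming the goal is to drop everything inside those tags, the `ancestor::` axis may be the better fit.

```python
from lxml import html

# Hypothetical sample markup, invented for this example.
SAMPLE = """
<div>
  <header><h2>some text</h2></header>
  <p>Article body.</p>
  <footer>footer text</footer>
</div>
"""

# With a single top-level element, fragment_fromstring returns that element.
div = html.fragment_fromstring(SAMPLE.strip())

# The answer's predicate: drops text whose *direct* parent is header/footer.
by_parent = div.xpath(".//text()[not(parent::footer or parent::header)]")

# Variant (an assumption about the intent): drops text anywhere inside
# a header/footer, however deeply nested.
by_ancestor = div.xpath(".//text()[not(ancestor::footer or ancestor::header)]")


def join_text(nodes):
    """Join text nodes into one string and collapse runs of whitespace."""
    return " ".join(" ".join(nodes).split())


summary_parent = join_text(by_parent)
summary_ancestor = join_text(by_ancestor)
```

With this sample, `summary_parent` still contains "some text" because that text node's parent is `<h2>`, not `<header>`, while `summary_ancestor` is reduced to "Article body.". Joining the nodes stands in for the single string that `string(div)` returned in the question.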

Regarding "python - Div text, but exclude the text of some tags", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16898698/
