xpath - Retrieving elements inside a <script> tag with XPath

I am trying to use XPath to get elements on a page that are located inside a <script> tag. For example:

<div id="foo">
    <script>
        <p>You can't get me.</p>
    </script>
</div>

Both response.xpath('//div[@id="foo"]//p') and response.xpath('//div[@id="foo"]/script/p') return an empty list.

How can I use XPath to get elements inside a <script> tag?

Best answer

eLRuLL gave a better and more elegant answer to my question. His solution is as follows:

from scrapy import Selector

# First, retrieve the content of the <script> tag as plain text
text = response.xpath('//script/text()').extract_first()
# Then, build a Selector from that text
sel = Selector(text=text)
# Now XPath works as usual, as if the text were an ordinary HTML response
sel.xpath('//p/text()').extract_first()

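Old answer:
A <script> node has only text children, because the HTML parser does not parse its content as markup. That is why XPath does not descend into the <script> tag. However, I found a workaround.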

from scrapy.http import HtmlResponse

# First, retrieve the content of the <script> tag as plain text
text = response.xpath('//script/text()').extract_first()
# Then, encode it to bytes
text_encoded = text.encode('utf-8')
# Now, wrap it in an HtmlResponse object
text_in_html = HtmlResponse(url='some url', body=text_encoded, encoding='utf-8')
# Now XPath works as usual, as if the text were an ordinary HTML response
text_in_html.xpath('//p/text()').extract_first()
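
The reason both answers need a second parsing pass can be checked without Scrapy. The following is a minimal sketch using only the Python standard library, not the method from either answer: `ScriptTextCollector` is a hypothetical helper name, and the second pass uses `xml.etree.ElementTree`, which only works here because the fragment happens to be well-formed XML (Scrapy's `Selector` uses a lenient HTML parser instead). It shows that the first parse reports `<p>` as text inside `<script>`, and that re-parsing that text makes `<p>` reachable.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The sample markup from the question.
HTML = """
<div id="foo">
    <script>
        <p>You can't get me.</p>
    </script>
</div>
"""


class ScriptTextCollector(HTMLParser):
    """Hypothetical helper: records start tags and the raw text inside <script>."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Inside <script>, the parser delivers everything as plain text.
        if self._in_script:
            self.script_text.append(data)


collector = ScriptTextCollector()
collector.feed(HTML)
collector.close()

# First pass: <p> is never reported as an element, only <div> and <script>.
print(collector.tags)

# Second pass: parse the extracted text as markup in its own right.
script_text = "".join(collector.script_text)
fragment = ET.fromstring(f"<root>{script_text}</root>")
paragraphs = [p.text for p in fragment.findall(".//p")]
print(paragraphs)
```

The two passes mirror the steps in the answers above: extract the script's text node first, then hand that text to a fresh parser so its tags become real elements.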

For "xpath - Retrieving elements inside a <script> tag with XPath", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53178974/
