javascript - Scraping data with XPath from a div that contains JavaScript in Scrapy (Python)

Tags: javascript python xpath scrapy

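I am using Scrapy to crawl a website and XPath to extract items. Some of the divs contain JavaScript, and when my XPath goes down to the div that contains the JavaScript code, it returns an empty list. If I leave that div (the one containing JavaScript) out of the expression, I can fetch the HTML data.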

HTML code

<div class="subContent2">    
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div>

Spider code

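# Import paths from the legacy Scrapy API that this spider uses
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector


class ExampleSpider(BaseSpider):
    name = "example"
    domain_name = "www.example.com"
    start_urls = ["http://www.example.com/jkl/index.php"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Selects the eventDetails <div> element itself, not the text inside it
        required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]')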

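So how can I get the text (Some data) from the anchor tag inside the h2 element shown above? Is there any alternative way in Scrapy to fetch data from elements that contain JavaScript?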

Best answer

<div class="subContent2">    
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div>

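In this example, the JavaScript code is not what stops you from getting the "Some data" string. The text sits in the static HTML as the content of the a element; the JavaScript appears only in its href and onclick attributes.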

Your expression stops at the eventDetails div, so you need to continue down to the text node inside h2/a:

required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]/h2/a/text()')
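To check this reasoning without running a spider, here is a minimal sketch that walks the same path over the HTML fragment from the question. It assumes Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selector. ElementTree supports only a subset of XPath and has no text() step, so the anchor's .text attribute plays that role. The names HTML, event_div, anchor and link_text are illustrative, not part of the original code.

```python
import xml.etree.ElementTree as ET

# The fragment from the question; it happens to be well-formed XML,
# so the standard-library parser can read it (real pages often are not).
HTML = """
<div class="subContent2">
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div>
""".strip()

root = ET.fromstring(HTML)  # root is the outer subContent2 <div>

# Same path as the question's XPath: this yields the <div> element, not text.
event_div = root.find('./div[@id="contentDetails"]/div[@class="eventDetails"]')

# Continue down to the anchor and read its text content.
anchor = event_div.find('./h2/a')
link_text = anchor.text

print(link_text)  # Some data
```

In current Scrapy releases the same XPath string is passed to response.xpath(...), and .get() or .getall() returns the matched text.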


Or use the string() function, which returns the concatenated text of the div and all of its descendants:

required_data = hxs.select('string(//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"])')
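One thing to keep in mind with string() is that the result includes the indentation and line breaks between the tags, so it usually needs trimming. The sketch below imitates that behaviour with the standard library, assuming xml.etree.ElementTree as a stand-in for Scrapy; joining itertext() is only an approximation of string() for this fragment. The names raw_text and clean_text are illustrative.

```python
import xml.etree.ElementTree as ET

# Same fragment as in the question, kept well-formed so ElementTree can parse it.
HTML = """
<div class="subContent2">
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div>
""".strip()

root = ET.fromstring(HTML)
event_div = root.find('.//div[@class="eventDetails"]')

# Concatenate every text node under the div, in document order.
# This mirrors what string(...) computes for the element.
raw_text = "".join(event_div.itertext())

# Collapse runs of whitespace, similar in spirit to XPath's normalize-space().
clean_text = " ".join(raw_text.split())

print(repr(raw_text))
print(clean_text)  # Some data
```

In XPath itself, wrapping the expression as normalize-space(...) instead of string(...) performs the same trimming on the selector side.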

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/10996357/
