python - Parsing a website with Scrapy when the data sits under the same div

I need to parse a website with Scrapy. The HTML page follows this pattern:

<div class="nameinfo">
     <div class="namesub">
          <span class="namesub"></span>
          <span class="info">data of type 1 </span>
     </div>
     <div class="namesub">
          <span class="namesub"></span>
          <span class="info">data of type 2 </span>
     </div>

     <div class="namesub">
          <span class="namesub"></span>
          <span class="info">data of type 3 </span>
     </div>
</div>

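I have three different types of data, as marked above. Any idea how to get the data I need? All of it sits in span elements inside divs whose class attribute is "namesub". Thanks in advance :)

Best answer

This is what you should put inside your spider: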

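# Legacy selector API from older Scrapy releases
from scrapy.selector import HtmlXPathSelector

# The lines below go inside the spider's parse(self, response) callback
hxs = HtmlXPathSelector(response)

# Select every inner <div class="namesub"> block
namesubs = hxs.select("//div[@class='namesub']")
for namesub in namesubs:
    item = MyItem()
    # Take the text of the <span class="info"> nested in this block
    item["info"] = namesub.select('.//span[@class="info"]/text()').extract()[0]

    yield item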

This code assumes you have defined a MyItem item class with an info field.

Hope that helps.
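To check what the two XPath expressions match without installing Scrapy, here is a rough sketch using the standard library's xml.etree.ElementTree in place of Scrapy's selectors. It assumes the markup is well-formed, which real pages often are not. The names SAMPLE and extract_info are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed copy of the structure from the question
SAMPLE = """
<div class="nameinfo">
  <div class="namesub">
    <span class="namesub"></span>
    <span class="info">data of type 1 </span>
  </div>
  <div class="namesub">
    <span class="namesub"></span>
    <span class="info">data of type 2 </span>
  </div>
  <div class="namesub">
    <span class="namesub"></span>
    <span class="info">data of type 3 </span>
  </div>
</div>
"""


def extract_info(markup):
    """Return the stripped text of each span.info found inside a div.namesub."""
    root = ET.fromstring(markup.strip())
    results = []
    # Outer expression: every descendant <div class="namesub">
    for namesub in root.findall(".//div[@class='namesub']"):
        # Inner expression, relative to the current div
        span = namesub.find(".//span[@class='info']")
        if span is not None and span.text:
            results.append(span.text.strip())
    return results


infos = extract_info(SAMPLE)
```

ElementTree supports only a subset of XPath and rejects malformed HTML, so this is useful for reasoning about the selectors rather than as a replacement for Scrapy's parser.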

This question, "python - Parsing a website with Scrapy when the data sits under the same div", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/17548886/
