python - Scrapy: extracting a table from a website

I am new to Python and am trying to write a script to extract data from this page. Using Scrapy, I wrote the following code:

import scrapy

class dairySpider(scrapy.Spider):
    name = "dairy_price"

    def start_requests(self):
        urls = [
            'http://www.dairy.com/market-prices/?page=quote&sym=DAH15&mode=i',

        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)




    def parse(self, response):
        for rows in response.xpath("//tr"):
            yield {
                'text': rows.xpath(".//td/text()").extract().strip('. \n'),

                }

However, this does not scrape anything. Do you have any ideas? Thanks.

Best answer

The table on the page http://www.dairy.com/market-prices/?page=quote&sym=DAH15&mode=i is added to the DOM dynamically by a request to http://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=DAH15&mode=i&domain=blimling&display_ice=&enabled_ice_exchanges=&tz=0&ed=0 .

You should scrape the second link instead of the first, because scrapy.Request only returns the HTML source and not the content that JavaScript adds afterwards.
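
To confirm that a table is injected by JavaScript rather than present in the HTML source, one option is to count the table rows in the raw markup. The following is a minimal sketch using only the standard library. The two sample pages are made-up stand-ins, not the real markup of either URL, and `RowCounter` and `count_table_rows` are hypothetical helper names.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "tr":
            self.rows += 1


def count_table_rows(html):
    """Return the number of <tr> elements present in the raw HTML source."""
    counter = RowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows


# Hypothetical page whose table is filled in later by JavaScript.
js_page = """
<html><body>
  <div id="quote"></div>
  <script src="quote.js"></script>
</body></html>
"""

# Hypothetical response in which the rows are already in the source.
data_page = """
<table class="bcQuoteTable"><tbody>
  <tr><td>Last</td><td>1.50</td></tr>
  <tr><td>Change</td><td>0.01</td></tr>
</tbody></table>
"""

print(count_table_rows(js_page))    # no rows in the source
print(count_table_rows(data_page))  # rows are present
```

If the count for a real response body (for example `response.text` inside a Scrapy callback) is zero while the browser shows a table, the data is most likely loaded by a separate request.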

Update

Here is working code that extracts the table data:

import scrapy

class dairySpider(scrapy.Spider):
    name = "dairy_price"

    def start_requests(self):
        urls = [
            "http://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=DAH15&mode=i&domain=blimling&display_ice=&enabled_ice_exchanges=&tz=0&ed=0",
        ]

        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)


    def parse(self, response):
        for row in response.css(".bcQuoteTable tbody tr"):
            # print() call form works on Python 3 (the bare print statement is Python 2 only)
            print(row.xpath("td//text()").extract())

Make sure to edit the settings.py file and change ROBOTSTXT_OBEY = True to ROBOTSTXT_OBEY = False.
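
The lists returned by extract() often contain whitespace-only fragments. Because extract() returns a list, string methods such as strip have to be applied to each element rather than to the list itself. A minimal sketch of that cleanup in plain Python follows. `clean_row` is a hypothetical helper, and the sample fragments are invented for illustration.

```python
def clean_row(fragments):
    """Strip whitespace from each text fragment and drop the empty ones."""
    stripped = (fragment.strip() for fragment in fragments)
    return [text for text in stripped if text]


# Invented example of what one row's extract() result might look like.
raw_row = ["\n  Last ", "\n", " 1.50\n"]

print(clean_row(raw_row))  # ['Last', '1.50']
```

Inside parse, this could be used as `yield {'cells': clean_row(row.xpath("td//text()").extract())}` instead of printing, so that the rows end up in Scrapy's item output.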

This answer on "python - Scrapy: extracting a table from a website" comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/46938138/
