I'm experimenting with Scrapy and running into a bit of difficulty. I want this script to run the callback.
import scrapy
from scrapy.spiders import Spider

class ASpider(Spider):
    name = 'myspider'
    allowed_domains = ['wikipedia.org', 'en.wikipedia.org']
    start_urls = ['https://www.wikipedia.org/']

    def parse(self, response):
        urls = response.css("a::attr('href')").extract()
        for url in urls:
            url = response.urljoin(url)
            print("url\t", url)
            scrapy.Request(url, callback=self.my_callback)

    def my_callback(self, response):
        print("callback called")
The output from running it:
2016-05-31 16:21:26 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-05-31 16:21:26 [scrapy] INFO: Overridden settings: {}
2016-05-31 16:21:26 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.corestats.CoreStats']
2016-05-31 16:21:26 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-05-31 16:21:26 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-05-31 16:21:26 [scrapy] INFO: Enabled item pipelines:
[]
2016-05-31 16:21:26 [scrapy] INFO: Spider opened
2016-05-31 16:21:26 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-05-31 16:21:27 [scrapy] DEBUG: Crawled (200) <GET https://www.wikipedia.org/> (referer: None)
url https://en.wikipedia.org/
url https://es.wikipedia.org/
url https://ja.wikipedia.org/
(Long list of similar urls)
url https://meta.wikimedia.org/
2016-05-31 16:21:27 [scrapy] INFO: Closing spider (finished)
2016-05-31 16:21:27 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 215,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 18176,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 5, 31, 14, 21, 27, 240038),
'log_count/DEBUG': 1,
'log_count/INFO': 7,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2016, 5, 31, 14, 21, 26, 328888)}
2016-05-31 16:21:27 [scrapy] INFO: Spider closed (finished)
It never runs the callback. Why is that, and what needs to change to make the callback work?
Best Answer
A spider callback has to yield (or return) a Request, an Item, or (I believe) a dict. As the Scrapy documentation puts it:
In the callback function, you parse the response (web page) and return either dicts with extracted data, Item objects, Request objects, or an iterable of these objects. Those Requests will also contain a callback (maybe the same) and will then be downloaded by Scrapy and then their response handled by the specified callback.
Regarding "python - scrapy callback in parse not called", there is a similar question on Stack Overflow: https://stackoverflow.com/questions/37548479/