I have a simple piece of code in Scrapy:

    def start_requests(self):
        # Build the request and attach the parameters the callback needs.
        request = scrapy.Request(url, callback=self.parse_response)
        request.meta['some_useful_params'] = some_useful_params
        yield request

    def parse_response(self, response):
        some_useful_params = response.meta['some_useful_params']
        do_parsing_stuff()
        if some_conditon:
            # Follow up with another request, passing the parameters along.
            request = scrapy.Request(otherurl, callback=self.parse_response)
            request.meta['some_useful_params'] = some_useful_params
            yield request
        else:
            yield items

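The code above works fine for me, but I need to change it so that it first checks whether the HTML of the page already exists locally and, if so, uses that HTML instead of making a request to the website.
This is the code now: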

    def start_requests(self):
        if html_exist:
            request = scrapy.Request(url)
            request.meta['some_useful_params'] = some_useful_params
            response = scrapy.http.Response(url, body=cached_html, request=request)
            # the below line doesn't call the method parse_response
            self.parse_response(response)
        else:
            request = scrapy.Request(url, callback=self.parse_response)
            request.meta['some_useful_params'] = some_useful_params
            yield request

    def parse_response(self, response):
        some_useful_params = response.meta['some_useful_params']
        do_parsing_stuff()
        if some_conditon:
            if html_exist:
                request = scrapy.Request(url)
                request.meta['some_useful_params'] = some_useful_params
                presponse = scrapy.http.Response(url, body=cached_html, request=request)
                # the below line doesn't call the method parse_response
                self.parse_response(presponse)
            else:
                request = scrapy.Request(otherurl, callback=self.parse_response)
                request.meta['some_useful_params'] = some_useful_params
                yield request
        else:
            yield items

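The problem I am facing is that in the second version, the parse_response method is not called when the HTML already exists.
I don't fully understand why, but I suspect it has something to do with Python generators. How can I fix this?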
Best answer
You have to yield the items or requests, not just call the method:

    for item_or_request in self.parse_response(response):
        yield item_or_request

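The reason a plain call appears to do nothing is that parse_response contains yield, so calling it only creates a generator object; its body runs only when something iterates over it. The following is a minimal sketch of that behaviour in plain Python, without Scrapy. The names start_requests_broken, start_requests_fixed, the log list, and the "cached.html" string are hypothetical stand-ins for illustration, not part of the original code. It also shows yield from (Python 3.3+), which for this purpose behaves like the for loop above.

```python
def parse_response(response, log):
    # Generator function: the body runs only while the result is iterated.
    log.append(f"parsing {response}")
    yield {"page": response}


def start_requests_broken(log, cached=True):
    if cached:
        # Creates a generator object and discards it; the body never runs.
        parse_response("cached.html", log)
    else:
        yield "request"


def start_requests_fixed(log, cached=True):
    if cached:
        # Delegates to the inner generator and re-yields what it produces.
        yield from parse_response("cached.html", log)
    else:
        yield "request"


broken_log, fixed_log = [], []
broken_items = list(start_requests_broken(broken_log))
fixed_items = list(start_requests_fixed(fixed_log))

print(broken_items, broken_log)  # [] []
print(fixed_items, fixed_log)    # [{'page': 'cached.html'}] ['parsing cached.html']
```

In the broken variant the log stays empty, which matches the symptom in the question: the method is never actually entered. The same change applies to the nested call inside parse_response itself.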
Regarding "python - Scrapy yield a Response object in the start_requests method", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43159881/