I can see that Scrapy downloads all pages concurrently, but what I need is to chain the people and extract_person methods, so that when I get a batch of person URLs in people, I follow all of them and scrape everything I need before moving on to the next page of person URLs. How can I do this?
def people(self, response):
    sel = Selector(response)
    urls = sel.xpath(XPATHS.URLS).extract()
    for url in urls:
        yield Request(
            url=BASE_URL + url,
            callback=self.extract_person,
        )

def extract_person(self, response):
    sel = Selector(response)
    name = sel.xpath(XPATHS.NAME).extract()[0]
    person = PersonItem(name=name)
    yield person
Best Answer
You can control the priority of a request. From the Scrapy documentation:

priority (int) – the priority of this request (defaults to 0). The priority is used by the scheduler to define the order used to process requests. Requests with a higher priority value will execute earlier. Negative values are allowed in order to indicate relatively low-priority.

Setting the priority of the person requests to 1 tells Scrapy to process them first:
for url in student_urls:
    yield Request(
        url=BASE_URL + url,
        callback=self.extract_person,
        priority=1,
    )
Regarding "python - chaining requests with scrapy", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26782276/