The Scrapy tutorial describes this BaseSpider method, make_requests_from_url(url):
A method that receives a URL and returns a Request object (or a list of Request objects) to scrape.
This method is used to construct the initial requests in the start_requests() method, and is typically used to convert urls to requests.
Unless overridden, this method returns Requests with the parse() method as their callback function, and with dont_filter parameter enabled (see Request class for more info).
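For reference, based only on the quoted description (not copied from the Scrapy source), the default wiring amounts to roughly this sketch:

```python
from scrapy.http import Request

class BaseSpider(object):
    # Roughly what the quoted docs describe, reconstructed as a sketch:
    # start_requests() turns each URL in start_urls into a Request via
    # make_requests_from_url(), which uses parse() as the callback and
    # enables dont_filter.
    start_urls = []

    def start_requests(self):
        for url in self.start_urls:
            yield self.make_requests_from_url(url)

    def make_requests_from_url(self, url):
        return Request(url, callback=self.parse, dont_filter=True)
```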
Do you understand what this method does? Can I use make_requests_from_url and BaseSpider in place of SgmlLinkExtractor and CrawlSpider, which don't suit my case?
I'm trying to crawl beyond the given initial URLs, and Scrapy isn't doing that.
Thanks
Best Answer
Right: CrawlSpider is useful and convenient in many cases, but it covers only a subset of all possible spiders. If you need something more complex, you usually subclass BaseSpider and implement the start_requests() method.
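A minimal sketch of that pattern, assuming the old Scrapy API the question refers to (BaseSpider, HtmlXPathSelector); the domain, start URL, and XPath are placeholders, not taken from the original post:

```python
from urlparse import urljoin

from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector

class MySpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]  # placeholder domain

    def start_requests(self):
        # Build the initial requests by hand instead of relying on
        # start_urls and the default make_requests_from_url().
        yield Request("http://example.com/start", callback=self.parse)

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # ... extract items from the current page here ...
        # Then follow links so the crawl goes beyond the initial URL.
        for href in hxs.select("//a/@href").extract():
            yield Request(urljoin(response.url, href), callback=self.parse)
```

Because parse() yields further Requests, the crawl keeps going beyond the initial URLs, which is what the question asks for.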
Regarding python - Scrapy make_requests_from_url(url), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/1810143/