The Scrapy tutorial describes this BaseSpider method, make_requests_from_url(url):
A method that receives a URL and returns a Request object (or a list of Request objects) to scrape.
This method is used to construct the initial requests in the start_requests() method, and is typically used to convert urls to requests.
Unless overridden, this method returns Requests with the parse() method as their callback function, and with dont_filter parameter enabled (see Request class for more info).
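For reference, based only on the quoted description (not copied from the Scrapy source), the default wiring amounts to roughly this sketch:

```python
from scrapy.http import Request

class BaseSpider(object):
    # Roughly what the quoted docs describe, reconstructed as a sketch:
    # start_requests() turns each URL in start_urls into a Request via
    # make_requests_from_url(), which uses parse() as the callback and
    # enables dont_filter.
    start_urls = []

    def start_requests(self):
        for url in self.start_urls:
            yield self.make_requests_from_url(url)

    def make_requests_from_url(self, url):
        return Request(url, callback=self.parse, dont_filter=True)
```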
Do you understand what this method does? Can I use make_requests_from_url and BaseSpider in place of SgmlLinkExtractor and CrawlSpider, which don't suit my case?
I'm trying to crawl beyond the given initial URLs, and Scrapy isn't doing that.
Thanks
Best Answer
Right: CrawlSpider is useful and convenient in many cases, but it covers only a subset of all possible spiders. If you need something more complex, you usually subclass BaseSpider and implement the start_requests() method.
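A minimal sketch of that pattern, assuming the old Scrapy API the question refers to (BaseSpider, HtmlXPathSelector); the domain, start URL, and XPath are placeholders, not taken from the original post:

```python
from urlparse import urljoin

from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector

class MySpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]  # placeholder domain

    def start_requests(self):
        # Build the initial requests by hand instead of relying on
        # start_urls and the default make_requests_from_url().
        yield Request("http://example.com/start", callback=self.parse)

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # ... extract items from the current page here ...
        # Then follow links so the crawl goes beyond the initial URL.
        for href in hxs.select("//a/@href").extract():
            yield Request(urljoin(response.url, href), callback=self.parse)
```

Because parse() yields further Requests, the crawl keeps going beyond the initial URLs, which is what the question asks for.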
Regarding python - Scrapy make_requests_from_url(url), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/1810143/