Can Scrapy ignore rel="nofollow" links?
Looking at sgml.py in Scrapy 0.22, it looks like it can.
How do I enable it?
Best answer
Paul was right. This is how I did it:
rules = (
    # Extract all pages and follow links, calling 'parse_page' as the response callback.
    # Before the links are followed, they are passed through 'links_processor'.
    Rule(LinkExtractor(allow=('', '/')), follow=True, callback='parse_page', process_links='links_processor'),
)
Here is the actual function. (I'm new to Python, and I'm sure there is a better way to remove items in a for loop without creating a new list.)
def links_processor(self, links):
    # A hook into the links processing from an existing page, done in order to not follow "nofollow" links
    ret_links = list()
    if links:
        for link in links:
            if not link.nofollow:
                ret_links.append(link)
    return ret_links
Pretty simple.
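On the aside about a better way to do the filtering: a list comprehension is one common option. This is only a minimal sketch, assuming each link object exposes a boolean nofollow attribute as in the function above. FakeLink and drop_nofollow are hypothetical names, used here so the example runs without Scrapy installed.

```python
from collections import namedtuple

# Hypothetical stand-in for a Scrapy link object; only the two fields
# needed for this illustration are modelled.
FakeLink = namedtuple("FakeLink", ["url", "nofollow"])


def drop_nofollow(links):
    """Return a new list holding only the links not marked nofollow."""
    # (links or []) also covers the case where links is None or empty.
    return [link for link in (links or []) if not link.nofollow]


links = [
    FakeLink("http://example.com/a", False),
    FakeLink("http://example.com/b", True),
]
print([link.url for link in drop_nofollow(links)])  # prints ['http://example.com/a']
```

This still builds a new list, which is usually the safest choice, since removing items from a list while iterating over it tends to skip elements. Inside a spider, the same comprehension could serve as the body of links_processor(self, links).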
This is based on a similar question on Stack Overflow, "python - Scrapy honor rel=nofollow": https://stackoverflow.com/questions/21392222/