python - Web crawler - Ignore the robots.txt file?

Some servers have a robots.txt file to stop web crawlers from crawling their sites. Is there a way to make a web crawler ignore the robots.txt file? I am using Mechanize for Python.

Best answer

The mechanize documentation has this sample code:

import mechanize

br = mechanize.Browser()
....
# Ignore robots.txt.  Do not do this without thought and consideration.
br.set_handle_robots(False)

This does exactly what you want.
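To make it clearer what is being switched off, here is a minimal sketch of the kind of check a crawler performs when it honors robots.txt. It uses the standard library's `urllib.robotparser` and does not show how mechanize does it internally. The `ROBOTS_TXT` rules, the `example-bot` user agent, the `may_fetch` helper and its `handle_robots` flag are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(url, user_agent="example-bot", handle_robots=True):
    """Return True if this sketch crawler would request the URL."""
    if not handle_robots:
        # Similar in spirit to br.set_handle_robots(False):
        # the robots.txt rules are not consulted at all.
        return True
    parser = RobotFileParser()
    # Parse the rules from memory instead of downloading them.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    blocked = "https://example.com/private/data.html"
    print(may_fetch(blocked))                       # rules are honored
    print(may_fetch(blocked, handle_robots=False))  # rules are ignored
```

With the check enabled, any path under `/private/` is refused. With it disabled, every URL is allowed through, which is why the documentation's comment asks you to think before turning it off.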

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/8386481/