python - Logging handlers for Scrapy

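I am looking for your help with the following 2 questions. First, how do I set handlers for different log levels, as I would in plain Python? Currently, I have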

STATS_ENABLED = True
STATS_DUMP = True 

LOG_FILE = 'crawl.log'

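But the debug messages generated by Scrapy are also added to the log file. They are very long; ideally, I would like DEBUG-level messages to stay on standard error and INFO messages to be dumped to my LOG_FILE.

Secondly, the docs say: The logging service must be explicitly started through the scrapy.log.start() function. My question is, where do I run this scrapy.log.start()? Is it inside my spider?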

Best answer

Secondly, in the docs, it says The logging service must be explicitly started through the scrapy.log.start() function. My question is, where do I run this scrapy.log.start()? Is it inside my spider?

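If you run the spider with scrapy crawl my_spider, logging is started automatically, provided LOG_ENABLED = True.

If you start the crawler process manually, you can call scrapy.log.start() before starting the crawler process.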

from scrapy import log  # needed for log.start() below
from scrapy.crawler import CrawlerProcess
from scrapy.conf import settings


settings.overrides.update({})  # your settings

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

crawlerProcess.crawl(spider)  # your spider here

log.start()  # start logging before the crawler; depends on LOG_ENABLED

print("Starting crawler.")
crawlerProcess.start()  # blocks until the crawl finishes
print("Crawler stopped.")

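Regarding your first question, here is the little I know:

Because you have to start the scrapy log manually, this allows you to use your own logger.

I think you can copy the module scrapy/scrapy/log.py from the scrapy source, modify it, import it instead of scrapy.log, and then run start(); scrapy will then use your log. In that module, the function start() contains the line log.startLoggingWithObserver(sflo.emit, setStdout=logstdout).

Make your own observer (http://docs.python.org/howto/logging-cookbook.html#logging-to-multiple-destinations) and use it there.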
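As a rough illustration of what could stand in for sflo.emit: a Twisted log observer is just a callable that receives one event dict per message. The sketch below forwards such events to the standard logging module. The names forward_event and scrapy_bridge are hypothetical, and reading the level from a 'logLevel' key is an assumption about how the Scrapy version in use tags its events; check the copied log.py before relying on it.

```python
import logging

# Hypothetical logger name; handlers attached to it decide where messages go.
logger = logging.getLogger("scrapy_bridge")


def forward_event(event):
    """Twisted-style observer: called with one event dict per log message."""
    # Twisted stores the message parts as a tuple under 'message'.
    text = " ".join(str(part) for part in event.get("message", ()))
    # Assumption: the level travels in a 'logLevel' key; fall back to INFO.
    level = event.get("logLevel", logging.INFO)
    # Events flagged as errors are reported at ERROR or above.
    if event.get("isError"):
        level = max(level, logging.ERROR)
    logger.log(level, "[%s] %s", event.get("system", "-"), text)
```

In the modified copy of log.py, the idea would be to pass forward_event where sflo.emit is passed to startLoggingWithObserver.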
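The split asked for in the question (DEBUG on standard error, INFO and above in the log file) can be sketched with the standard logging module alone, following the cookbook recipe linked above. This is a minimal sketch, not Scrapy's own mechanism: MaxLevelFilter, configure_logging, and the logger name scrapy_bridge are hypothetical names introduced for illustration.

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Let through only records at or below max_level."""

    def __init__(self, max_level):
        logging.Filter.__init__(self)
        self.max_level = max_level

    def filter(self, record):
        return record.levelno <= self.max_level


def configure_logging(log_file="crawl.log", name="scrapy_bridge"):
    """Send DEBUG records to stderr and INFO-or-higher records to log_file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let every level reach the handlers
    logger.propagate = False        # avoid duplicate output via the root logger

    # stderr handler: DEBUG only, thanks to the upper-bound filter.
    debug_handler = logging.StreamHandler(sys.stderr)
    debug_handler.setLevel(logging.DEBUG)
    debug_handler.addFilter(MaxLevelFilter(logging.DEBUG))

    # File handler: INFO and above.
    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.INFO)

    formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
    for handler in (debug_handler, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling configure_logging() once before the crawl starts would set up both destinations; whatever observer is plugged into the copied log.py then only needs to log through the same logger name.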

Regarding python - Logging handlers for Scrapy, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8320730/
