python - Scrapy - "scrapy crawl" catches exceptions internally and hides them from the Jenkins "catch" clause

Tags: python jenkins scrapy

I run Scrapy through Jenkins every day, and I would like any exceptions to be emailed to me.

Here is an example spider:

from scrapy import Spider

class ExceptionTestSpider(Spider):
    name = 'exception_test'

    start_urls = ['http://google.com']

    def parse(self, response):
        raise Exception

Here is the Jenkinsfile:

#!/usr/bin/env groovy
try {
    node ('jenkins-small-py3.6'){
        ...
        stage('Execute Spider') {
            sh """
                cd ...
                /usr/local/bin/scrapy crawl exception_test
            """
        }
    }
} catch (exc) {
    echo "Caught: ${exc}"
    mail subject: "...",
            body: "The spider is failing",
              to: "...",
            from: "..."

    /* Rethrow to fail the Pipeline properly */
    throw exc
}

Here is the log:

...
INFO:scrapy.core.engine:Spider opened
2019-08-22 10:49:49 [scrapy.core.engine] INFO: Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-08-22 10:49:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
DEBUG:scrapy.extensions.telnet:Telnet console listening on 127.0.0.1:6023
DEBUG:scrapy.downloadermiddlewares.redirect:Redirecting (301) to <GET http://www.google.com/> from <GET http://google.com>
DEBUG:scrapy.core.engine:Crawled (200) <GET http://www.google.com/> (referer: None)
ERROR:scrapy.core.scraper:Spider error processing <GET http://www.google.com/> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "...", line ..., in parse
    raise Exception
Exception
2019-08-22 10:49:50 [scrapy.core.scraper] ERROR: Spider error processing <GET http://www.google.com/> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "...", line ..., in parse
    raise Exception
Exception
INFO:scrapy.core.engine:Closing spider (finished)
2019-08-22 10:49:50 [scrapy.core.engine] INFO: Closing spider (finished)
INFO:scrapy.statscollectors:Dumping Scrapy stats:
{
  ...
}
INFO:scrapy.core.engine:Spider closed (finished)
2019-08-22 10:49:50 [scrapy.core.engine] INFO: Spider closed (finished)
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

And no mail is sent. I believe Scrapy catches the exception internally, records it in the log, and then exits without reporting an error.

How can I get the exception through to Jenkins?

Best answer

The problem is that scrapy does not use a non-zero exit code when a scrape fails (src: https://github.com/scrapy/scrapy/issues/1231).
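
You can see this behaviour in isolation with a minimal sketch (not from the linked issue) that runs the spider in a subprocess and prints the exit code; it assumes you run it from the project directory and that scrapy is on the PATH:

# Quick check of the exit-code behaviour (assumption: run from the Scrapy
# project directory, with scrapy on the PATH).
import subprocess

result = subprocess.run(["scrapy", "crawl", "exception_test"])
# Prints 0 even though parse() raised: the spider error is only logged,
# so the Jenkins "sh" step never fails and the catch block never runs.
print(result.returncode)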

As commenters on that issue suggest, I recommend adding a custom command ( http://doc.scrapy.org/en/master/topics/commands.html#custom-project-commands ).
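
Here is a minimal sketch of such a command. It is not taken verbatim from that issue; the module path, the command name crawl_with_error_code, and the log_count/ERROR check are my own choices:

# myproject/commands/crawl_with_error_code.py
# Enable it with COMMANDS_MODULE = "myproject.commands" in settings.py
# ("myproject" is a placeholder for the real project package).
from scrapy.commands import ScrapyCommand
from scrapy.exceptions import UsageError


class Command(ScrapyCommand):
    requires_project = True

    def syntax(self):
        return "<spider>"

    def short_desc(self):
        return "Run a spider and exit non-zero if it logged any errors"

    def run(self, args, opts):
        if len(args) != 1:
            raise UsageError("Exactly one spider name is required")

        # Keep a reference to the crawler so its stats are still reachable
        # after the crawl has finished.
        crawler = self.crawler_process.create_crawler(args[0])
        self.crawler_process.crawl(crawler)
        self.crawler_process.start()  # blocks until the crawl is over

        # log_count/ERROR counts every ERROR log record, including the
        # "Spider error processing ..." traceback seen in the log above.
        if crawler.stats.get_value("log_count/ERROR", 0):
            self.exitcode = 1

With that in place, the Execute Spider stage would run /usr/local/bin/scrapy crawl_with_error_code exception_test instead of crawl; the non-zero exit code makes the sh step fail, so the catch block runs and the mail is sent.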

The original question on Stack Overflow: https://stackoverflow.com/questions/57608702/
