python - How to send Scrapy logs to Logstash

Tags: python scrapy logstash elastic-stack

I have the ELK stack set up correctly on my server, and using python-logstash I can send logs to Logstash with the snippet below; everything works as expected.

import logging
import logstash
import sys

host = 'localhost'

test_logger = logging.getLogger('python-logstash-logger')
test_logger.setLevel(logging.INFO)
# test_logger.addHandler(logstash.LogstashHandler(host, 5959, version=1))
test_logger.addHandler(logstash.TCPLogstashHandler(host, 5000, version=1))

test_logger.error('python-logstash: test logstash error message.')
test_logger.info('python-logstash: test logstash info message.')
test_logger.warning('python-logstash: test logstash warning message.')

# add extra field to logstash message
extra = {
    'test_string': 'python version: ' + repr(sys.version_info),
    'test_boolean': True,
    'test_dict': {'a': 1, 'b': 'c'},
    'test_float': 1.23,
    'test_integer': 123,
    'test_list': [1, 2, '3'],
}
test_logger.info('python-logstash: test extra fields', extra=extra)
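
For reference, the Logstash side needs a pipeline that accepts these events. A minimal sketch, assuming the stock tcp input plugin with the json codec that python-logstash's TCP handler produces; the port and the Elasticsearch address are placeholders matching the Python snippet above:

input {
  tcp {
    port => 5000
    codec => json
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout { codec => rubydebug }  # handy while verifying delivery
}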

**The next step** is to integrate Logstash with Scrapy.

Here is the relevant part of my spider code:

# -*- coding: utf-8 -*-
import scrapy
import json
import logging
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from collections import defaultdict
import time
from ..helper import Helper
from ..items import SampleItem
import requests as py_request
import logstash
import sys


class SampleSpider(scrapy.Spider):
    name = 'sample'
    allowed_domains = []
    start_urls = ['https://www.sample.com/']
    duplicate_found = False
    counter = defaultdict(dict)
    cat = 0
    place_code = 0
    categories = {}
    logstash_logger = None

    def __init__(self, *args, **kwargs):
        self.logstash_logger = logging.getLogger('scrapy-logger')
        self.logstash_logger.setLevel(logging.INFO)
        self.logstash_logger.addHandler(logstash.TCPLogstashHandler('localhost', 5000, version=1))
        dispatcher.connect(self.spider_closed, signal=signals.spider_closed)

    def get_place_code(self):
        return self.place_code

    def set_place_code(self, value):
        self.place_code = value

    def start_requests(self):
        logging.info(":::>{0} Spider Starting".format(self.name))
        self.logstash_logger.info(":::>{0} Spider Starting".format(self.name))
        self.categories = Helper().get_categories()
        req_timestamp = str(time.time())[:-2]
        for cat in self.categories:
            self.counter[cat['id']] = 0
            logging.info(":::> Start crawling category = {0} ".format(cat['id']))
            self.logstash_logger.info(":::> Start crawling category = {0} ".format(cat['id']))
            start_url = 'https://www.sample.com?c=' + str(
                cat['id'])
            logging.info(start_url)
            yield scrapy.Request(url=start_url,
                                 method="GET",
                                 callback=self.parse,
                                 meta={'cat': cat['id'], 'requestDateTime': 0, 'counter': 0}
                                 )

    def spider_closed(self, spider):
        logging.info(":::>********************************************************************")
        logging.info(":::>{0} Spider Finished.".format(self.name))
        self.logstash_logger.info(":::>{0} Spider Finished.".format(self.name))

        total = 0
        for cat_id, value in self.counter.items():
            logging.info("{0} items imported into {1} category".format(value, cat_id))
            self.logstash_logger.info("{0} items imported into {1} category".format(value, cat_id))
            total += value
        logging.info(":::>******** End Summary; Total : {0} items scraped ***********".format(total))
        self.logstash_logger.info(":::>******** End Summary; Total : {0} items scraped ***********".format(total))

    def parse(self, response):
        # do my parsing stuff here
        self.logstash_logger.info('End of Data for category')

I can see my custom log lines in the Scrapyd logs, but nothing reaches Logstash:

2018-08-04 13:42:18 [root] INFO: :::> Start crawling category = 43614 
2018-08-04 13:42:18 [scrapy-logger] INFO: :::> Start crawling category = 43614 
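
When events show up locally but never reach Logstash, it is worth first confirming that the TCP endpoint is reachable from the machine running the spider. A quick check, where host and port are assumptions matching the handler configuration above:

import socket

# If this raises, the TCPLogstashHandler cannot deliver events either.
with socket.create_connection(('localhost', 5000), timeout=5):
    print('Logstash TCP endpoint is reachable')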

My question is: why aren't the logs being sent to Logstash? How do I get Scrapy's logs into Logstash?

Best answer

It turned out I was already 99% of the way there; I just needed to use scrapy as the logger name:

def __init__(self, *args, **kwargs):
    self.logstash_logger = logging.getLogger('scrapy')
    self.logstash_logger.addHandler(logstash.TCPLogstashHandler('logstash', 5000, version=1))
    dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
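
Two details worth noting: attaching the handler to the logger named 'scrapy' captures the messages Scrapy itself emits through that logger hierarchy, and the host here is 'logstash' rather than 'localhost' (presumably the hostname under which Logstash is reachable in this deployment, e.g. a Docker service name). If you want every log record in the process forwarded, including those from spiders and middlewares, an alternative sketch is to attach the handler to the root logger, which all loggers propagate to by default; the host and port below are assumptions:

import logging
import logstash

# Attach the Logstash handler to the root logger so that every record
# emitted anywhere in the process (Scrapy core, middlewares, spiders)
# is forwarded to Logstash.
logging.getLogger().addHandler(
    logstash.TCPLogstashHandler('localhost', 5000, version=1)
)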

I'm posting the answer here in case it helps someone else in the same situation.

This question, python - How to send Scrapy logs to Logstash, originates from Stack Overflow: https://stackoverflow.com/questions/51691756/
