amazon-web-services - Elasticsearch shows docs.count of only 1 after migrating data with Logstash

Tags: amazon-web-services elasticsearch logstash kibana

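I am trying to move data from S3 (the contents of a .csv file) to an Elasticsearch cluster using Logstash with a custom template.
But when I check in Kibana with the following query, it shows docs.count = 1 and reports the remaining records as docs.deleted: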

GET /_cat/indices?v

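My first question is:
  • Why is only one record (the last one) transferred, while the others show up as deleted?

  • Now, when I query that index in Kibana with the following query: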
    GET /my_file_index/_search
    {
      "query": {
        "match_all": {}
      }
    }
    

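    I get only one record, with all the comma-separated data in the "message" field, so my second question is:
  • How can I get the data under column names, as in the CSV, given that I have specified all the column mappings in a template file and supplied it to Logstash?

  • I also tried setting the columns option in the Logstash csv filter, but with no luck.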
     columns => ["col1", "col2",...]
    

    Any help would be appreciated.

    EDIT-1: Below is my logstash.conf file:
    input {
     s3{
         access_key_id => "xxx"
         secret_access_key => "xxxx"
         region => "eu-xxx-1"
         bucket => "xxxx"
         prefix => "abc/stocks_03-jul-2018.csv"
       }
    }
    filter {
      csv {
          separator => ","
          columns => ["AAA","BBB","CCC"]
      }
    }
    output {
        amazon_es {
            index => "my_r_index"
            document_type => "my_r_index"
            hosts => "vpc-totemdev-xxxx.eu-xxx-1.es.amazonaws.com"
            region => "eu-xxxx-1"
            aws_access_key_id => 'xxxxx'
            aws_secret_access_key => 'xxxxxx+xxxxx'
            document_id => "%{id}"
            template => "templates/template_2.json"
            template_name => "my_r_index"
     }
    }
    

    Note:
    Logstash version: 6.3.1
    Elasticsearch version: 6.2

    EDIT-2: Adding the template_2.json file along with a sample CSV header:

    1. Mapping file:
    { 
        "template" : "my_r_index", 
        "settings" : {
            "index" : {
                "number_of_shards" : 50,
                "number_of_replicas" : 1
             },
             "index.codec" : "best_compression",
             "index.refresh_interval" : "60s"
          },
        "mappings" : { 
            "_default_" : { 
                "_all" : { "enabled" : false },
           "properties" : { 
            "SECURITY" : {
                "type" : "keyword"
            },
            "SERVICEID" : {
                "type" : "integer"
            },
            "MEMBERID" : {
                "type" : "integer"
            },
            "VALUEDATE" : {
                "type" : "date"
            },
            "COUNTRY" : {
                "type" : "keyword"
            },
            "CURRENCY" : {
                "type" : "keyword"
            },
            "ABC" : {
                "type" : "integer"
            },
            "PQR" : {
                "type" : "keyword"
            },
            "KKK" : {
                "type" : "keyword"
            },
            "EXPIRYDATE" : {
                "type" : "text",
                "index" : "false"
            },
            "SOMEID" : {
                "type" : "double",
                "index" : "false"
            },
            "DDD" : {
                "type" : "double",
                "index" : "false"
            },
            "EEE" : {
                "type" : "double",
                "index" : "false"
            },
            "FFF" : {
                "type" : "double",
                "index" : "false"
            },
            "GGG" : {
                "type" : "text",
                "index" : "false"
            },
            "LLL" : {
                "type" : "double",
                "index" : "false"
            },
            "MMM" : {
                "type" : "double",
                "index" : "false"
            },
            "NNN" : {
                "type" : "double",
                "index" : "false"
            },
            "OOO" : {
                "type" : "double",
                "index" : "false"
            },
            "PPP" : {
                "type" : "text",
                "index" : "false"
            },
            "QQQ" : {
                "type" : "integer",
                "index" : "false"
            },
            "RRR" : {
                "type" : "double",
                "index" : "false"
            },
            "SSS" : {
                "type" : "double",
                "index" : "false"
            },
            "TTT" : {
                "type" : "double",
                "index" : "false"
            },
            "UUU" : {
                "type" : "double",
                "index" : "false"
            },
            "VVV" : {
                "type" : "text",
                "index" : "false"
            },
            "WWW" : {
                "type" : "double",
                "index" : "false"
            },
            "XXX" : {
                "type" : "double",
                "index" : "false"
            },
            "YYY" : {
                "type" : "double",
                "index" : "false"
            },
            "ZZZ" : {
                "type" : "double",
                "index" : "false"
            },
            "KNOCKORWARD" : {
                "type" : "text",
                "index" : "false"
            },
            "RANGEATSSPUT" : {
                "type" : "double",
                "index" : "false"
            },
            "STDATMESSPUT" : {
                "type" : "double",
                "index" : "false"
            },
            "CONSENSUPUT" : {
                "type" : "double",
                "index" : "false"
            },
            "CLIENTLESSPUT" : {
                "type" : "double",
                "index" : "false"
            },
            "KNOCKOUESSPUT" : {
                "type" : "text",
                "index" : "false"
            },
            "RANGACTOR" : {
                "type" : "double",
                "index" : "false"
            },
            "STDDACTOR" : {
                "type" : "double",
                "index" : "false"
            },
            "CONSCTOR" : {
                "type" : "double",
                "index" : "false"
            },
            "CLIENTOR" : {
                "type" : "double",
                "index" : "false"
            },
            "KNOCKOACTOR" : {
                "type" : "text",
                "index" : "false"
            },
            "RANGEPRICE" : {
                "type" : "double",
                "index" : "false"
            },
            "STANDARCE" : {
                "type" : "double",
                "index" : "false"
            },
            "NUMBERICE" : {
                "type" : "integer",
                "index" : "false"
            },
            "CONSECE" : {
                "type" : "double",
                "index" : "false"
            },
            "CLIECE" : {
                "type" : "double",
                "index" : "false"
            },
            "KNOCICE" : {
                "type" : "text",
                "index" : "false"
            },
            "SKEWICE" : {
                "type" : "text",
                "index" : "false"
            },
            "WILDISED" : {
                "type" : "text",
                "index" : "false"
            },
            "WILDATUS" : {
                "type" : "text",
                "index" : "false"
            },
            "RRF" : {
                "type" : "double",
                "index" : "false"
            },
            "SRF" : {
                "type" : "double",
                "index" : "false"
            },
            "CNRF" : {
                "type" : "double",
                "index" : "false"
            },
            "CTRF" : {
                "type" : "double",
                "index" : "false"
            },
            "RANADDLE" : {
                "type" : "double",
                "index" : "false"
            },
            "STANDANSTRADDLE" : {
                "type" : "double",
                "index" : "false"
            },
            "CONSLE" : {
                "type" : "double",
                "index" : "false"
            },
            "CLIDLE" : {
                "type" : "double",
                "index" : "false"
            },
            "KNOCKOADDLE" : {
                "type" : "text",
                "index" : "false"
            },
            "RANGEFM" : {
                "type" : "double",
                "index" : "false"
            },
            "SMIUM" : {
                "type" : "double",
                "index" : "false"
            },
            "CONIUM" : {
                "type" : "double",
                "index" : "false"
            },
            "CLIEEMIUM" : {
                "type" : "double",
                "index" : "false"
            },
            "KNOREMIUM" : {
                "type" : "text",
                "index" : "false"
            },
            "COT" : {
                "type" : "double",
                "index" : "false"
            },
            "CLIEEDSPOT" : {
                "type" : "double",
                "index" : "false"
            },
            "IME" : {
                "type" : "keyword"
            },
            "KKE" : {
                "type" : "keyword"
            }
            } 
        }
        }     
    } 
    
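    2. My Excel (.csv) content:

    Header: the actual header is very long because there are many columns; please assume the other column names follow the same pattern as below.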
      SECURITY | SERVICEID  | MEMBERID | VALUEDATE...
    

    First rows: again, some of the columns below have blank values; the real template file with all the columns is the mapping file given above.

    KKK-LMN 2 1815 6/25/2018
    PPL-ORL 2 1815 6/25/2018
    SLB-ORD 2 1815 6/25/2018

    3. Kibana query output
    Query:
    GET /my_r_index/_search
    {
      "query": {
        "match_all": {}
      }
    }
    

    Output:
    {
            "_index": "my_r_index",
            "_type": "my_r_index",
            "_id": "IjjIZWUBduulDsi0vYot",
            "_score": 1,
            "_source": {
              "@version": "1",
              "message": "XXX-XXX-XXX-USD,2,3190,2018-07-03,UNITED STATES,USD,300,60,Put,2042-12-19,,,,.009108041,q,,,,.269171754,q,,,,,.024127966,q,,,,68.414017367,q,,,,.298398645,q,,,,.502677959,q,,,,,0.040880692400344164,q,,,,,,,159.361792143,,,,.631296636,q,,,,.154877384,q,,42.93,N,Y,\n",
              "@timestamp": "2018-08-23T07:56:06.515Z"
            }
          },  
    

    ...other records similar to the above.

    EDIT-3:
    Sample output after using autodetect_column_names => true:
    {
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 10,
        "successful": 10,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
          {
            "_index": "indr",
            "_type": "logs",
            "_id": "hAF1aWUBS_wbCH7ZG4tW",
            "_score": 1,
            "_source": {
              "2": "2",
              "1815": "1815",
              "message": """
    PPL-ORD-XNYS-USD,2,1815,6/25/2018,UNITED STATES
    
    """,
              "SLB-ORD-XNYS-USD": "PPL-ORD-XNYS-USD",
              "6/25/2018": "6/25/2018",
              "@timestamp": "2018-08-24T01:03:26.436Z",
              "UNITED STATES": "UNITED STATES",
              "@version": "1"
            }
          },
          {
            "_index": "indr",
            "_type": "logs",
            "_id": "kP11aWUBctDorPcGHICS",
            "_score": 1,
            "_source": {
              "2": "2",
              "1815": "1815",
              "message": """
    SLBUSD,2,1815,4/22/2018,UNITEDSTATES
    
    """,
              "SLB-ORD-XNYS-USD": "SLBUSD",
              "6/25/2018": "4/22/2018",
              "@timestamp": "2018-08-24T01:03:26.436Z",
              "UNITED STATES": "UNITEDSTATES",
              "@version": "1"
            }
          },
          {
            "_index": "indr",
            "_type": "logs",
            "_id": "j_11aWUBctDorPcGHICS",
            "_score": 1,
            "_source": {
              "2": "SERVICE",
              "1815": "CLIENT",
              "message": """
    UNDERLYING,SERVICE,CLIENT,VALUATIONDATE,COUNTRY
    
    """,
              "SLB-ORD-XNYS-USD": "UNDERLYING",
              "6/25/2018": "VALUATIONDATE",
              "@timestamp": "2018-08-24T01:03:26.411Z",
              "UNITED STATES": "COUNTRY",
              "@version": "1"
            }
          }
        ]
      }
    }
    

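    Best answer

    I'm pretty sure your single document has the ID %{id}. The first problem comes from the fact that you are not extracting any column named id from your CSV file, yet that is the field you reference in document_id => "%{id}". Logstash therefore leaves the reference unresolved, every row is indexed under the literal ID %{id}, and each indexing operation replaces the previous document. In the end you have a single document that has been indexed as many times as there are rows in your CSV.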
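
The overwrite described above can be modelled in a few lines of Python. This is only a simplified sketch, not how Logstash or Elasticsearch are implemented: `sprintf` and `ToyIndex` are hypothetical names invented for the illustration, and the `deleted` counter is a rough stand-in for docs.deleted (the real figure also depends on segment merges).

```python
import re


def sprintf(template, event):
    """Mimic Logstash-style %{field} substitution: a reference to a
    field the event does not have is left in place as literal text."""
    def resolve(match):
        field = match.group(1)
        return str(event[field]) if field in event else match.group(0)
    return re.sub(r"%\{([^}]+)\}", resolve, template)


class ToyIndex:
    """Hypothetical stand-in for an index: one live document per _id."""

    def __init__(self):
        self.docs = {}
        self.deleted = 0

    def index(self, doc_id, source):
        if doc_id in self.docs:
            self.deleted += 1  # the previous version is marked as deleted
        self.docs[doc_id] = source


# Three events that, like the rows in the question, carry no "id" field.
rows = [
    {"message": "KKK-LMN,2,1815,6/25/2018"},
    {"message": "PPL-ORL,2,1815,6/25/2018"},
    {"message": "SLB-ORD,2,1815,6/25/2018"},
]

idx = ToyIndex()
for event in rows:
    idx.index(sprintf("%{id}", event), event)

print(len(idx.docs), idx.deleted)  # 1 2
print(list(idx.docs))              # ['%{id}']
```

Removing the document_id line (so that Elasticsearch generates an ID for each document) or pointing it at a field that every row really has avoids the collision.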

    As for the second question, you need to fix the filter section as follows:

    filter {
      csv {
          separator => ","
          autodetect_column_names => true
      }
      date {
        match => [ "VALUATIONDATE", "M/dd/yyyy" ]
      }
    }
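
As a rough illustration of what this filter section is meant to do, the Python sketch below names the columns from the header line and then parses the date column. It is an analogy built on Python's csv module, not the Logstash implementation; the sample lines are taken from the EDIT-3 output in the question, and writing the parsed date to "@timestamp" reflects the date filter's default target as far as I know.

```python
import csv
import io
from datetime import datetime

raw = (
    "UNDERLYING,SERVICE,CLIENT,VALUATIONDATE,COUNTRY\n"
    "SLB-ORD-XNYS-USD,2,1815,6/25/2018,UNITED STATES\n"
    "PPL-ORD-XNYS-USD,2,1815,6/25/2018,UNITED STATES\n"
)

# DictReader takes its field names from the first line it reads,
# so the header has to arrive before any data line.
events = list(csv.DictReader(io.StringIO(raw)))

for event in events:
    # Rough Python counterpart of the "M/dd/yyyy" pattern in the date filter.
    event["@timestamp"] = datetime.strptime(event["VALUATIONDATE"], "%m/%d/%Y")

print(events[0]["UNDERLYING"])  # SLB-ORD-XNYS-USD
print(events[0]["@timestamp"])  # 2018-06-25 00:00:00

# If a data line is seen first, its values become the column names.
lines = raw.splitlines()
shuffled = lines[1:] + lines[:1]
bad = list(csv.DictReader(shuffled))
print(bad[-1]["2"])  # SERVICE
```

The last part reproduces the shape of the EDIT-3 output, where fields are named "2" and "1815". One plausible cause, not confirmed in this thread, is that lines reached the filter out of order; the csv filter documentation notes that autodetect_column_names expects the pipeline to run with a single worker.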
    

    You also need to fix the index template like this (I have only added the format setting to the VALUATIONDATE field):
    {
      "order": 0,
      "template": "helloindex",
      "settings": {
        "index": {
          "codec": "best_compression",
          "refresh_interval": "60s",
          "number_of_shards": "10",
          "number_of_replicas": "1"
        }
      },
      "mappings": {
        "_default_": {
          "_all": {
            "enabled": false
          },
          "properties": {
            "UNDERLYING": {
              "type": "keyword"
            },
            "SERVICE": {
              "type": "integer"
            },
            "CLIENT": {
              "type": "integer"
            },
            "VALUATIONDATE": {
              "type": "date",
              "format": "MM/dd/yyyy"
            },
            "COUNTRY": {
              "type": "keyword"
            }
          }
        }
      },
      "aliases": {}
    }
    

    Regarding "amazon-web-services - Elasticsearch shows docs.count of only 1 after migrating data with Logstash", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51932734/
