elasticsearch - Bulk indexing documents from a JSON file into ElasticSearch

Tags: elasticsearch

I have a sample.json that looks like this:

{"id":921,"car_make":"Chevrolet","car_model":"Traverse","car_year":2009,"car_color":"Yellow","made_in":"Guinea-Bissau"},
{"id":922,"car_make":"Mitsubishi","car_model":"Eclipse","car_year":1996,"car_color":"Khaki","made_in":"Luxembourg"},
{"id":923,"car_make":"Ford","car_model":"Lightning","car_year":1994,"car_color":"Teal","made_in":"China"},
{"id":924,"car_make":"Mercedes-Benz","car_model":"Sprinter 2500","car_year":2012,"car_color":"Yellow","made_in":"Colombia"},
{"id":925,"car_make":"Nissan","car_model":"Maxima","car_year":2002,"car_color":"Yellow","made_in":"Kazakhstan"},
{"id":926,"car_make":"Chrysler","car_model":"Pacifica","car_year":2006,"car_color":"Crimson","made_in":"China"}

What command should I use to index each line into ElasticSearch?
So far I have tried the following, but it does not work.
>> curl -XGET 'localhost:9200/car/car' -d @sample.json 
{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}

I also tried:
curl -XGET 'localhost:9200/car/inventory/_bulk' -H 'Content-Type: application/json' -d @sample.json 
{"_index":"car","_type":"inventory","_id":"_bulk","found":false}

Best answer

You will want to use the Bulk API.

The documentation explains everything well, but take note of the following:

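  • Your file should be newline-delimited JSON (NDJSON), with application/x-ndjson specified as the Content-Type. This means no commas at the end of the lines.
  • Each record takes 2 lines: the "Action / Metadata" line, then the source JSON line.
  • Your file must end with a newline.
  • When using curl, make sure to use --data-binary so that the newlines are preserved.
  • The URL path does not need to specify an index or type, only _bulk; in that case you must include the index and type in the metadata line of every record. If you do specify the index and type in the URL, the metadata does not need to include the _index and _type fields.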
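
The rules above can be checked before sending anything. The following is a minimal sketch in Python, not part of the original answer: `check_bulk_body` is a hypothetical helper name, and it assumes the body contains only two-line actions such as `index` (a `delete` action has no source line, so the pair check would not apply to it).

```python
import json


def check_bulk_body(text):
    """Return a list of problems found in a Bulk API body (empty if none).

    Hypothetical helper; assumes every action is followed by a source line.
    """
    problems = []
    # Rule: the body must end with a newline.
    if not text.endswith("\n"):
        problems.append("body must end with a newline")
    lines = text.splitlines()
    # Rule: each record is an action/metadata line plus a source line.
    if len(lines) % 2 != 0:
        problems.append("expected action/source line pairs, got an odd number of lines")
    # Rule: every line is standalone JSON, so a trailing comma is an error.
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except ValueError:
            problems.append(f"line {number} is not a standalone JSON value")
    return problems
```

Run against the sample.json from the question, this kind of check would flag every line that ends with a comma.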

Using your example, your file would look like this:
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "921" } }
    {"id":921,"car_make":"Chevrolet","car_model":"Traverse","car_year":2009,"car_color":"Yellow","made_in":"Guinea-Bissau"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "922" } }
    {"id":922,"car_make":"Mitsubishi","car_model":"Eclipse","car_year":1996,"car_color":"Khaki","made_in":"Luxembourg"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "923" } }
    {"id":923,"car_make":"Ford","car_model":"Lightning","car_year":1994,"car_color":"Teal","made_in":"China"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "924" } }
    {"id":924,"car_make":"Mercedes-Benz","car_model":"Sprinter 2500","car_year":2012,"car_color":"Yellow","made_in":"Colombia"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "925" } }
    {"id":925,"car_make":"Nissan","car_model":"Maxima","car_year":2002,"car_color":"Yellow","made_in":"Kazakhstan"}
    { "index" : { "_index" : "car", "_type" : "car", "_id" : "926" } }
    {"id":926,"car_make":"Chrysler","car_model":"Pacifica","car_year":2006,"car_color":"Crimson","made_in":"China"}
    
    

Then, of course, the curl command specifies the Content-Type header as application/x-ndjson and looks like this:
    curl -XPOST -H "Content-Type: application/x-ndjson" localhost:9200/_bulk --data-binary @sample.json 
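
If the file already exists in the comma-terminated form shown in the question, it may be easier to generate the bulk file than to edit it by hand. Here is a minimal sketch in Python, assuming each document sits on its own line and has an `id` field; the function names and the `bulk.ndjson` output file mentioned below are illustrative, not from the original answer. Passing `doc_type=None` omits `_type`, which newer Elasticsearch versions that dropped mapping types are likely to require.

```python
import json


def to_bulk_ndjson(raw_text, index="car", doc_type="car"):
    """Convert comma-terminated JSON lines into a Bulk API body (hypothetical helper)."""
    out = []
    for line in raw_text.splitlines():
        # Drop surrounding whitespace and the trailing comma from the original file.
        line = line.strip().rstrip(",")
        if not line:
            continue
        doc = json.loads(line)
        meta = {"_index": index, "_id": str(doc["id"])}
        if doc_type is not None:
            meta["_type"] = doc_type
        out.append(json.dumps({"index": meta}))  # action/metadata line
        out.append(json.dumps(doc))              # source line
    # The body must end with a newline.
    return "\n".join(out) + "\n"


def convert_file(src_path, dst_path):
    """Read the original file and write the bulk-ready NDJSON file."""
    with open(src_path, encoding="utf-8") as src:
        body = to_bulk_ndjson(src.read())
    # newline="\n" keeps line endings as plain LF on every platform.
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write(body)
```

Calling `convert_file("sample.json", "bulk.ndjson")` would produce a file that can be passed to the curl command above with `--data-binary @bulk.ndjson`.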
    

This question, "elasticsearch - Bulk indexing documents from a JSON file into ElasticSearch", comes from Stack Overflow: https://stackoverflow.com/questions/47932284/
