python - Loading irregular JSON into an Elasticsearch index with a mapping using the Python client

I have some .json files in which not every field is present in every record. For example, caseclass.json looks like this:

[{
    "name" : "john smith", 
    "age" : 12, 
    "cars": ["ford", "toyota"], 
    "comment": "i am happy"
},
{
    "name": "a. n. other", 
    "cars": "", 
    "comment": "i am panicking"
}]

I am using Elasticsearch 7.6.1 through the elasticsearch Python client:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import json
import os
from elasticsearch_dsl import Document, Text, Date, Integer, analyzer

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
class Person(Document):
        class Index:
            using = es
            name = 'person_index'
        name = Text()
        age = Integer()
        cars = Text()
        comment = Text(analyzer='snowball')   

Person.init()

with open("caseclass.json") as json_file:
    data = json.load(json_file)
for indexid in range(len(data)):
    document = Person(name=data[indexid]['name'], age=data[indexid]['age'], cars=data[indexid]['cars'], comment=data[indexid]['comment'])
    document.meta.id = indexid
    document.save()

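When the loop reaches the second record, I naturally get KeyError: 'age'. My question is: can records like these be loaded into an Elasticsearch index using the Python client and a predefined mapping rather than dynamic mapping? The code above works if every field is present in every record, but is there a way to do this without checking each record for each field? The real records have a complex structure and there are millions of them. Thanks.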

Best answer

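The error has nothing to do with the mapping. It is only telling you that the key age cannot be accessed in one of your caseclass records.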
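To see the failure in isolation, here is a minimal sketch that needs no Elasticsearch at all. It inlines the two records from caseclass.json; the variable names are illustrative only and do not come from the original code.

```python
# The two records from caseclass.json, inlined so the snippet is self-contained.
records = [
    {"name": "john smith", "age": 12, "cars": ["ford", "toyota"], "comment": "i am happy"},
    {"name": "a. n. other", "cars": "", "comment": "i am panicking"},
]

missing_key = None

# Subscript access raises KeyError as soon as a record lacks the key.
try:
    ages = [record["age"] for record in records]
except KeyError as err:
    missing_key = err.args[0]

# dict.get returns None for a missing key instead of raising.
safe_ages = [record.get("age") for record in records]
```

The exception is raised by the plain dictionary lookup, before anything is sent to Elasticsearch.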

The index mapping is created when Person.init() is called. You can verify it by calling print(es.indices.get_mapping(Person.Index.name)) right after Person.init().

I have tidied up your code a bit:

import json
import os
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Document, Text, Date, Integer, analyzer

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])


class Person(Document):
    class Index:
        using = es
        name = 'person_index'
    name = Text()
    age = Integer()
    cars = Text()
    comment = Text(analyzer='snowball')


Person.init()
print(es.indices.get_mapping(Person.Index.name))

with open("caseclass.json") as json_file:
    data = json.load(json_file)
    for indexid, case in enumerate(data):
        # Pass whatever keys this record has; absent fields are simply not set.
        document = Person(**case)
        document.meta.id = indexid
        document.save()

Notice how I used **case to spread all the key-value pairs inside case, instead of reading each one with data[property_key].
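The effect of ** unpacking can be sketched without the library. FakeDocument below is a hypothetical stand-in, not the elasticsearch_dsl Document class; it only records whichever keyword arguments it receives.

```python
class FakeDocument:
    """Hypothetical stand-in that stores whatever keyword arguments it is given."""

    def __init__(self, **fields):
        self.fields = fields


# The second record from caseclass.json, which has no "age" key.
case = {"name": "a. n. other", "cars": "", "comment": "i am panicking"}

# Equivalent to FakeDocument(name="a. n. other", cars="", comment="i am panicking").
# No "age" argument is passed, so no lookup for it ever happens.
document = FakeDocument(**case)
```

Because only the keys that exist are passed along, a record without age never triggers a lookup for it.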

The resulting mapping looks like this:

{
  "person_index" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "integer"
        },
        "cars" : {
          "type" : "text"
        },
        "comment" : {
          "type" : "text",
          "analyzer" : "snowball"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}
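Since the question mentions millions of records, saving one document per request may be slow. One possible approach, sketched here under assumptions, is to turn each record into a bulk action and pass the generator to the bulk helper in elasticsearch.helpers. The function name generate_actions is hypothetical; the _index, _id and _source keys follow the action format that helper accepts. The snippet only builds the actions, so it runs without a cluster.

```python
import json


def generate_actions(records, index_name):
    """Yield one bulk action per record; absent fields are simply left out."""
    for doc_id, record in enumerate(records):
        yield {"_index": index_name, "_id": doc_id, "_source": record}


# Inlined sample data standing in for the contents of caseclass.json.
records = json.loads(
    '[{"name": "john smith", "age": 12, "cars": ["ford", "toyota"],'
    ' "comment": "i am happy"},'
    ' {"name": "a. n. other", "cars": "", "comment": "i am panicking"}]'
)

actions = list(generate_actions(records, "person_index"))
```

With a live client, a call along the lines of helpers.bulk(es, generate_actions(data, "person_index")) would send the actions in batches. This route bypasses the Person class, so it assumes the index was already created with the mapping above.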

Regarding python - Loading irregular JSON into an Elasticsearch index with a mapping using the Python client, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60904887/
