python - Importing a list of dictionaries or a JSON file into Elasticsearch with Python

I have a .json.gz file that I want to load into Elasticsearch.

My first attempt used the json module to convert the JSON into a list of dictionaries.

import gzip
import json
from pprint import pprint
from elasticsearch import Elasticsearch

with gzip.open("nodes.json.gz") as nodes_f:
    nodes = json.load(nodes_f)

Example dictionary:

pprint(nodes[0])

{u'index': 1,
 u'point': [508163.122, 195316.627],
 u'tax': u'fehwj39099'}

Using Elasticsearch:

es = Elasticsearch()

data = es.bulk(index="index",body=nodes)

However, this returns:

elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]')
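
The "Malformed action/metadata line" message suggests that the low-level bulk API does not accept a plain list of documents: each document has to be preceded by an action/metadata line. The following is a minimal sketch of that pairing, assuming documents shaped like the sample above. `to_bulk_body` is a hypothetical helper name and "nodes" is an assumed index name; neither comes from the original post.

```python
import json

def to_bulk_body(docs, index_name):
    """Interleave an action/metadata line with each document source.

    Hypothetical helper for illustration only.
    """
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch what to do with the line that follows.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc["index"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk format is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"

sample = [{"index": 1, "point": [508163.122, 195316.627], "tax": "fehwj39099"}]
body = to_bulk_body(sample, "nodes")  # "nodes" is an assumed index name
print(body)
```

A string built this way could be passed to es.bulk in place of the raw list; the exact keyword argument depends on the client version.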

In addition, I want to be able to look up the tax for a given point, in case that affects how I should index the data in Elasticsearch.

Best answer

Alfe pointed me in the right direction, but I could not get his code to work.

I found two solutions:

Row by row, using a for loop:

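from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

# One request per document, using the node's own index as the document id.
for node in nodes:
    _id = node['index']
    es.index(index='nodes', doc_type='external', id=_id, body=node)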

In bulk, using the helpers module:

actions = [
    {
        "_index": "nodes_bulk",
        "_type": "external",
        "_id": str(node['index']),
        "_source": node
    }
    for node in nodes
]

helpers.bulk(es, actions)

For a list of 343724 dictionaries, the bulk approach was about 22 times faster.
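
The bulk approach above builds every action in memory before sending anything. Since helpers.bulk accepts any iterable of actions, a generator can yield them lazily instead. This is only a sketch: `generate_actions` is a hypothetical name, the second sample node is made up for illustration, and `_type` is left out on the assumption of a recent Elasticsearch version where mapping types no longer exist (add it back for older clusters).

```python
def generate_actions(nodes, index_name="nodes_bulk"):
    """Yield one bulk action per node instead of building a full list.

    Hypothetical helper for illustration only.
    """
    for node in nodes:
        yield {
            "_index": index_name,
            "_id": str(node["index"]),
            "_source": node,
        }

sample_nodes = [
    {"index": 1, "point": [508163.122, 195316.627], "tax": "fehwj39099"},
    {"index": 2, "point": [508170.0, 195320.5], "tax": "example-tax"},  # made-up sample
]

# Materialised here only to inspect the result without a cluster.
actions = list(generate_actions(sample_nodes))
print(actions[0]["_id"])

# With a running cluster, the generator would be passed directly:
# helpers.bulk(es, generate_actions(nodes))
```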

For more on importing a list of dictionaries or a JSON file into Elasticsearch with Python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/42623636/
