I wrote this short script:
```python
from elasticsearch import Elasticsearch
from fastavro import reader

es = Elasticsearch(['someIP:somePort'])

with open('data.avro', 'rb') as fo:
    avro_reader = reader(fo)
    for record in avro_reader:
        es.index(index="my_index", body=record)
```
It works fine. Each record is a JSON document, and Elasticsearch can index JSON. But is there a way to do this in bulk instead of one record at a time in a for loop? Doing it one by one is slow.
Best Answer
There are two ways to do this. The first is to POST newline-delimited JSON to the `_bulk` endpoint yourself, for example with `requests`. The second is to use the `helpers` module that ships with the `elasticsearch` client:
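A minimal sketch of the `requests` approach, assuming the cluster is reachable at `someIP:somePort` (a placeholder from the question). The `_bulk` API expects newline-delimited JSON, where each document is preceded by an action line, and the body must end with a newline:

```python
import json

def build_bulk_payload(records, index="my_index"):
    """Build an NDJSON payload for the Elasticsearch _bulk API:
    one action line followed by one document line per record."""
    lines = []
    for j, record in enumerate(records):
        lines.append(json.dumps({"index": {"_index": index, "_id": j}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

# Then POST the payload (sketch, not run here):
#   import requests
#   requests.post("http://someIP:somePort/_bulk",
#                 data=build_bulk_payload(avro_reader),
#                 headers={"Content-Type": "application/x-ndjson"})
```

For very large files you would send the payload in chunks rather than building one giant string, but the wire format is the same.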
```python
from elasticsearch import Elasticsearch
from elasticsearch import helpers
from fastavro import reader

es = Elasticsearch(['someIP:somePort'])

with open('data.avro', 'rb') as fo:
    avro_reader = reader(fo)
    records = [
        {
            "_index": "my_index",
            "_type": "record",  # note: _type is deprecated in Elasticsearch 7+
            "_id": j,
            "_source": record
        }
        for j, record in enumerate(avro_reader)
    ]
    helpers.bulk(es, records)
```
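One caveat with the list comprehension above: it loads every record into memory before indexing. `helpers.bulk` accepts any iterable of actions, so a generator keeps memory flat for large Avro files. A sketch, using the same hypothetical index name:

```python
def avro_actions(records, index="my_index"):
    """Yield bulk actions lazily, one per record, so the whole
    file never sits in memory at once."""
    for j, record in enumerate(records):
        yield {
            "_index": index,
            "_id": j,
            "_source": record,
        }

# The generator can be passed straight to the bulk helper (sketch):
#   with open('data.avro', 'rb') as fo:
#       helpers.bulk(es, avro_actions(reader(fo)))
```

`helpers.streaming_bulk` works the same way if you also want per-document success/failure results as they come back.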
For more on python - indexing Avro files into Elasticsearch, see the similar question on Stack Overflow: https://stackoverflow.com/questions/62739810/