python - Indexing an Avro file into Elasticsearch

I wrote this short script:

    from elasticsearch import Elasticsearch
    from fastavro import reader

    es = Elasticsearch(['someIP:somePort'])
    with open('data.avro', 'rb') as fo:
        avro_reader = reader(fo)
        for record in avro_reader:
            es.index(index="my_index", body=record)

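It works perfectly fine. Each record is a JSON document (fastavro yields it as a Python dict), and Elasticsearch can index JSON directly. However, is there a way to index the records in bulk instead of one at a time in a for loop? Sending one request per record is very slow.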

Best answer

There are two ways to do this:

Use the Elasticsearch Bulk API directly with Python requests.

Use the Elasticsearch Python library, whose helpers call the same Bulk API internally:

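    from elasticsearch import Elasticsearch
    from elasticsearch import helpers
    from fastavro import reader

    es = Elasticsearch(['someIP:somePort'])

    def generate_actions(avro_reader):
        # Yield one bulk action per Avro record instead of building a full list,
        # so memory use stays flat even for large files
        for j, record in enumerate(avro_reader):
            yield {
                "_index": "my_index",
                # "_type" is omitted: mapping types are deprecated in
                # Elasticsearch 7.x and removed in 8.x
                "_id": j,
                "_source": record,
            }

    with open('data.avro', 'rb') as fo:
        avro_reader = reader(fo)
        # helpers.bulk consumes the generator while the file is still open and
        # sends the actions to the _bulk endpoint in chunks (500 by default)
        helpers.bulk(es, generate_actions(avro_reader))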
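The answer does not show the first option. Below is a minimal sketch of calling the _bulk endpoint directly with requests, under some assumptions: the Avro records are JSON-serializable (fastavro can return datetime, Decimal or bytes values for logical types, which json.dumps rejects without conversion), and the cluster is reachable at a plain base URL such as http://someIP:somePort without authentication. The helper names batches, build_bulk_body and send_bulk are hypothetical. The Bulk API expects newline-delimited JSON: an action line followed by the document line, with a trailing newline, sent as Content-Type application/x-ndjson.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Doc = Tuple[int, Dict[str, Any]]  # (document id, Avro record as a dict)


def batches(items: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_bulk_body(docs: Iterable[Doc], index: str) -> str:
    """Build an NDJSON body for the _bulk API: an action line, then the source line."""
    lines = []
    for doc_id, record in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Assumes the record is JSON-serializable; Avro logical types may need converting first
        lines.append(json.dumps(record))
    # The Bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


def send_bulk(base_url: str, body: str) -> Dict[str, Any]:
    """POST one bulk body and raise if the request or any item in it failed."""
    import requests  # third-party; imported here so the helpers above work without it

    resp = requests.post(
        f"{base_url}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        timeout=60,
    )
    resp.raise_for_status()
    result = resp.json()
    # _bulk returns HTTP 200 even when individual items fail, so check "errors"
    if result.get("errors"):
        raise RuntimeError("some documents in the batch failed to index")
    return result


# Usage (assumes fastavro is installed and the cluster is at this hypothetical URL):
# from fastavro import reader
# with open('data.avro', 'rb') as fo:
#     for batch in batches(enumerate(reader(fo)), 500):
#         send_bulk('http://someIP:somePort', build_bulk_body(batch, 'my_index'))
```

Batching keeps each request body bounded instead of sending the whole file at once; the batch size of 500 mirrors the helpers.bulk default and is only a starting point to tune.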

A similar question about indexing an Avro file into Elasticsearch with Python can be found on Stack Overflow: https://stackoverflow.com/questions/62739810/
