python - BulkIndexError: ('500 document(s) failed to index.', ...) when using Python + Elasticsearch

Tags: python elasticsearch

The code is as follows:

from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch()

with open(r'C:\Users\user\Desktop\police.csv') as f:
    index_name = 'census_data_records'
    doctype = 'census_record'
    reader = csv.reader(f)
    headers = []
    index = 0
    es.indices.delete(index=index_name, ignore=[400, 404])
    es.indices.create(index=index_name, ignore=400)
    action_list = []
    for row in reader:
        record ={
            '_op_type': 'index',
            '_index': index_name,
            '_type' : doctype,
            '_source': row
        }
        action_list.append(record)
    helpers.bulk(es, action_list)

The dataset looks like this:

,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y
0,120058272,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,Kan,"ARREST, BOOKED",800 Block of BRYANT ST,10.98727872,75.44928793
1,120058272,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016 12:00:00 AM,11:00,Kan,"ARREST, BOOKED",800 Block of BRYANT ST,10.93029836,75.85839714
2,141059263,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,Thi,"ARREST, BOOKED",KEITH ST / SHAFTER AV,10.02948575,74.81278836
3,160013662,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016 0:00,23:50,Pat,NONE,JONES ST / OFARRELL ST,10.91399488,75.39788708
4,160002740,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016 0:00,0:30,Ala,NONE,16TH ST / Alapuzha ST,12.35918751,74.87851143
5,160002869,ASSAULT,BATTERY,Friday,01/01/2016 0:00,21:35,Ern,NONE,1700 Block of BUSH ST,10.87491543,75.96476576
6,160003130,OTHER OFFENSES,PAROLE VIOLATION,Saturday,01/02/2016 0:00,0:04,Kan,"ARREST, BOOKED",MARY ST / HOWARD ST,10.6450246,75.7202032

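Inserting the data raises this error:

  • BulkIndexError: ('500 document(s) failed to index.',

Is there another way to insert a CSV file into Elasticsearch? Any documentation or blog posts would also help.

Best answer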

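from elasticsearch import helpers, Elasticsearch
import pandas as pd
import json

# Read the CSV with pandas; the header row supplies the column names.
df = pd.read_csv("police.csv")
# Serialize as a JSON array with one object per row.
json_str = df.to_json(orient='records')

# Parse it back into a list of plain dicts, one per document.
json_records = json.loads(json_str)

es = Elasticsearch()
index_name = 'census_data_records'
doctype = 'census_record'
# Recreate the index so each run starts from an empty one.
es.indices.delete(index=index_name, ignore=[400, 404])
es.indices.create(index=index_name, ignore=400)
action_list = []
for row in json_records:
    # Each row is a dict here, so '_source' is a valid JSON object.
    record ={
        '_op_type': 'index',
        '_index': index_name,
        '_type' : doctype,
        '_source': row
    }
    action_list.append(record)
helpers.bulk(es, action_list)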
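
The answer does not say why the question's code fails. A likely cause is that csv.reader yields each row as a list, while '_source' has to be a JSON object (a dict); the pandas version works because every record is a dict keyed by column name. The same effect can be had without pandas by using csv.DictReader. The sketch below is only an illustration: build_actions and SAMPLE_CSV are hypothetical names, the sample rows are copied from the question's dataset, and '_type' is left out on the assumption that the cluster does not need it (mapping types were deprecated in Elasticsearch 7).

```python
import csv
import io

# Header plus two rows copied from the question's dataset (illustrative sample).
SAMPLE_CSV = '''\
,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y
0,120058272,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,Kan,"ARREST, BOOKED",800 Block of BRYANT ST,10.98727872,75.44928793
2,141059263,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,Thi,"ARREST, BOOKED",KEITH ST / SHAFTER AV,10.02948575,74.81278836
'''


def build_actions(csv_file, index_name):
    """Yield one bulk action per CSV row (hypothetical helper, not a library API)."""
    # DictReader uses the header row as keys, so every row comes back as a dict.
    reader = csv.DictReader(csv_file)
    for row in reader:
        # The first header cell is empty in this file; drop that unnamed column.
        row.pop('', None)
        yield {
            '_op_type': 'index',
            '_index': index_name,
            '_source': row,  # a dict, unlike the list produced by csv.reader
        }


actions = list(build_actions(io.StringIO(SAMPLE_CSV), 'census_data_records'))
print(actions[0]['_source']['Category'])  # prints: WEAPON LAWS
```

With a real file, the generator can be handed straight to the bulk helper, for example helpers.bulk(es, build_actions(f, index_name)) inside the with open(...) block, since helpers.bulk accepts any iterable of actions. Note that DictReader leaves every value as a string, whereas pandas infers numeric types for columns such as X and Y.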

For the question "python - BulkIndexError: ('500 document(s) failed to index.', ...) when using Python + Elasticsearch", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62869254/
