python - Elasticsearch bulk method fails with alphanumeric IDs

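I can import data from a pandas DataFrame into Elasticsearch with the code below. All I do is add an id column holding auto-generated sequence numbers. But can I use the messageid column as the ID instead?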

# message id looks like nucb-9a7ff0885b95efae
df["id"] = [x for x in range(len(df["messageid"])) ]

# the above statement works but the following does not
#df["id"] = df["messageid"]

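import json
import elasticsearch

# Convert the DataFrame to a list of dicts, one per row
tmp = df.to_json(orient="records")
df_json = json.loads(tmp)

es = elasticsearch.Elasticsearch('https://some_site.com')

# Index each row; no explicit document ID is passed
for row in df_json:
    es.index(index='fromdf', doc_type='mydf', body=row)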

IDs in Elasticsearch do not have to be numeric, but when I use Python I get this error:

RequestError: TransportError(400, u'MapperParsingException[failed to parse [id]]; nested: NumberFormatException[For input string: "nucb-a006fd8dd60ac7a6"]; ')

How can I make sure the bulk method works with non-numeric IDs?

In other words, the code should also work with:

df["id"] = df["messageid"]

Best answer

The signature of the index method:

def index(self, index, doc_type, body, id=None, params=None):
...
    :arg index: The name of the index
    :arg doc_type: The type of the document
    :arg body: The document
    :arg id: Document ID
...

So your data should go in body, and the identifier for that data should go in id. If you want to store messages identified by messageid, you can do this:

for row_dict in df_json:
    es.index(index='fromdf', doc_type='mydf', body=row_dict, id=row_dict['messageid'])
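
Since the question asks about bulk indexing, the same idea can be sketched for bulk actions. This is a minimal sketch, assuming the `elasticsearch.helpers.bulk` helper, which takes an iterable of action dicts; `build_actions` and the sample records are hypothetical names made up for illustration, and clusters that still use mapping types may also need a `_type` key in each action.

```python
def build_actions(records, index_name, id_field="messageid"):
    """Yield one bulk action per record, using id_field as the document ID.

    build_actions is a hypothetical helper, not part of the elasticsearch library.
    """
    for record in records:
        yield {
            "_index": index_name,
            "_id": record[id_field],  # taken from the row, may be alphanumeric
            "_source": record,
        }


# Made-up records standing in for the rows of the DataFrame.
records = [
    {"messageid": "nucb-9a7ff0885b95efae", "text": "first"},
    {"messageid": "nucb-a006fd8dd60ac7a6", "text": "second"},
]
actions = list(build_actions(records, "fromdf"))

# With a reachable cluster, the actions could be sent like this (not run here):
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch('https://some_site.com')
# helpers.bulk(es, actions)
```

The document ID travels in `_id`, separate from the document body in `_source`, which mirrors the id and body arguments of index above.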

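You can also simplify the code considerably by using functions that already exist, such as pandas.DataFrame.to_dict, so you do not have to convert to JSON and parse it back just to get dictionaries.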
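
As a rough illustration of that simplification, the sketch below builds the row dictionaries with `to_dict(orient="records")`. The DataFrame contents are made up, and the indexing loop is left commented out because it needs a reachable cluster and the `es` client from the question.

```python
import pandas as pd

# Made-up rows standing in for the real DataFrame.
df = pd.DataFrame({
    "messageid": ["nucb-9a7ff0885b95efae", "nucb-a006fd8dd60ac7a6"],
    "text": ["first", "second"],
})

# One dict per row, without a round trip through a JSON string.
rows = df.to_dict(orient="records")

# Each row could then be indexed with its messageid as the document ID:
# for row_dict in rows:
#     es.index(index='fromdf', doc_type='mydf', body=row_dict, id=row_dict['messageid'])
```

One difference to keep in mind: to_dict keeps pandas value types such as Timestamp as they are, whereas the JSON round trip turns them into plain JSON values, so date columns may need converting before they are sent.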

A similar question about python - Elasticsearch bulk method fails with alphanumeric IDs can be found on Stack Overflow: https://stackoverflow.com/questions/34127339/
