I am trying to index Stack Overflow data. First, I create the index with explicit mappings and settings:
    @classmethod
    def create_index_with_set_map(cls, name, elasticsearch):
        """
        Create an index with default mappings and settings (including the analyzer).

        Arguments:
        name -- The name of the index.
        elasticsearch -- Elasticsearch instance for connection.
        """
        # Field mappings: "Body" is analyzed with the whitespace analyzer
        # and also gets a "keyword" sub-field for exact matching.
        mappings = {
            "properties": {
                "Body": {
                    "type": "text",
                    "analyzer": "whitespace",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
        # Index settings: make whitespace the default analyzer.
        settings = {
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "whitespace"
                    }
                }
            }
        }
        body = {
            "settings": settings,
            "mappings": mappings
        }
        res = elasticsearch.indices.create(index=name, body=body)
        print(res)
Then I try to bulk index my documents:

    @classmethod
    def start_index(cls, index_name, index_path, elasticsearch, doc_type):
        """
        Index the documents with the bulk helper.

        Arguments:
        index_name -- the name of the index
        index_path -- the path of the XML file to index
        elasticsearch -- Elasticsearch instance for connection
        doc_type -- doc type
        """
        # Requires: from elasticsearch import helpers
        # Parser.xml_reader yields batches of lines; each batch is sent as one bulk request.
        for lines in Parser.xml_reader(index_path):
            actions = [
                {
                    "_index": index_name,
                    "_type": doc_type,
                    "_id": Parser.post_parser(line)['Id'],
                    "_source": Parser.post_parser(line)
                }
                for line in lines if Parser.post_parser(line) is not None
            ]
            helpers.bulk(elasticsearch, actions)
The error I get:

    ('500 document(s) failed to index.', [{'index': {'_index': 'sof-question-answer2', '_type': 'Stackoverflow', '_id': '1', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Mapper for [Body] conflicts with existing mapping:\n[mapper [Body] has different [analyzer]]'}, 'data': ...}
Best answer
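It looks like the sof-question-answer2 index was already created with a different analyzer, probably the default standard analyzer.
If you run GET sof-question-answer2/_mapping in Kibana, you will see that the Body field does not have the whitespace analyzer.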
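The same check can also be done from Python. The following is only a sketch: field_analyzer is a hypothetical helper name, es is assumed to be an elasticsearch-py client, and the response layout is assumed to be the usual one returned by indices.get_mapping (with an extra type level on older, typed indices).

```python
def field_analyzer(es, index_name, field):
    """Return the analyzer explicitly set on `field`, or None if there is none.

    `es` is assumed to be an elasticsearch-py client (hypothetical helper).
    """
    response = es.indices.get_mapping(index=index_name)
    mappings = response[index_name]["mappings"]

    # Typeless layout: mappings -> properties
    properties = mappings.get("properties")
    if properties is None:
        # Older typed layout: mappings -> <doc type> -> properties
        for type_mapping in mappings.values():
            if isinstance(type_mapping, dict) and "properties" in type_mapping:
                properties = type_mapping["properties"]
                break

    if not properties:
        return None
    # A missing "analyzer" key means the field falls back to the default analyzer.
    return properties.get(field, {}).get("analyzer")
```

If field_analyzer(es, "sof-question-answer2", "Body") does not return "whitespace", the existing index does not match the mapping in the question, which is consistent with the conflict reported by the bulk request.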
To fix this, you will have to delete the index, recreate it with the updated mapping, and reindex your data (if any).
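A minimal sketch of the delete-and-recreate step, assuming es is an elasticsearch-py client and body is the settings/mappings dictionary built in the question. recreate_index is a hypothetical helper name, not part of the library. Deleting the index discards every document already stored in it, so this is only appropriate when the data can be indexed again from the source XML.

```python
def recreate_index(es, index_name, body):
    """Drop `index_name` if it exists, then create it again from `body`.

    Hypothetical helper; `es` is assumed to be an elasticsearch-py client.
    WARNING: deleting the index removes all documents stored in it.
    """
    if es.indices.exists(index=index_name):
        es.indices.delete(index=index_name)
    # The new index starts empty, so the whitespace analyzer in `body`
    # no longer conflicts with a previously stored mapping.
    return es.indices.create(index=index_name, body=body)
```

After this call, running the bulk indexing from the question again should populate the new index with the intended analyzer.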
Regarding "python - Elasticsearch problem with predefined mappings and indexing documents", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64662669/