python - Elasticsearch - IndicesClient.put_settings not working

Tags: python python-3.x elasticsearch

I am trying to update the settings of an existing index.
My initial settings look like this:

client.create(index = "movies", body= {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0,

            "analysis": {
                "filter": {
                    "my_custom_stop_words": {
                        "type": "stop",
                        "stopwords": stop_words
                    }
                },
                "analyzer": {
                    "my_custom_analyzer": {
                        "filter": [
                            "lowercase",
                            "my_custom_stop_words"
                        ],
                        "type": "custom",
                        "tokenizer": "standard"
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "body": {
                    "type": "text",
                    "analyzer": "my_custom_analyzer",
                    "search_analyzer": "my_custom_analyzer",
                    "search_quote_analyzer": "my_custom_analyzer"
                }
            }
        }
    },

    ignore=400

) 

I am trying to use client.put_settings to add a synonym filter to the existing analyzer (my_custom_analyzer):
      client.put_settings(index='movies', body={
             "settings": {
                    "number_of_shards": 1,
                    "number_of_replicas": 0,

                    "analysis": {
                        "analyzer": {
                            "my_custom_analyzer": {
                                "filter": [
                                    "lowercase",
                                    "my_stops",
                                    "my_synonyms"
                                ],
                                "type": "custom",
                                "tokenizer": "standard"
                            }
                        },
                        "filter": {
                            "my_custom_stops": {
                                "type": "stop",
                                "stopwords": stop_words
                            },
                            "my_custom_synonyms": {
                                "ignore_case": "true",
                                "type": "synonym",
                                "synonyms": ["Harry Potter, HP => HP", "Terminator, TM => TM"]

                            }
                        }
                    }
             },
            "mappings": {
                "properties": {
                    "body": {
                        "type": "text",
                        "analyzer": "my_custom_analyzer",
                        "search_analyzer": "my_custom_analyzer",
                        "search_quote_analyzer": "my_custom_analyzer"
                    }
                }
            }
        },

        ignore=400

    ) 

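However, when I issue a search query (searching for "HP") against the movies index, I want the results ranked so that the document containing "Harry Potter" 5 times is the top element in the list. Right now the document containing "HP" 3 times comes first, so the synonym filter does not seem to be working. I closed the movies index before calling client.put_settings and reopened it afterwards.
Any help would be appreciated!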

Accepted answer

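You should reindex all of your data so that the updated settings are applied to every document and field.

Data that has already been indexed is not affected by the updated analyzer; only documents indexed after the settings change are analyzed with it.

Skipping the reindex can produce wrong results, because the old data was analyzed with the old custom analyzer rather than the new one.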
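Before reindexing, it can help to confirm what the analyzer currently produces for a given text. The following is a minimal sketch, assuming a 7.x-style Python client whose IndicesClient exposes `analyze(index=..., body=...)`; the helper name `analyzer_tokens` is hypothetical and not part of the original answer.

```python
def analyzer_tokens(indices_client, index, analyzer, text):
    """Return the tokens that `analyzer` on `index` produces for `text`.

    `indices_client` is assumed to be an IndicesClient (for example
    `Elasticsearch(...).indices`); this helper is illustrative only.
    """
    response = indices_client.analyze(
        index=index,
        body={"analyzer": analyzer, "text": text},
    )
    # The _analyze API returns one entry per emitted token.
    return [entry["token"] for entry in response["tokens"]]


# Possible usage (address and client construction are assumptions):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# print(analyzer_tokens(es.indices, "movies", "my_custom_analyzer", "Harry Potter"))
```

If the synonym filter is active on the index, the output for "Harry Potter" should reflect the synonym rule; if it is not, the original words come back unchanged. Even when the analyzer is correct, documents indexed earlier still hold the tokens produced by the old analyzer.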

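The most effective way to fix this is to create a new index and move the data from the old index into it using the updated settings.

Reindex API

Follow these steps: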

POST _reindex
{
  "source": {
    "index": "movies"
  },
  "dest": {
    "index": "new_movies"
  }
}

DELETE movies

PUT movies
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "filter": [
            "lowercase",
            "my_custom_stops",
            "my_custom_synonyms"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      },
      "filter": {
        "my_custom_stops": {
          "type": "stop",
          "stopwords": "stop_words"
        },
        "my_custom_synonyms": {
          "ignore_case": "true",
          "type": "synonym",
          "synonyms": [
            "Harry Potter, HP => HP",
            "Terminator, TM => TM"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "my_custom_analyzer",
        "search_analyzer": "my_custom_analyzer",
        "search_quote_analyzer": "my_custom_analyzer"
      }
    }
  }
}

POST _reindex?wait_for_completion=false  
{
  "source": {
    "index": "new_movies"
  },
  "dest": {
    "index": "movies"
  }
}

After verifying that all the data is in place, you can delete the new_movies index:

DELETE new_movies
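Since the question uses the Python client, the same steps can be sketched in Python. This is a rough sketch rather than a definitive implementation: it assumes a 7.x-style client (`reindex`, `count`, `indices.create`, `indices.delete`, `indices.refresh` all taking `body=`/`index=` keyword arguments), the function name `rebuild_index` is hypothetical, and it waits for each reindex to finish instead of using `wait_for_completion=false`.

```python
def rebuild_index(es, index, temp_index, index_body):
    """Recreate `index` with `index_body`, carrying documents via `temp_index`.

    `es` is assumed to be an Elasticsearch client instance (7.x-style API).
    Returns the number of documents in the rebuilt index.
    """
    # 1. Copy every document into the temporary index.
    es.reindex(
        body={"source": {"index": index}, "dest": {"index": temp_index}},
        wait_for_completion=True,
    )
    es.indices.refresh(index=temp_index)
    expected = es.count(index=temp_index)["count"]

    # 2. Drop the old index and recreate it with the new settings and mappings.
    es.indices.delete(index=index)
    es.indices.create(index=index, body=index_body)

    # 3. Copy the documents back so they are analyzed by the new analyzer.
    es.reindex(
        body={"source": {"index": temp_index}, "dest": {"index": index}},
        wait_for_completion=True,
    )
    es.indices.refresh(index=index)
    actual = es.count(index=index)["count"]

    # 4. Remove the temporary copy only once the document counts match.
    if actual != expected:
        raise RuntimeError(
            f"expected {expected} documents in {index!r}, found {actual}"
        )
    es.indices.delete(index=temp_index)
    return actual
```

Here `index_body` would be the same dictionary as the `PUT movies` request above, with `stopwords` set to the actual stop-word list rather than a placeholder string. A call might look like `rebuild_index(es, "movies", "new_movies", index_body)`.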
Hope this helps.

Regarding python - Elasticsearch - IndicesClient.put_settings not working, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58896418/
