lucene - Indexing with edge NGrams for typeahead

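I'm trying to get Elasticsearch to index some documents for typeahead suggestions. As far as I can tell, edge NGram processing in Elasticsearch is provided by the underlying Lucene. Unfortunately, Lucene's documentation on this has turned out to be hard for me to follow. The best I've come up with is based on https://gist.github.com/988923, but it doesn't seem to work (an index with these settings only returns matches on whole words, as if the settings weren't there):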

{
  "settings":{
    "index":{
      "analysis":{
        "analyzer":{
          "typeahead_analyzer":{
            "type":"custom",
            "tokenizer":"edgeNGram",
            "filter":["typeahead_ngram"]
          }
        },
        "filter":{
          "typeahead_ngram":{
            "type":"edgeNGram",
            "min_gram":1,
            "max_gram":8,
            "side":"front"
          }
        }
      }
    }
  }
}

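I honestly have no idea how analyzers, tokenizers, and filters fit together. Do I even want a filter? Should I have only a tokenizer? Do I have to reference these settings when indexing the documents that should use them? How can I find out which settings the underlying Lucene is using for a given index? How do I debug this? Help :-)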
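On how the pieces fit together: in Lucene and Elasticsearch an analyzer runs exactly one tokenizer, which splits raw text into tokens, and then zero or more token filters, each of which rewrites the token stream in the order listed (optional character filters, which run before the tokenizer, are left out here). The Python sketch below models that pipeline with plain functions. All names in it (`make_analyzer`, `whitespace_tokenizer`, `lowercase_filter`) are hypothetical and it is not Elasticsearch's API, only an illustration of the ordering.

```python
from typing import Callable, List

# Hypothetical aliases for this sketch: a tokenizer turns text into tokens,
# a token filter turns one token list into another.
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]

def whitespace_tokenizer(text: str) -> List[str]:
    # Runs once per input and splits the raw text into tokens.
    return text.split()

def lowercase_filter(tokens: List[str]) -> List[str]:
    # A token filter: receives the tokenizer's output and rewrites each token.
    return [t.lower() for t in tokens]

def make_analyzer(tokenizer: Tokenizer, filters: List[TokenFilter]) -> Callable[[str], List[str]]:
    # An analyzer = one tokenizer followed by token filters applied in order.
    def analyze(text: str) -> List[str]:
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze

analyzer = make_analyzer(whitespace_tokenizer, [lowercase_filter])
print(analyzer("Quick Brown Fox"))  # ['quick', 'brown', 'fox']
```

In these terms, edge n-gramming can sit in either slot. The settings above use edgeNGram both as the tokenizer and again as a filter, while the answer below keeps a word-splitting tokenizer and adds edge n-grams only as a filter.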

Best answer

I solved this using edgeNGram. Here are the analysis settings and the mapping I used.

{
  "analysis": {
    "analyzer": {
      "str_search_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase"
        ]
      },
      "str_index_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "substring"
        ]
      }
    },
    "filter": {
      "substring": {
        "type": "edgeNGram",
        "min_gram": 1,
        "max_gram": 10,
        "side": "front"
      }
    }
  }
}
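To see what `str_index_analyzer` stores for a value, the following Python sketch imitates its chain: tokenizer `standard`, then filter `lowercase`, then the `substring` edgeNGram filter with `min_gram` 1, `max_gram` 10 and `side` front. It assumes a regex over word characters is a close enough stand-in for the standard tokenizer, which is a simplification; the function names are hypothetical.

```python
import re
from typing import List

def standard_like_tokenizer(text: str) -> List[str]:
    # Rough stand-in for the "standard" tokenizer: runs of word characters.
    return re.findall(r"\w+", text)

def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]

def edge_ngram(tokens: List[str], min_gram: int, max_gram: int) -> List[str]:
    # Mirrors the "substring" filter ("side": "front"): emit each token's
    # prefixes whose length lies between min_gram and max_gram.
    out: List[str] = []
    for token in tokens:
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            out.append(token[:n])
    return out

def str_index_analyzer(text: str) -> List[str]:
    # tokenizer "standard" -> filter "lowercase" -> filter "substring"
    return edge_ngram(lowercase(standard_like_tokenizer(text)), min_gram=1, max_gram=10)

print(str_index_analyzer("New York"))
# ['n', 'ne', 'new', 'y', 'yo', 'yor', 'york']
```

Because the n-grams are produced per word, typing the start of any word can match. With `max_gram` 10, a longer word only gets prefixes up to 10 characters, so in this model a query for its full 13-letter form would have no exact stored term to match.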

{
  "index_name": {
    "properties": {
      "location": {
        "type": "geo_point"
      },
      "name": {
        "type": "string",
        "index": "analyzed",
        "search_analyzer": "str_search_analyzer",
        "index_analyzer": "str_index_analyzer"
      }
    }
  }
}
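The mapping gives `name` two different analyzers: the n-gramming one at index time and a plain lowercasing one at search time. The toy Python model below (a dictionary standing in for Lucene's inverted index; all names and sample documents are hypothetical) shows why that asymmetry supports typeahead: the partial input is kept as a single term, and it matches because the matching prefix was already stored when the document was indexed.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def search_terms(text: str) -> List[str]:
    # Like str_search_analyzer: standard-ish tokenizing plus lowercase, no n-grams.
    return [t.lower() for t in re.findall(r"\w+", text)]

def index_terms(text: str, max_gram: int = 10) -> List[str]:
    # Like str_index_analyzer: the same tokens, expanded into front edge n-grams.
    return [t[:n] for t in search_terms(text) for n in range(1, min(max_gram, len(t)) + 1)]

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    # Toy inverted index: term -> ids of documents whose "name" produced it.
    inverted: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, name in docs.items():
        for term in index_terms(name):
            inverted[term].add(doc_id)
    return inverted

docs = {1: "New York", 2: "Newark", 3: "Boston"}
inverted = build_index(docs)

# The partial input "Yor" is analyzed into the single term "yor",
# which matches the prefix stored at index time for "York".
for term in search_terms("Yor"):
    print(term, sorted(inverted.get(term, set())))
# yor [1]
```

If the search side also produced n-grams, "yor" would expand to "y", "yo", "yor", and the single letter "y" would match every name with a word starting with y. That is presumably why the answer uses a separate, simpler search analyzer.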

One important footnote: to get correct results I had to query with a match query using the AND operator.
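A match query analyzes the input into terms and by default combines them with OR, so any one matching word prefix is enough. The Python sketch below builds the match query body with `"operator": "and"` (real Elasticsearch query DSL) and then evaluates it against a toy set of documents. The `simulate` function and the sample names are hypothetical, and it ignores scoring entirely; it only shows which documents qualify under each operator.

```python
import re
from typing import Dict, List

def search_terms(text: str) -> List[str]:
    # Like str_search_analyzer: word tokens, lowercased, no n-grams.
    return [t.lower() for t in re.findall(r"\w+", text)]

def index_terms(text: str, max_gram: int = 10) -> List[str]:
    # Like str_index_analyzer: front edge n-grams of each lowercased word.
    return [t[:n] for t in search_terms(text) for n in range(1, min(max_gram, len(t)) + 1)]

def match_query(query: str, operator: str = "or") -> dict:
    # Request body for an Elasticsearch match query on the "name" field.
    return {"query": {"match": {"name": {"query": query, "operator": operator}}}}

def simulate(docs: Dict[int, str], body: dict) -> List[int]:
    # Hypothetical matcher: a document qualifies if all ("and") or any ("or")
    # query terms appear among its index-time terms.
    spec = body["query"]["match"]["name"]
    q_terms = search_terms(spec["query"])
    combine = all if spec["operator"] == "and" else any
    return [doc_id for doc_id, name in docs.items()
            if combine(t in set(index_terms(name)) for t in q_terms)]

docs = {1: "New York", 2: "New Jersey", 3: "York Minster"}
print(simulate(docs, match_query("new yo", "or")))   # [1, 2, 3]
print(simulate(docs, match_query("new yo", "and")))  # [1]
```

With OR, "new yo" also returns "New Jersey" (via "new") and "York Minster" (via "yo"); with AND, every typed word prefix must match, which is what a typeahead box usually wants.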

Hope this helps.

A similar question about lucene - Indexing with edge NGrams for typeahead can be found on Stack Overflow: https://stackoverflow.com/questions/14163491/
