Elasticsearch match_phrase_prefix does not match all terms

Tags: elasticsearch match missing-data querying

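I have a problem: when I use a match_phrase_prefix query in Elasticsearch, it does not return all the results I expect, particularly when the query is one word followed by a single letter.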

Take the following index mapping (a made-up example to protect sensitive data):

http://localhost:9200/test/drinks/_mapping

returns:
{
  "test": {
    "mappings": {
      "drinks": {
        "properties": {
          "name": {
            "type": "text"
          }
        }
      }
    }
  }
}

Among millions of other records, the index also contains:
{
    "_index": "test",
    "_type": "drinks",
    "_id": "2",
    "_score": 1,
    "_source": {
        "name": "Johnnie Walker Black Label"
    }
},
{
    "_index": "test",
    "_type": "drinks",
    "_id": "1",
    "_score": 1,
    "_source": {
        "name": "Johnnie Walker Blue Label"
    }
}

The following query, one word followed by two letters:
POST http://localhost:9200/test/drinks/_search
{
    "query": {
        "match_phrase_prefix" : {
            "name" : "Walker Bl"
        }
    }
}

returns this:
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "test",
                "_type": "drinks",
                "_id": "2",
                "_score": 0.5753642,
                "_source": {
                    "name": "Johnnie Walker Black Label"
                }
            },
            {
                "_index": "test",
                "_type": "drinks",
                "_id": "1",
                "_score": 0.5753642,
                "_source": {
                    "name": "Johnnie Walker Blue Label"
                }
            }
        ]
    }
}

whereas this query, with one word and only one letter:
POST http://localhost:9200/test/drinks/_search
{
    "query": {
        "match_phrase_prefix" : {
            "name" : "Walker B"
        }
    }
}

returns no results. What could be going on here?

Best answer

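I assume you are using Elasticsearch 5.0 or later.
I think the cause is probably the default value of max_expansions.
As the documentation explains, the max_expansions parameter controls how many terms the last term of the query is expanded to when it is treated as a prefix. The default is 50. That would explain why "Black" and "Blue" are found with the two leading letters "Bl" but not with "B" alone: in an index with millions of records, far more than 50 terms are likely to start with "b", and the first 50 of them need not include "black" or "blue".
The documentation is quite clear about this: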

The match_phrase_prefix query is a poor-man’s autocomplete. It is very easy to use, which let’s you get started quickly with search-as-you-type but it’s results, which usually are good enough, can sometimes be confusing.

Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.

The problem is that the first 50 terms may not include the term fox so the phase quick brown fox will not be found. This usually isn’t a problem as the user will continue to type more letters until the word they are looking for appears
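The behaviour described in the quoted passage can be illustrated with a small Python sketch. This is a simplified model, not Elasticsearch's actual implementation: `expand_prefix` and the term list are hypothetical, the 60 filler terms are invented so that more than 50 terms starting with "b" sort before "black", and the terms are lowercase on the assumption that the default standard analyzer is in use.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return at most max_expansions terms that start with prefix,
    taken in sorted order (simplified model of prefix expansion)."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    expansions = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expansions) >= max_expansions:
            break
        expansions.append(term)
    return expansions


# Hypothetical term dictionary: 60 invented terms "ba00".."ba59" sort
# before "black" and "blue" and use up the default budget of 50.
terms = sorted(
    [f"ba{i:02d}" for i in range(60)]
    + ["black", "blue", "johnnie", "label", "walker"]
)

# "b" expands to the first 50 terms only, so "black" and "blue" are missed.
print("black" in expand_prefix(terms, "b"))

# "bl" is selective enough that both terms fit within the limit.
print(expand_prefix(terms, "bl"))
```

In this model the phrase query built from "Walker B" never contains "black" or "blue" as a candidate last term, which matches the empty result in the question.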


If you care about performance, I can't tell you whether it is safe to raise this parameter above 50, because I have never tried it.
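For anyone who wants to experiment, max_expansions can be set per query by using the object form of match_phrase_prefix. The sketch below only builds and prints the request body in Python. The helper name `phrase_prefix_query` is hypothetical, and the value 1000 is an arbitrary assumption for illustration, not a recommendation.

```python
import json


def phrase_prefix_query(field, text, max_expansions=50):
    """Build a match_phrase_prefix search body with an explicit max_expansions."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": text,
                    "max_expansions": max_expansions,
                }
            }
        }
    }


# 1000 is an arbitrary illustrative value; measure the cost on real data.
body = phrase_prefix_query("name", "Walker B", max_expansions=1000)
print(json.dumps(body, indent=4))
```

The printed JSON can be sent as the body of the same POST to http://localhost:9200/test/drinks/_search used in the question. A higher limit means more terms are added to the phrase query, so query time is worth checking before adopting it.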

A similar question about Elasticsearch match_phrase_prefix not matching all terms can be found on Stack Overflow: https://stackoverflow.com/questions/47182126/
