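I have a problem: when I use a match_phrase_prefix query in Elasticsearch, it does not return all the results I expect, particularly when the query is a word followed by a single letter.
Take the following index mapping (this is a contrived example to protect sensitive data):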
http://localhost:9200/test/drinks/_mapping
Returns:
{
  "test": {
    "mappings": {
      "drinks": {
        "properties": {
          "name": {
            "type": "text"
          }
        }
      }
    }
  }
}
Among millions of other records, the index contains:
{
  "_index": "test",
  "_type": "drinks",
  "_id": "2",
  "_score": 1,
  "_source": {
    "name": "Johnnie Walker Black Label"
  }
},
{
  "_index": "test",
  "_type": "drinks",
  "_id": "1",
  "_score": 1,
  "_source": {
    "name": "Johnnie Walker Blue Label"
  }
}
The following query, which is a word followed by two letters:
POST http://localhost:9200/test/drinks/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": "Walker Bl"
    }
  }
}
Returns this:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test",
        "_type": "drinks",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "name": "Johnnie Walker Black Label"
        }
      },
      {
        "_index": "test",
        "_type": "drinks",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "name": "Johnnie Walker Blue Label"
        }
      }
    ]
  }
}
Whereas this query, with one word and a single letter:
POST http://localhost:9200/test/drinks/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": "Walker B"
    }
  }
}
Returns no results. What could be going on here?
Best answer
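I am assuming you are using Elasticsearch 5.0 or later.
I think this may be caused by the default value of max_expansions.
According to the documentation, the max_expansions parameter controls how many terms the last (prefix) term will be expanded to. The default is 50. With millions of records, more than 50 indexed terms may well start with "b", so the expansion can stop before it reaches "black" or "blue". Far fewer terms start with "bl", which could explain why "Black" and "Blue" are found with the two letters "Bl" but not with "B" alone.
The documentation is quite clear about this: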
The match_phrase_prefix query is a poor-man’s autocomplete. It is very easy to use, which let’s you get started quickly with search-as-you-type but it’s results, which usually are good enough, can sometimes be confusing.
Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.
The problem is that the first 50 terms may not include the term fox so the phase quick brown fox will not be found. This usually isn’t a problem as the user will continue to type more letters until the word they are looking for appears
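The behaviour described in the quote can be imitated with a small Python sketch. This is a simplified model under stated assumptions, not Elasticsearch's actual implementation: the helper name expand_prefix and the term list are hypothetical, and real expansion runs against Lucene's term dictionary on each shard. The prefixes are lowercase because a text field with the default standard analyzer lowercases its terms.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return at most max_expansions terms that start with prefix,
    taken in dictionary order (hypothetical helper, simplified model)."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    expansions = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expansions) >= max_expansions:
            break
        expansions.append(term)
    return expansions


# Made-up term dictionary: 60 terms "b00".."b59" that sort before
# "black" and "blue", standing in for the many real terms starting with "b".
terms = sorted([f"b{i:02d}" for i in range(60)] + ["black", "blue", "label", "walker"])

print("black" in expand_prefix(terms, "b"))   # 50 slots are used up before "black"
print("black" in expand_prefix(terms, "bl"))  # only "black" and "blue" match "bl"
```

In this model the prefix "b" exhausts its 50 expansions on terms that sort earlier, while "bl" narrows the candidates enough for both drinks to be found.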
If you need good performance, I cannot tell you whether it is fine to raise this parameter above 50, because I have never tried it.
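For anyone who wants to experiment, max_expansions can be set per query by using the object form of match_phrase_prefix. The sketch below only builds the request body for POST http://localhost:9200/test/drinks/_search; the function name phrase_prefix_query is hypothetical, and the value 1000 is an arbitrary illustration, not a tested recommendation.

```python
import json


def phrase_prefix_query(field, text, max_expansions=50):
    """Build a match_phrase_prefix request body with an explicit
    max_expansions (hypothetical helper for illustration)."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {
                    "query": text,
                    "max_expansions": max_expansions,
                }
            }
        }
    }


# 1000 is an assumed value chosen only to show where the setting goes.
body = phrase_prefix_query("name", "Walker B", max_expansions=1000)
print(json.dumps(body, indent=2))
```

A larger value makes the query examine more candidate terms, so its effect on latency would need to be measured against the real index.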
Source: the Stack Overflow question on Elasticsearch match_phrase_prefix not matching all terms: https://stackoverflow.com/questions/47182126/