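I'm trying to build an autocomplete for destinations, and I want to boost the search results by an integer popularity field.
I'm trying this function_score query: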
'query' => [
    'function_score' => [
        'query' => [
            "bool" => [
                "should" => [
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "type" => "most_fields",
                            "boost" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "fuzziness" => "1",
                            "prefix_length" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*.exact"
                            ],
                            "boost" => 2
                        ]
                    ]
                ]
            ]
        ],
        'field_value_factor' => [
            'field' => 'popularity'
        ]
    ],
],
Mapping and settings:
'settings' => [
    'analysis' => [
        'filter' => [
            'ngram_filter' => [
                'type' => 'edge_ngram',
                'min_gram' => 2,
                'max_gram' => 20,
            ]
        ],
        'analyzer' => [
            'ngram_analyzer' => [
                'type' => 'custom',
                "tokenizer" => "standard",
                'filter' => ['lowercase', 'ngram_filter'],
            ]
        ]
    ],
],
'mappings' => [
    'doc' => [
        "properties" => [
            "destination_name_en" => [
                "type" => "text",
                "term_vector" => "yes",
                "analyzer" => "ngram_analyzer",
                "search_analyzer" => "standard",
                "fields" => [
                    "exact" => [
                        "type" => "text",
                        "analyzer" => "standard"
                    ]
                ]
            ],
            "destination_name_es" => [
                "type" => "text",
                "term_vector" => "yes",
                "analyzer" => "ngram_analyzer",
                "search_analyzer" => "standard",
                "fields" => [
                    "exact" => [
                        "type" => "text",
                        "analyzer" => "standard"
                    ]
                ]
            ],
            "destination_name_pt" => [
                "type" => "text",
                "term_vector" => "yes",
                "analyzer" => "ngram_analyzer",
                "search_analyzer" => "standard",
                "fields" => [
                    "exact" => [
                        "type" => "text",
                        "analyzer" => "standard"
                    ]
                ]
            ],
            "popularity" => [
                "type" => "integer",
            ]
        ]
    ]
]
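To make the analyzer settings above concrete, here is a rough standalone Python sketch (not part of the PHP client code) of which terms get indexed and which terms a search produces. It assumes that splitting on word characters is a close enough stand-in for the standard tokenizer on simple place names; the function names are made up for illustration.

```python
import re

def edge_ngrams(token, min_gram=2, max_gram=20):
    """Prefixes of `token` that are min_gram..max_gram characters long."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

def index_terms(text):
    """Approximation of ngram_analyzer: split into words, lowercase, edge n-gram."""
    terms = []
    for token in re.findall(r"\w+", text.lower()):
        terms.extend(edge_ngrams(token))
    return terms

def search_terms(text):
    """Approximation of the standard search analyzer: lowercased words, no n-grams."""
    return re.findall(r"\w+", text.lower())

indexed = index_terms("Puerto Vallarta")
print(indexed)
# Both search terms for "Puerto Va" are found among the indexed prefixes.
print([term for term in search_terms("Puerto Va") if term in indexed])
```

Under this approximation, "Puerto Va" becomes the two search terms "puerto" and "va", and any destination with a word starting with either of them is a candidate match.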
I set a popularity value of 10 for Cancun, and when I start typing "ca" the first option is Cancun. This works as expected...
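But the problem comes when I try to find other cities with a popularity value of 0, such as Puerto Vallarta. When I type "Puerto Va", I get the following results:
1.- Valle d'Aosta
2.- Puerto Rico Lopez
3.- Bristol - Virginia
And many others... (but not Puerto Vallarta)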
It is important to highlight that removing function_score and field_value_factor from this query works as expected (Puerto Vallarta is returned in first position).
I want to boost popular cities using an integer value.
Any suggestions?
Thanks!
Best answer
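By default, your field_value_factor multiplies the natural score by the value of the popularity field. So if Puerto Vallarta has a value of 0, its score will always be 0. It will match, but it will never appear among the first results.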
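On top of that, your boost grows linearly with popularity, which is certainly not what you want, because popular cities will completely overwhelm the result list.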
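As a quick standalone Python illustration of the two problems just described (this is not client code, and the text-relevance scores are hypothetical numbers picked only for the example), the default behaviour can be sketched as a plain multiplication:

```python
def multiplied_score(query_score, popularity, factor=1.0):
    """Emulates field_value_factor with no modifier, combined by multiplication."""
    return query_score * (factor * popularity)

# Hypothetical text-relevance scores, chosen only for illustration.
print(multiplied_score(8.0, 0))     # strong text match, popularity 0
print(multiplied_score(1.5, 10))    # weak text match, popularity 10
print(multiplied_score(1.5, 1000))  # weak text match, popularity 1000
```

This prints 0.0, 15.0 and 1500.0: the strong match with popularity 0 is wiped out, while the weak matches scale in direct proportion to popularity.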
You should therefore use the modifier property of field_value_factor (see the Elasticsearch function_score documentation).
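If you set it to log2p, it should work as expected. The log2p modifier adds 2 to the value of the popularity field before applying the base-10 logarithm, so a city with popularity 0 gets a factor of log10(2), about 0.30, instead of 0. This way, the difference in boost between a city with popularity 2 and one with popularity 4 is still noticeable, but the difference shrinks as popularity grows.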
For example:
popularity 2 => log10(4) => 0.60
popularity 4 => log10(6) => 0.78
popularity 20 => log10(22) => 1.34
popularity 22 => log10(24) => 1.38
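The figures above can be reproduced with a few lines of standalone Python. This is only a sketch of the arithmetic, assuming log2p means log10(factor * value + 2) with the default factor of 1; the helper name log2p is chosen here to mirror the modifier name.

```python
import math

def log2p(popularity, factor=1.0):
    """Factor produced by the log2p modifier: log10(factor * popularity + 2)."""
    return math.log10(factor * popularity + 2)

for popularity in (0, 2, 4, 20, 22):
    print(popularity, round(log2p(popularity), 2))
```

Note that popularity 0 no longer zeroes the score, and that going from 20 to 22 changes the factor far less than going from 2 to 4.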
Add this to your query:
'field_value_factor' => [
    'field' => 'popularity',
    'modifier' => 'log2p', // <== add this
]
Regarding "elasticsearch - boost results by integer field", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53625282/