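elasticsearch - Boost results by an integer field

I'm trying to build a destination autocomplete, and I want to boost search results by an integer popularity field.

I'm trying this function_score query: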

'query' => [
    'function_score' => [
        'query' => [
            "bool" => [
                "should" => [
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "type" => "most_fields",
                            "boost" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "fuzziness" => "1",
                            "prefix_length" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*.exact"
                            ],
                            "boost" => 2
                        ]
                    ]
                ]
            ]
        ],
        'field_value_factor' => [
            'field' => 'popularity'
        ]
    ],
],

Mapping and settings:

'settings' => [
    'analysis' => [
        'filter' => [
            'ngram_filter' => [
                'type' => 'edge_ngram',
                'min_gram' => 2,
                'max_gram' => 20,
            ]
        ],
        'analyzer' => [
            'ngram_analyzer' => [
                'type'      => 'custom',
                "tokenizer" => "standard",
                'filter'    => ['lowercase', 'ngram_filter'],
            ]
        ]
    ],
],
'mappings' => [
    'doc' => [
        "properties" => [
            "destination_name_en" => [
                "type" => "text",
                "term_vector" => "yes",
                "analyzer" => "ngram_analyzer",
                "search_analyzer" => "standard",
                "fields" => [
                    "exact" => [
                        "type" => "text",
                        "analyzer" => "standard"
                    ]
                ]
            ],
            "destination_name_es" => [
                "type" => "text",
                "term_vector" => "yes",
                "analyzer" => "ngram_analyzer",
                "search_analyzer" => "standard",
                "fields" => [
                    "exact" => [
                        "type" => "text",
                        "analyzer" => "standard"
                    ]
                ]
            ],
            "destination_name_pt" => [
                "type" => "text",
                "term_vector" => "yes",
                "analyzer" => "ngram_analyzer",
                "search_analyzer" => "standard",
                "fields" => [
                    "exact" => [
                        "type" => "text",
                        "analyzer" => "standard"
                    ]
                ]
            ],
            "popularity" => [
                "type" => "integer",
            ]
        ]
    ]
]

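I set the popularity of Cancun to 10, and when I start typing "ca" the first option is Cancun. This works as expected...

But the problem comes when I try to find other cities whose popularity is 0, such as Puerto Vallarta. When I type "Puerto Va", I get these results:

1.- Val d'Aosta
2.- Puerto Lopez
3.- Bristol - Virginia
and many others... (but not Puerto Vallarta)

It is worth stressing that function_score and field_value_factor are both part of this query, and that the expected result is Puerto Vallarta in first position.

I want to boost popular cities using an integer value.

Any suggestions?

Thanks!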

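Best answer

By default, your field_value_factor multiplies the natural score by the value of the popularity field. So if Puerto Vallarta has a popularity of 0, its score will always be 0. It will match, but it will never appear among the first results.

On top of that, your boost grows linearly, which is surely not what you want, because popular cities will completely crush the rest of the result list.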
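To make the multiplication concrete, here is a minimal sketch of the default behaviour described above. The relevance scores are made-up numbers for illustration only (real scores depend on your index), and defaultFinalScore is a hypothetical helper, not part of any Elasticsearch client.

```php
<?php
// Hypothetical hits: the query_score values are invented for illustration.
$hits = [
    ['name' => 'Puerto Vallarta', 'query_score' => 8.0, 'popularity' => 0],
    ['name' => 'Cancun',          'query_score' => 1.5, 'popularity' => 10],
];

// Default field_value_factor behaviour as described above:
// final score = natural query score * popularity.
function defaultFinalScore(array $hit): float
{
    return $hit['query_score'] * $hit['popularity'];
}

foreach ($hits as $hit) {
    printf("%s => %.1f\n", $hit['name'], defaultFinalScore($hit));
}
// Puerto Vallarta => 0.0
// Cancun => 15.0
```

Even though Puerto Vallarta is by far the better text match in this made-up example, a popularity of 0 wipes out its score entirely.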

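So you should use the modifier property of field_value_factor (it is described in the Elasticsearch function_score documentation).

If you set it to log2p, it should work as expected. The log2p modifier adds 2 to the value of the popularity field before taking the base-10 logarithm. A popularity of 0 therefore becomes log10(2) ≈ 0.30 instead of 0, so the score is no longer zeroed out. The difference in boost between a popularity of 2 and a popularity of 4 is still significant, but the difference shrinks as popularity gets higher.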

For example:

popularity 2 => log10(4) ≈ 0.60
popularity 4 => log10(6) ≈ 0.78
popularity 20 => log10(22) ≈ 1.34
popularity 22 => log10(24) ≈ 1.38
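If you want to check these numbers yourself, the small sketch below reproduces the arithmetic. log2pModifier is a hypothetical helper name; it only mirrors the formula log10(popularity + 2) and does not call Elasticsearch.

```php
<?php
// Mirrors the log2p modifier: add 2 to the field value, then take log base 10.
function log2pModifier(float $popularity): float
{
    return log10($popularity + 2);
}

foreach ([0, 2, 4, 20, 22] as $popularity) {
    printf("popularity %2d => %.2f\n", $popularity, log2pModifier($popularity));
}
// popularity  0 => 0.30
// popularity  2 => 0.60
// popularity  4 => 0.78
// popularity 20 => 1.34
// popularity 22 => 1.38
```

Note the first line: a city with popularity 0 still gets a non-zero factor, so it can rank on text relevance.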

Add this to your query:

                'field_value_factor' => [
                    'field' => 'popularity',
                    'modifier' => 'log2p' // add this
                ]
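For context, this is roughly how the whole request body could look with the modifier in place. It is only a sketch: $boolQuery is a shortened stand-in for the bool query from the question, and the 'missing' option is my own addition (an assumption that documents without a popularity value should be treated as 0), so drop it if every document has the field.

```php
<?php
$text = 'Puerto Va';

// Shortened stand-in for the "bool" query from the question.
$boolQuery = [
    'bool' => [
        'should' => [
            ['multi_match' => ['query' => $text, 'fields' => ['destination_name_*']]],
        ],
    ],
];

$body = [
    'query' => [
        'function_score' => [
            'query' => $boolQuery,
            'field_value_factor' => [
                'field'    => 'popularity',
                'modifier' => 'log2p',
                // Assumption: documents with no popularity value count as 0.
                'missing'  => 0,
            ],
        ],
    ],
];

// Print the JSON that would be sent as the search body.
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```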

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/53625282/
