I have a question about Elasticsearch and the more_like_this query.
Given this mapping:
{
  "directory.v1": {
    "mappings": {
      "profile.event": {
        "properties": {
          "event": {
            "properties": {
              "naics": {
                "type": "nested",
                "properties": {
                  "type": {
                    "type": "keyword"
                  },
                  "value": {
                    "type": "keyword"
                  }
                }
              }
            }
          },
          "user_id": {
            "type": "long"
          }
        }
      }
    }
  }
}
and document A used as the source, with document B being the one I expect the query to return as similar to A.
Profile A (used as the source):
{
  "_index": "directory.v1",
  "_type": "profile.event",
  "_id": "83731111.559",
  "_score": 1,
  "_source": {
    "user_id": 8373,
    "event": {
      "naics": [
        { "value": 331, "type": "naics" },
        { "value": 74, "type": "naics" },
        { "value": 938, "type": "naics" },
        { "value": 2048, "type": "naics" },
        { "value": 939, "type": "naics" },
        { "value": 2049, "type": "naics" },
        { "value": 940, "type": "naics" },
        { "value": 2050, "type": "naics" },
        { "value": 941, "type": "naics" },
        { "value": 2051, "type": "naics" },
        { "value": 942, "type": "naics" },
        { "value": 2052, "type": "naics" },
        { "value": 943, "type": "naics" },
        { "value": 2053, "type": "naics" },
        { "value": 944, "type": "naics" },
        { "value": 2054, "type": "naics" },
        { "value": 945, "type": "naics" },
        { "value": 2055, "type": "naics" },
        { "value": 473, "type": "naics" },
        { "value": 128, "type": "naics" },
        { "value": 10, "type": "naics" },
        { "value": 1242, "type": "naics" },
        { "value": 472, "type": "naics" },
        { "value": 1241, "type": "naics" }
      ]
    }
  }
}
Profile B:
{
  "_index": "directory.v1",
  "_type": "profile.event",
  "_id": "46124111.559",
  "_score": 1,
  "_source": {
    "user_id": 46124,
    "event": {
      "naics": [
        { "value": 331, "type": "naics" },
        { "value": 74, "type": "naics" },
        { "value": 938, "type": "naics" },
        { "value": 2048, "type": "naics" },
        { "value": 939, "type": "naics" },
        { "value": 2049, "type": "naics" },
        { "value": 940, "type": "naics" },
        { "value": 2050, "type": "naics" },
        { "value": 941, "type": "naics" },
        { "value": 2051, "type": "naics" },
        { "value": 942, "type": "naics" },
        { "value": 2052, "type": "naics" },
        { "value": 943, "type": "naics" },
        { "value": 2053, "type": "naics" },
        { "value": 944, "type": "naics" },
        { "value": 2054, "type": "naics" },
        { "value": 945, "type": "naics" },
        { "value": 2055, "type": "naics" }
      ]
    }
  }
}
Every naics element in document B also appears in document A.
Given that, I really don't understand why this query:
{
  "query": {
    "nested": {
      "path": "event.naics",
      "query": {
        "more_like_this": {
          "like": [
            {
              "_id": "83731111.559",
              "_type": "profile.event"
            }
          ],
          "fields": [
            "event.naics.value"
          ],
          "min_term_freq": 1,
          "min_doc_freq": 1,
          "minimum_should_match": "8%"
        }
      }
    }
  }
}
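returns results!
But when I raise minimum_should_match to 9% or higher, nothing matches at all and I get no results.
I also tried the following, which gives me results up to 11%: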
{
  "query": {
    "nested": {
      "path": "event.naics",
      "query": {
        "more_like_this": {
          "like": [
            {
              "_id": "83731111.559",
              "_type": "profile.event"
            }
          ],
          "fields": [
            "event.naics.*"
          ],
          "min_term_freq": 1,
          "min_doc_freq": 1,
          "minimum_should_match": "11%"
        }
      }
    }
  }
}
The term vectors response for the source document is:
{
  "_index": "directory.v1",
  "_type": "profile.event",
  "_id": "83731111.559",
  "_version": 5,
  "found": true,
  "took": 0,
  "term_vectors": {}
}
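Best answer
If you fetch the term vectors for document "A" on the field event.naics.value, you will see that there are 24 terms in total, each with a frequency of 1.
So when you ask for an 8% match, 8% of 24 is 1.92, which is rounded down to 1 of the 24 generated should clauses, and you get a match. But 9% of 24 is 2.16, which rounds down to 2 clauses, and that is no bueno: each nested document holds only one value, so no single nested document can satisfy two clauses.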
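The rounding can be checked with a short sketch. The Python function below is an illustrative re-implementation of only the simple cases (a plain integer or a percentage) of the calculation described above; the name min_should_match is hypothetical and not part of any Elasticsearch client, and conditional specs using "<" are left out.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Return how many of the optional (should) clauses must match.

    Sketch of the simple cases only: a plain integer ("2") or a
    percentage ("8%"). Negative values mean "all clauses minus this many".
    """
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        calc = optional_clauses * percent / 100
        # int() truncates toward zero, so 1.92 becomes 1 and 2.16 becomes 2.
        result = optional_clauses + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = optional_clauses + calc if calc < 0 else calc
    return max(result, 0)


if __name__ == "__main__":
    # Document A yields 24 should clauses on event.naics.value.
    for spec in ("8%", "9%"):
        print(spec, "->", min_should_match(24, spec))
```

With 24 clauses, "8%" requires 1 matching clause and "9%" requires 2, which is exactly the threshold at which the results disappear.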
For the details of the calculation, see the bottom of this page:
https://github.com/elastic/elasticsearch/blob/99f88f15c5febbca2d13b5b5fda27b844153bf1a/server/src/main/java/org/elasticsearch/common/lucene/search/Queries.java
The more relevant source is probably this one:
https://github.com/elastic/elasticsearch/blob/46a79127edfb0cc93b7580624010ff81ca0cb2f4/server/src/main/java/org/elasticsearch/common/lucene/search/MoreLikeThisQuery.java
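To see why requiring two clauses can never succeed with this mapping, here is a simplified model in Python. It is only a sketch under stated assumptions, not how Lucene actually executes the query: each nested event.naics object is treated as its own hidden document holding a single value, and the helper name nested_hits is hypothetical.

```python
# Values of event.naics.value in document A (the "like" source) and document B.
A_VALUES = [331, 74, 938, 2048, 939, 2049, 940, 2050, 941, 2051, 942, 2052,
            943, 2053, 944, 2054, 945, 2055, 473, 128, 10, 1242, 472, 1241]
B_VALUES = [331, 74, 938, 2048, 939, 2049, 940, 2050, 941, 2051, 942, 2052,
            943, 2053, 944, 2054, 945, 2055]


def nested_hits(nested_values, like_terms, required):
    """Return the nested values that satisfy at least `required` should clauses.

    Assumption: one should clause per term of the source document, and every
    nested object is evaluated on its own.
    """
    like = set(like_terms)
    hits = []
    for value in nested_values:
        # A nested object carries ONE value, so it can satisfy at most
        # one of the generated should clauses.
        matched = 1 if value in like else 0
        if matched >= required:
            hits.append(value)
    return hits


if __name__ == "__main__":
    for required in (1, 2):
        hits = nested_hits(B_VALUES, A_VALUES, required)
        # The parent document matches if any of its nested objects matches.
        print("required =", required, "nested hits =", len(hits),
              "parent matches =", bool(hits))
```

In this model every nested object of B matches when one clause is required, and none can match once two are required, regardless of how much B overlaps with A.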
Term vectors
POST /directory.v1/profile.event/83731111.559/_termvectors
{
  "fields": ["event.naics.value"],
  "offsets": false,
  "payloads": false,
  "positions": false,
  "term_statistics": true,
  "field_statistics": true
}
Regarding elasticsearch - How to make minimum_should_match work with nested mapping?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49244302/