elasticsearch - How can I make minimum_should_match work with a nested mapping?

Tags: elasticsearch

I have a question about Elasticsearch and the more_like_this query.

Given this mapping:

{
  "directory.v1": {
    "mappings": {
      "profile.event": {
        "properties": {
          "event": {
            "properties": {
              "naics": {
                "type": "nested",
                "properties": {
                  "type": {
                    "type": "keyword"
                  },
                  "value": {
                    "type": "keyword"
                  }
                }
              }
            }
          },
          "user_id": {
            "type": "long"
          }
        }
      }
    }
  }
}

and a document (A) used as the source, plus a document (B) that the more_like_this query (run against A) should find:

Profile A (used as the source):
{
  "_index": "directory.v1",
  "_type": "profile.event",
  "_id": "83731111.559",
  "_score": 1,
  "_source": {
    "user_id": 8373,
    "event": {
      "naics": [
        {
          "value": 331,
          "type": "naics"
        },
        {
          "value": 74,
          "type": "naics"
        },
        {
          "value": 938,
          "type": "naics"
        },
        {
          "value": 2048,
          "type": "naics"
        },
        {
          "value": 939,
          "type": "naics"
        },
        {
          "value": 2049,
          "type": "naics"
        },
        {
          "value": 940,
          "type": "naics"
        },
        {
          "value": 2050,
          "type": "naics"
        },
        {
          "value": 941,
          "type": "naics"
        },
        {
          "value": 2051,
          "type": "naics"
        },
        {
          "value": 942,
          "type": "naics"
        },
        {
          "value": 2052,
          "type": "naics"
        },
        {
          "value": 943,
          "type": "naics"
        },
        {
          "value": 2053,
          "type": "naics"
        },
        {
          "value": 944,
          "type": "naics"
        },
        {
          "value": 2054,
          "type": "naics"
        },
        {
          "value": 945,
          "type": "naics"
        },
        {
          "value": 2055,
          "type": "naics"
        },
        {
          "value": 473,
          "type": "naics"
        },
        {
          "value": 128,
          "type": "naics"
        },
        {
          "value": 10,
          "type": "naics"
        },
        {
          "value": 1242,
          "type": "naics"
        },
        {
          "value": 472,
          "type": "naics"
        },
        {
          "value": 1241,
          "type": "naics"
        }
      ]
    }
  }
}

Profile B:
{
  "_index": "directory.v1",
  "_type": "profile.event",
  "_id": "46124111.559",
  "_score": 1,
  "_source": {
    "user_id": 46124,
    "event": {
      "naics": [
        {
          "value": 331,
          "type": "naics"
        },
        {
          "value": 74,
          "type": "naics"
        },
        {
          "value": 938,
          "type": "naics"
        },
        {
          "value": 2048,
          "type": "naics"
        },
        {
          "value": 939,
          "type": "naics"
        },
        {
          "value": 2049,
          "type": "naics"
        },
        {
          "value": 940,
          "type": "naics"
        },
        {
          "value": 2050,
          "type": "naics"
        },
        {
          "value": 941,
          "type": "naics"
        },
        {
          "value": 2051,
          "type": "naics"
        },
        {
          "value": 942,
          "type": "naics"
        },
        {
          "value": 2052,
          "type": "naics"
        },
        {
          "value": 943,
          "type": "naics"
        },
        {
          "value": 2053,
          "type": "naics"
        },
        {
          "value": 944,
          "type": "naics"
        },
        {
          "value": 2054,
          "type": "naics"
        },
        {
          "value": 945,
          "type": "naics"
        },
        {
          "value": 2055,
          "type": "naics"
        }
      ]
    }
  }
}

Every element (naics) of document B is also contained in document A.

So I really don't understand why, with this query:
{
  "query": {
    "nested": {
      "path": "event.naics",
      "query": {
        "more_like_this": {
          "like": [
            {
              "_id": "83731111.559",
              "_type": "profile.event"
            }
          ],
          "fields": [
            "event.naics.value"
          ],
          "min_term_freq": 1,
          "min_doc_freq": 1,
          "minimum_should_match": "8%"
        }
      }
    }
  }
}

I get results.

But when I raise minimum_should_match to 9% or higher, nothing matches and I get no results at all.

I also tried the following, which returns results for values up to 11%:
{
  "query": {
    "nested": {
      "path": "event.naics",
      "query": {
        "more_like_this": {
          "like": [
            {
              "_id": "83731111.559",
              "_type": "profile.event"
            }
          ],
          "fields": [
            "event.naics.*"
          ],
          "min_term_freq": 1,
          "min_doc_freq": 1,
          "minimum_should_match": "11%"
        }
      }
    }
  }
}

The term vectors response for the source document is:
{
    "_index": "directory.v1",
    "_type": "profile.event",
    "_id": "83731111.559",
    "_version": 5,
    "found": true,
    "took": 0,
    "term_vectors": {}
}

Best answer

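If you fetch the term vectors of document "A" for the field event.naics.value, you will see that there are 24 terms in total, each with a frequency of 1.
So with an 8% match, 8% of 24 (1.92) is rounded down to 1 of the 24 generated should clauses, and you get a match. But 9% of 24 (2.16) rounds down to 2 clauses, which cannot be satisfied, because each nested document holds only one value.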
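The rounding described above can be checked with a few lines of Python. This is only a sketch of the arithmetic for non-negative percentages, not the Elasticsearch implementation; the function name required_clauses is hypothetical.

```python
def required_clauses(clause_count: int, percent: int) -> int:
    """Should clauses that must match for a non-negative percentage spec.

    Assumption: the product is truncated (rounded down), as the answer
    describes. Negative percentages are not handled in this sketch.
    """
    if percent < 0:
        raise ValueError("this sketch only covers non-negative percentages")
    # Integer floor division avoids floating-point surprises.
    required = clause_count * percent // 100
    # The requirement can never exceed the number of clauses.
    return min(clause_count, required)


if __name__ == "__main__":
    # 24 is the number of terms the answer reports for document A.
    for pct in (8, 9):
        print(f"{pct}% of 24 clauses -> {required_clauses(24, pct)} required")
```

With 24 clauses, 8% requires one matching clause and 9% requires two. Since the query runs inside nested, each nested document is matched on its own and carries a single value, so a requirement of two clauses can never be met.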

For details of the calculation, see the bottom of this file:
https://github.com/elastic/elasticsearch/blob/99f88f15c5febbca2d13b5b5fda27b844153bf1a/server/src/main/java/org/elasticsearch/common/lucene/search/Queries.java

The more_like_this source is here:
https://github.com/elastic/elasticsearch/blob/46a79127edfb0cc93b7580624010ff81ca0cb2f4/server/src/main/java/org/elasticsearch/common/lucene/search/MoreLikeThisQuery.java

Term vectors request:

POST /directory.v1/profile.event/83731111.559/_termvectors
{
  "fields":["event.naics.value"],
  "offsets" : false,
  "payloads" : false,
  "positions" : false,
  "term_statistics" : true,
  "field_statistics" : true
}
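Because the percentage is evaluated per nested document, one possible workaround is to apply minimum_should_match outside the nested query: wrap one nested term query per value in an outer bool/should. This is not part of the original answer, and it gives up the term selection and scoring that more_like_this does. The sketch below only builds the request body; the helper name build_query is hypothetical, and the values are the first few from document A.

```python
import json


def build_query(values, minimum_should_match, path="event.naics"):
    """Build a bool/should of nested term queries, one per value.

    Hypothetical helper: minimum_should_match is applied by the outer
    bool, so it counts matching nested queries per top-level document.
    """
    should = [
        {
            "nested": {
                "path": path,
                # The mapping declares value as keyword, so compare as a string.
                "query": {"term": {f"{path}.value": str(value)}},
            }
        }
        for value in values
    ]
    return {
        "query": {
            "bool": {
                "should": should,
                "minimum_should_match": minimum_should_match,
            }
        }
    }


if __name__ == "__main__":
    # First four naics values of document A; extend the list as needed.
    body = build_query([331, 74, 938, 2048], "50%")
    print(json.dumps(body, indent=2))
```

The printed body can be sent to the _search endpoint of the index. Here "50%" means that at least two of the four nested queries must match somewhere in the same top-level document.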

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/49244302/
