elasticsearch - Why do some result scores in the same query include queryWeight while others do not?

Tags: elasticsearch lucene

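I am running a query_string query with a single term across multiple fields, _all and tags.name, and trying to understand the scoring. The query: {"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}. Here are the documents the query returns: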

  • Document 1 has an exact match on tags.name, but not on _all.
  • Document 8 has an exact match on tags.name as well as on _all.

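Document 8 should win, and it does, but the scoring confuses me. Document 1 appears to be penalized because its tags.name score is multiplied by the IDF twice, whereas document 8's tags.name score is multiplied by the IDF only once. In short: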

  • Both have a component weight(tags.name:animal in 0) [PerFieldSimilarity].
  • In document 1, we have weight = score = queryWeight x fieldWeight.
  • In document 8, we have weight = fieldWeight!

queryWeight contains idf, which causes document 1 to be penalized by its idf twice.

Can anyone make sense of this?

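Additional information

  • If I remove _all from the query's fields, queryWeight disappears from the explanation entirely.
  • Adding "use_dis_max":true as an option has no effect.
    • However, additionally adding "tie_breaker":0.7 (or any value) does affect document 8, giving it the more complicated formula we see in document 1.
    • Thought: a bool query (which is what this is) might plausibly do this on purpose, to give more weight to queries that match more than one sub-query. However, that makes no sense for a dis_max query, which is supposed to return only the maximum of the sub-queries.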
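
For reference, the way a dis_max query is documented to combine sub-query scores can be sketched as follows. This is a simplified illustration (best score plus tie_breaker times the sum of the other matching scores), not Lucene's actual code; the function name dis_max_score and the sample scores are hypothetical.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine sub-query scores the way dis_max is documented to.

    sub_scores: scores of the sub-queries that matched the document.
    tie_breaker: 0.0 means "pure max"; larger values let the
    non-winning sub-queries contribute to the final score.
    """
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    others = sum(sub_scores) - best
    return best + tie_breaker * others


# Hypothetical sub-scores for a document matching two fields.
scores = [0.2, 0.5]
pure_max = dis_max_score(scores)                   # only the best field counts
with_tie = dis_max_score(scores, tie_breaker=0.7)  # the other field adds 0.7 * 0.2
print(pure_max, with_tie)
```

With tie_breaker left at 0.0 the result is just the maximum, which is the behavior I expected from dis_max regardless of how each sub-score is built internally.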

Here are the relevant explain requests. Look for the embedded comments.

Document 1 (matches tags.name only):

curl -XGET 'http://localhost:9200/questions/question/1/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}' :

{
  "ok" : true,
  "_index" : "questions_1390104463",
  "_type" : "question",
  "_id" : "1",
  "matched" : true,
  "explanation" : {
    "value" : 0.058849156,
    "description" : "max of:",
    "details" : [ {
      "value" : 0.058849156,
      "description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
      // weight = score = queryWeight x fieldWeight
      "details" : [ {
        // score and queryWeight are NOT a part of the other explain!
        "value" : 0.058849156,
        "description" : "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
        "details" : [ {
          "value" : 0.30685282,
          "description" : "queryWeight, product of:",
          "details" : [ {
            // This idf is NOT a part of the other explain!
            "value" : 0.30685282,
            "description" : "idf(docFreq=1, maxDocs=1)"
          }, {
            "value" : 1.0,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 0.19178301,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(freq=1.0), with freq of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "termFreq=1.0"
            } ]
          }, {
            "value" : 0.30685282,
            "description" : "idf(docFreq=1, maxDocs=1)"
          }, {
            "value" : 0.625,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      } ]
    } ]
  }
}

Document 8 (matches both _all and tags.name):

curl -XGET 'http://localhost:9200/questions/question/8/_explain?pretty' -d '{"query":{"query_string":{"query":"animal","fields":["_all","tags.name"]}}}' :

{
  "ok" : true,
  "_index" : "questions_1390104463",
  "_type" : "question",
  "_id" : "8",
  "matched" : true,
  "explanation" : {
    "value" : 0.15342641,
    "description" : "max of:",
    "details" : [ {
      "value" : 0.033902764,
      "description" : "btq, product of:",
      "details" : [ {
        "value" : 0.033902764,
        "description" : "weight(_all:anim in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 0.033902764,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 0.70710677,
            "description" : "tf(freq=0.5), with freq of:",
            "details" : [ {
              "value" : 0.5,
              "description" : "phraseFreq=0.5"
            } ]
          }, {
            "value" : 0.30685282,
            "description" : "idf(docFreq=1, maxDocs=1)"
          }, {
            "value" : 0.15625,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }, {
        "value" : 1.0,
        "description" : "allPayload(...)"
      } ]
    }, {
      "value" : 0.15342641,
      "description" : "weight(tags.name:animal in 0) [PerFieldSimilarity], result of:",
      // weight = fieldWeight
      // No score or queryWeight in sight!
      "details" : [ {
        "value" : 0.15342641,
        "description" : "fieldWeight in 0, product of:",
        "details" : [ {
          "value" : 1.0,
          "description" : "tf(freq=1.0), with freq of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "termFreq=1.0"
          } ]
        }, {
          "value" : 0.30685282,
          "description" : "idf(docFreq=1, maxDocs=1)"
        }, {
          "value" : 0.5,
          "description" : "fieldNorm(doc=0)"
        } ]
      } ]
    } ]
  }
}
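
The arithmetic in the two explanations above can be reproduced with a short sketch. It assumes Lucene's classic TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the values printed by explain; the helper names are made up for illustration.

```python
import math


def tf(freq):
    # Classic Lucene term-frequency factor (assumed): sqrt(freq).
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Classic Lucene inverse document frequency (assumed).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as listed in the explain output.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


def query_weight(doc_freq, max_docs, query_norm):
    # queryWeight = idf * queryNorm, as listed in the explain output.
    return idf(doc_freq, max_docs) * query_norm


# Document 1: score = queryWeight x fieldWeight, so idf is applied twice.
doc1_tags = query_weight(1, 1, 1.0) * field_weight(1.0, 1, 1, 0.625)

# Document 8: each clause is just fieldWeight, so idf is applied once.
doc8_all = field_weight(0.5, 1, 1, 0.15625) * 1.0  # times allPayload = 1.0
doc8_tags = field_weight(1.0, 1, 1, 0.5)
doc8_total = max(doc8_all, doc8_tags)               # "max of:"

print(doc1_tags, doc8_all, doc8_tags, doc8_total)
```

The computed values agree with the explain output to about six decimal places (explain uses 32-bit floats), which confirms that the only structural difference between the two tags.name clauses is the extra queryWeight factor in document 1.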

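Best answer

I don't have an answer. I just want to mention that I posted the question on the Elasticsearch forum: https://groups.google.com/forum/#!topic/elasticsearch/xBKlFkq0SP0 I will post an update here when I get an answer.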

Regarding "elasticsearch - Why do some result scores in the same query include queryWeight while others do not?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/21213947/
