elasticsearch - Understanding Elasticsearch query explain

Tags: elasticsearch explain

I am trying to understand the scoring shown for the Explain API in the Elastic docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html

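When I could not work it out on my own simple index with just a few documents, I tried to reproduce the calculation shown on the docs page above.

In the example, it shows a "value" of 1.3862944 with the description: "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))". Under "details" it gives the following values for the fields: docFreq: 1.0, docCount: 5.0

Using the provided docFreq and docCount values, I compute this as: log(1 + (5.0 - 1.0 + 0.5) / (1.0 + 0.5)) = 0.602, which is not the 1.3862944 from the example.

I cannot get any of the values to match.

Am I reading it wrong?

Here is the whole example: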

GET /twitter/_doc/0/_explain   
{ 
  "query" : {
    "match" : { "message" : "elasticsearch" }
  }
}

This yields the following result:
{
   "_index": "twitter",
   "_type": "_doc",
   "_id": "0",
   "matched": true,
   "explanation": {
       "value": 1.6943599,
       "description": "weight(message:elasticsearch in 0) [PerFieldSimilarity], result of:",
       "details": [
       {
        "value": 1.6943599,
        "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
        "details": [
           {
              "value": 1.3862944,  <== This is the one I am trying
              "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
              "details": [
                 {
                    "value": 1.0,
                    "description": "docFreq",
                    "details": []
                 },
                 {
                    "value": 5.0,
                    "description": "docCount",
                    "details": []
                  }
               ]
           },
            {
              "value": 1.2222223,
              "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
              "details": [
                 {
                    "value": 1.0,
                    "description": "termFreq=1.0",
                    "details": []
                 },
                 {
                    "value": 1.2,
                    "description": "parameter k1",
                    "details": []
                 },
                 {
                    "value": 0.75,
                    "description": "parameter b",
                    "details": []
                 },
                 {
                    "value": 5.4,
                    "description": "avgFieldLength",
                    "details": []
                 },
                 {
                    "value": 3.0,
                    "description": "fieldLength",
                    "details": []
                 }
              ]
           }
        ]
     }
  ]
}
}

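Best answer

The explanation is accurate, as always. Let me help you through the calculation:

This is the initial formula:

log(1 + (5.0 - 1.0 + 0.5) / (1.0 + 0.5))

The next step is:
log(1 + 4.5 / 1.5)

One more:
log(4) = ?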

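This is the tricky part. You treated this log as a base-10 logarithm, which gives 0.602. However, if you look at the code of the Lucene scorer, you will see that it is actually ln, the natural logarithm, and ln(4) is exactly the 1.386294 you were looking for.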
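To check the two readings of log numerically, here is a small Python sketch. It is not part of the original answer, and the helper name `bm25_idf` is made up for illustration. It evaluates the formula quoted in the explain output once with the natural logarithm and once with base 10:

```python
import math

def bm25_idf(doc_freq, doc_count, log=math.log):
    """Evaluate the idf formula quoted in the explain output.

    `log` is the logarithm function to use; it defaults to the natural log.
    """
    return log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Values taken from the "details" of the explain output: docFreq=1.0, docCount=5.0
idf_natural = bm25_idf(1.0, 5.0)                 # natural log, like Java's Math.log
idf_base10 = bm25_idf(1.0, 5.0, log=math.log10)  # the base-10 reading from the question

print(round(idf_natural, 7))  # 1.3862944
print(round(idf_base10, 3))   # 0.602
```

The first value matches the explain output, the second matches the number computed in the question.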
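Part of the code (note that this snippet computes log(numDocs / (docFreq + 1)) + 1, which is not the idf formula printed in the explain output above, so it appears to come from a different similarity implementation; the relevant point is only that it calls Math.log):
public float idf(long docFreq, long numDocs) {
    return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
}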

where Math.log is defined as follows:
public static double log(double a)

Returns the natural logarithm (base e) of a double value.
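Since the question mentions that none of the values matched, the rest of the explain output can be reproduced the same way. The following is a minimal Python sketch, not taken from the original answer, with a made-up helper name `tf_norm`. It plugs the numbers from the "details" sections into the two formulas printed in the descriptions and multiplies the results:

```python
import math

def tf_norm(freq, k1, b, field_length, avg_field_length):
    """tfNorm formula exactly as printed in the explain description."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )

# idf with docFreq=1.0 and docCount=5.0, using the natural logarithm
idf = math.log(1 + (5.0 - 1.0 + 0.5) / (1.0 + 0.5))

# tfNorm with termFreq=1.0, k1=1.2, b=0.75, fieldLength=3.0, avgFieldLength=5.4
tfn = tf_norm(1.0, 1.2, 0.75, 3.0, 5.4)

# The explain output describes the score as the product of the two parts
score = idf * tfn

print(f"idf={idf:.6f} tfNorm={tfn:.6f} score={score:.6f}")
```

The results agree with the explain output (1.3862944, 1.2222223, 1.6943599) to about six decimal places. The last digit can differ slightly, presumably because this sketch uses 64-bit floats while the reported values look like 32-bit floats.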

Regarding "elasticsearch - Understanding Elasticsearch query explain", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49520453/