Solr: fieldNorm differs per document, with no document boost

Tags: solr lucene relevance solr-boost

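I want my search results ordered by score, and they are, but the score is not being calculated the way I expect. It is not necessarily wrong, just different from what I expected, and I'm not sure why. My goal is to remove whatever is altering the score.

If I run a search that matches two objects (where ObjectA is expected to score higher than ObjectB), ObjectB is returned first.

For this example, let's say my query is a single term: "apples".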

ObjectA's title: "apples are apples" (2/3 terms)
ObjectA's description: "There were apples in the apples-apples and now the apples went all apples all over the apples!" (6/18 terms)
ObjectB's title: "apples are great" (1/3 terms)
ObjectB's description: "There were apples in the apples-room and now the apples went all bad all over the apples!" (4/18 terms)

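The title field has no boost (or rather, a boost of 1), and the description field has a boost of 0.8. I have not specified a document boost, either in solrconfig.xml or in the query I'm sending. If there is another way to specify a document boost, I may be missing it.

After analyzing the explain printout, it looks like ObjectA would score higher than ObjectB, as I want, except for one difference: ObjectB's title fieldNorm is always higher than ObjectA's.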


Here is the explain printout. For reference: the title field is mditem5_tns and the description field is mditem7_tns:

ObjectB:
1.3327172 = (MATCH) sum of:
  1.0352166 = (MATCH) max plus 0.1 times others of:
    0.9766194 = (MATCH) weight(mditem5_tns:appl in 0), product of:
      0.53929156 = queryWeight(mditem5_tns:appl), product of:
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.8109303 = (MATCH) fieldWeight(mditem5_tns:appl in 0), product of:
        1.0 = tf(termFreq(mditem5_tns:appl)=1)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        1.0 = fieldNorm(field=mditem5_tns, doc=0)
    0.58597165 = (MATCH) weight(mditem7_tns:appl^0.8 in 0), product of:
      0.43143326 = queryWeight(mditem7_tns:appl^0.8), product of:
        0.8 = boost
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.3581977 = (MATCH) fieldWeight(mditem7_tns:appl in 0), product of:
        2.0 = tf(termFreq(mditem7_tns:appl)=4)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.375 = fieldNorm(field=mditem7_tns, doc=0)
  0.2975006 = (MATCH) FunctionQuery(1000.0/(1.0*float(top(rord(lastmodified)))+1000.0)), product of:
    0.999001 = 1000.0/(1.0*float(1)+1000.0)
    1.0 = boost
    0.2977981 = queryNorm

ObjectA:
1.2324848 = (MATCH) sum of:
  0.93498427 = (MATCH) max plus 0.1 times others of:
    0.8632177 = (MATCH) weight(mditem5_tns:appl in 0), product of:
      0.53929156 = queryWeight(mditem5_tns:appl), product of:
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.6006513 = (MATCH) fieldWeight(mditem5_tns:appl in 0), product of:
        1.4142135 = tf(termFreq(mditem5_tns:appl)=2)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.625 = fieldNorm(field=mditem5_tns, doc=0)
    0.7176658 = (MATCH) weight(mditem7_tns:appl^0.8 in 0), product of:
      0.43143326 = queryWeight(mditem7_tns:appl^0.8), product of:
        0.8 = boost
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.6634457 = (MATCH) fieldWeight(mditem7_tns:appl in 0), product of:
        2.4494898 = tf(termFreq(mditem7_tns:appl)=6)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.375 = fieldNorm(field=mditem7_tns, doc=0)
  0.2975006 = (MATCH) FunctionQuery(1000.0/(1.0*float(top(rord(lastmodified)))+1000.0)), product of:
    0.999001 = 1000.0/(1.0*float(1)+1000.0)
    1.0 = boost
    0.2977981 = queryNorm
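
To see how much of the ranking is due to fieldNorm alone, the printouts above can be recomputed by hand. The following is a minimal sketch that only reuses the numbers and the structure shown in the explain output (weight = queryWeight * fieldWeight, tf = sqrt(termFreq), and "max plus 0.1 times others"). The function names term_weight and dismax are made up for this illustration and are not Solr or Lucene APIs.

```python
import math

# Constants copied from the explain printout.
IDF = 1.8109303         # idf(docFreq=3, maxDocs=9)
QUERY_NORM = 0.2977981  # queryNorm

def term_weight(term_freq, field_norm, boost=1.0):
    """weight = queryWeight * fieldWeight, as laid out in the printout."""
    query_weight = boost * IDF * QUERY_NORM
    field_weight = math.sqrt(term_freq) * IDF * field_norm
    return query_weight * field_weight

def dismax(weights, tie=0.1):
    """'max plus 0.1 times others' from the printout."""
    best = max(weights)
    return best + tie * (sum(weights) - best)

# ObjectB: title tf=1, fieldNorm=1.0; description tf=4, fieldNorm=0.375, boost 0.8
b_title = term_weight(1, 1.0)
b_desc = term_weight(4, 0.375, boost=0.8)

# ObjectA: title tf=2, fieldNorm=0.625; description tf=6, fieldNorm=0.375, boost 0.8
a_title = term_weight(2, 0.625)
a_desc = term_weight(6, 0.375, boost=0.8)

print("ObjectB dismax:", dismax([b_title, b_desc]))
print("ObjectA dismax:", dismax([a_title, a_desc]))

# Hypothetical: ObjectA's title weight if its fieldNorm were 1.0 like ObjectB's.
a_title_equal_norm = term_weight(2, 1.0)
print("ObjectA title weight with fieldNorm=1.0:", a_title_equal_norm)
```

The two dismax values agree with the 1.0352166 and 0.93498427 lines in the printout to within rounding. With an equal title fieldNorm, ObjectA's title weight would exceed ObjectB's, so the title fieldNorm is the factor that flips the order.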

Best answer

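The problem is caused by the stemmer. It expands "apples are apples" to "apples appl are apples appl", which makes the field longer. Document B contains only one term that the stemmer expands, so its field ends up shorter than document A's.

Because fieldNorm depends on field length, this results in different fieldNorms.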
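
The link between field length and fieldNorm can be sketched as follows. This assumes Lucene's classic DefaultSimilarity, where the length norm is 1/sqrt(numTerms) with no index-time boost, and where the norm is stored in a single byte with a 3-bit mantissa (a port of SmallFloat.floatToByte315). The Python function names are illustrative only, and the exact token count the analyzer produced is not shown in the question.

```python
import math
import struct

def float_to_byte315(f):
    """Port of Lucene's SmallFloat.floatToByte315 (3 mantissa bits, zero exponent 15)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> (24 - 3)
    if small <= (63 - 15) << 3:
        return 0 if bits <= 0 else 1   # underflow: zero or smallest positive value
    if small >= ((63 - 15) << 3) + 0x100:
        return 255                     # overflow: largest representable value
    return small - ((63 - 15) << 3)

def byte315_to_float(b):
    """Port of Lucene's SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]

def stored_norm(num_terms):
    """Assumed length norm 1/sqrt(numTerms), after the lossy one-byte round trip."""
    raw = 1.0 / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(raw))

for n in range(1, 8):
    print(n, "terms ->", stored_norm(n))
```

Under these assumptions, a counted length of 1 gives 1.0 and a length of 2 gives 0.625, the two title fieldNorm values in the printout, while lengths of 6 and 7 both collapse to 0.375. Any extra token counted for a field therefore lowers its norm in coarse steps, which is how two titles with the same visible word count can end up with different fieldNorms.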

Source: a similar question on Stack Overflow, "Solr: fieldNorm different per document, with no document boost": https://stackoverflow.com/questions/3102895/
