Solr facet sum instead of count

I'm new to Solr, and I'd like to implement a special kind of facet.

Sample documents:

{ hostname: google.com, time_spent: 100 }
{ hostname: facebook.com, time_spent: 10 }
{ hostname: google.com, time_spent: 30 }
{ hostname: reddit.com, time_spent: 20 }
...

I'd like to return a facet with the following structure:

{ google.com: 130, reddit.com: 20, facebook.com: 10 }

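Solr's actual response would be much more verbose than this, but the important point is that each facet "count" should be the sum of the matching documents' time_spent values, not the number of documents that match the facet value.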
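To pin down the intended semantics, here is a plain-Python reference of the desired aggregation over the sample documents. It runs entirely client-side and does not touch Solr; `sum_facet` is a hypothetical helper name used only for illustration.

```python
from collections import defaultdict

# The sample documents from the question.
docs = [
    {"hostname": "google.com", "time_spent": 100},
    {"hostname": "facebook.com", "time_spent": 10},
    {"hostname": "google.com", "time_spent": 30},
    {"hostname": "reddit.com", "time_spent": 20},
]


def sum_facet(docs, field, value_field):
    """Group docs by `field`, sum `value_field`, and sort by total descending."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[field]] += doc[value_field]
    # Plain dicts keep insertion order (Python 3.7+), so the sort is preserved.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


print(sum_facet(docs, "hostname", "time_spent"))
# {'google.com': 130, 'reddit.com': 20, 'facebook.com': 10}
```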

Idea #1:

I could use a pivot facet:

q=*:*
&facet=true
&facet.pivot=hostname,time_spent

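However, this returns a count for each unique time_spent value under every unique hostname. I could sum these up manually in my application, but that seems wasteful.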
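For comparison, the manual summing from Idea #1 might look like the sketch below. It assumes the usual `wt=json` pivot layout (`facet_counts` → `facet_pivot` → `"hostname,time_spent"`, a list of entries with `value`, `count` and a nested `pivot` list) and a numeric time_spent field, so values come back as numbers. Each child bucket is one distinct time_spent value plus how many documents under that hostname have it, so `value * count` is that bucket's share of the total. Note that pivot facets also honor facet.limit (default 100) at each level, so an exact total would need facet.limit=-1, which is part of why this approach is wasteful.

```python
def sum_pivot(pivot_entries):
    """Turn hostname,time_spent pivot entries into {hostname: total_time}."""
    totals = {}
    for entry in pivot_entries:
        # value * count: a time_spent value times the number of docs having it.
        totals[entry["value"]] = sum(
            child["value"] * child["count"] for child in entry.get("pivot", [])
        )
    # Sort hostnames by total time, descending.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hand-written sample shaped like facet_counts.facet_pivot["hostname,time_spent"].
sample = [
    {"field": "hostname", "value": "google.com", "count": 2,
     "pivot": [{"field": "time_spent", "value": 100, "count": 1},
               {"field": "time_spent", "value": 30, "count": 1}]},
    {"field": "hostname", "value": "facebook.com", "count": 1,
     "pivot": [{"field": "time_spent", "value": 10, "count": 1}]},
    {"field": "hostname", "value": "reddit.com", "count": 1,
     "pivot": [{"field": "time_spent", "value": 20, "count": 1}]},
]

print(sum_pivot(sample))
# {'google.com': 130, 'reddit.com': 20, 'facebook.com': 10}
```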

Idea #2:

I could use the stats component:

q=*:*
&stats=true
&stats.field=time_spent
&stats.facet=hostname

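However, this has two problems. First, the results include every hostname, which is a real problem because my dataset has over 1 million hostnames. Second, the results are unsorted; I need to present the hostnames sorted by total time in descending order.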
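The sorting problem from Idea #2 can be worked around client-side, as in the sketch below, but that does nothing about the size problem: the full per-hostname map still has to come back from Solr first. The response layout assumed here (`stats` → `stats_fields` → `time_spent` → `facets` → `hostname` → per-value stats including `sum`) is the usual `wt=json` shape of stats.facet output, trimmed to the relevant keys; `top_by_sum` is a hypothetical helper name.

```python
import heapq


def top_by_sum(stats_response, stats_field, facet_field, n):
    """Return the n facet values with the largest `sum`, largest first."""
    facets = stats_response["stats"]["stats_fields"][stats_field]["facets"][facet_field]
    # nlargest avoids a full sort, but the whole facet map was still
    # transferred from Solr, which is the real cost with ~1M hostnames.
    top = heapq.nlargest(n, facets.items(), key=lambda kv: kv[1]["sum"])
    return {value: stats["sum"] for value, stats in top}


# Hand-written sample of a stats.facet response, trimmed to count and sum.
sample = {
    "stats": {"stats_fields": {"time_spent": {
        "sum": 160.0,
        "facets": {"hostname": {
            "facebook.com": {"count": 1, "sum": 10.0},
            "google.com": {"count": 2, "sum": 130.0},
            "reddit.com": {"count": 1, "sum": 20.0},
        }},
    }}}
}

print(top_by_sum(sample, "time_spent", "hostname", 2))
# {'google.com': 130.0, 'reddit.com': 20.0}
```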

Any help with this would be much appreciated!

Thanks!

Best answer

With Solr >= 5.1, this is possible:

Facet Sorting

The default sort for a field or terms facet is by bucket count descending. We can optionally sort ascending or descending by any facet function that appears in each bucket. For example, if we wanted to find the top buckets by average price, then we would add sort:"x desc" to the previous facet request:
$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet={
   categories:{
     type : terms,
     field : cat,
     sort : "x desc",   // can also use sort:{x:desc}
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'

See Yonik's blog: http://yonik.com/solr-facet-functions/

For your use case, this would be:

json.facet={
  hostname_time:{
    type: terms,
    field: hostname,
    sort: "time_total desc",
    facet:{
      time_total: "sum(time_spent)"
    }
  }
}
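A minimal standard-library Python client for the request above might look like this sketch. The collection name in the usage comment is hypothetical, and the response layout assumed here (`facets` → `hostname_time` → `buckets`, each bucket carrying `val`, `count` and the `time_total` stat) is the usual JSON Facet API shape. A terms facet returns only the top 10 buckets by default, so with the sort on `time_total` only the top hostnames by total time come back rather than all 1M; `limit` is passed explicitly here to make that visible.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_params(limit=10):
    """Request parameters for the hostname/time_spent sum facet."""
    facet = {
        "hostname_time": {
            "type": "terms",
            "field": "hostname",
            "sort": "time_total desc",  # order buckets by the nested sum
            "limit": limit,             # only the top `limit` hostnames
            "facet": {"time_total": "sum(time_spent)"},
        }
    }
    # rows=0: we only want the facet, not the matching documents.
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def parse_buckets(response):
    """Map each hostname bucket to its summed time_spent, keeping Solr's order."""
    buckets = response["facets"]["hostname_time"]["buckets"]
    return {b["val"]: b["time_total"] for b in buckets}


def query_solr(url, limit=10):
    # POST the parameters form-encoded, like `curl -d` in the answer.
    data = urlencode(build_params(limit)).encode("utf-8")
    with urlopen(url, data=data) as resp:
        return parse_buckets(json.load(resp))


# Usage (hypothetical collection name):
# query_solr("http://localhost:8983/solr/mycollection/query")
```

Against the sample documents, the parsed result would be ordered google.com, reddit.com, facebook.com, with sums typically returned as floating-point values (e.g. 130.0).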

Note: calling sum() inside a nested facet only worked for us with Solr 6.3.0.

A similar question about Solr facet sum instead of count can be found on Stack Overflow: https://stackoverflow.com/questions/25297846/
