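I want to get the words that a list of users have in common, sorted by total count.
Example:
I have an index of the words each user has used.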
docs:
[
{
user_id: 1,
word: 'food',
count: 2
},
{
user_id: 1,
word: 'thor',
count: 1
},
{
user_id: 1,
word: 'beer',
count: 7
},
{
user_id: 2,
word: 'summer',
count: 12
},
{
user_id: 2,
word: 'thor',
count: 4
},
{
user_id: 1,
word: 'beer',
count: 2
},
..otheruserdetails..
]
Input:
user_ids: [1, 2]
Desired output:
[
{
'word': 'beer',
'total_count': 9
},
{
'word': 'thor',
'total_count': 5
}
]
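What I have so far:
Fetch all documents for the given user_ids (a bool should query) and work out the common words in the application layer. However, this is not feasible: the word documents will keep growing and the application layer will not be able to keep up. Is there any way to move this into the ES query?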
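For reference, the application-layer approach described above might look roughly like the following. This is only a sketch under assumptions: it takes "common" to mean words used by every requested user, and the function name `common_words` is hypothetical, not something from the question.

```python
from collections import defaultdict


def common_words(docs, user_ids):
    """Sum `count` per word and keep only words used by every requested user."""
    wanted = set(user_ids)
    totals = defaultdict(int)         # word -> summed count
    users_by_word = defaultdict(set)  # word -> set of users who used it
    for doc in docs:
        if doc["user_id"] not in wanted:
            continue
        totals[doc["word"]] += doc["count"]
        users_by_word[doc["word"]].add(doc["user_id"])
    shared = [
        {"word": word, "total_count": totals[word]}
        for word, users in users_by_word.items()
        if users == wanted  # assumption: "common" means used by all users
    ]
    # Highest total first
    return sorted(shared, key=lambda row: row["total_count"], reverse=True)


# The six sample documents exactly as listed in the question
docs = [
    {"user_id": 1, "word": "food", "count": 2},
    {"user_id": 1, "word": "thor", "count": 1},
    {"user_id": 1, "word": "beer", "count": 7},
    {"user_id": 2, "word": "summer", "count": 12},
    {"user_id": 2, "word": "thor", "count": 4},
    {"user_id": 1, "word": "beer", "count": 2},
]
result = common_words(docs, [1, 2])
```

On the six documents as listed, only `thor` is used by both users, so `result` holds a single entry with a `total_count` of 5. Every matching document has to be pulled into memory first, which is the scaling problem described above.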
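Best answer
You can use a Terms aggregation and a Value Count aggregation.
A terms aggregation can be thought of as a "group by". The output gives a list of unique user_ids, the list of words under each user, and a count for each word (the number of matching documents for that word).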
{
"from": 0,
"size": 10,
"query": {
"terms": {
"user_id": [
"1",
"2"
]
}
},
"aggs": {
"users": {
"terms": {
"field": "user_id",
"size": 10
},
"aggs": {
"words": {
"terms": {
"field": "word.keyword",
"size": 10
},
"aggs": {
"word_count": {
"value_count": {
"field": "word.keyword"
}
}
}
}
}
}
}
}
Result
"hits" : [
{
"_index" : "index89",
"_type" : "_doc",
"_id" : "gFRzr3ABAWOsYG7t2tpt",
"_score" : 1.0,
"_source" : {
"user_id" : 1,
"word" : "thor",
"count" : 1
}
},
{
"_index" : "index89",
"_type" : "_doc",
"_id" : "flRzr3ABAWOsYG7t0dqI",
"_score" : 1.0,
"_source" : {
"user_id" : 1,
"word" : "food",
"count" : 2
}
},
{
"_index" : "index89",
"_type" : "_doc",
"_id" : "f1Rzr3ABAWOsYG7t19ps",
"_score" : 1.0,
"_source" : {
"user_id" : 2,
"word" : "thor",
"count" : 4
}
},
{
"_index" : "index89",
"_type" : "_doc",
"_id" : "gVRzr3ABAWOsYG7t8NrR",
"_score" : 1.0,
"_source" : {
"user_id" : 1,
"word" : "food",
"count" : 2
}
},
{
"_index" : "index89",
"_type" : "_doc",
"_id" : "glRzr3ABAWOsYG7t-Npj",
"_score" : 1.0,
"_source" : {
"user_id" : 1,
"word" : "thor",
"count" : 1
}
},
{
"_index" : "index89",
"_type" : "_doc",
"_id" : "g1Rzr3ABAWOsYG7t_9po",
"_score" : 1.0,
"_source" : {
"user_id" : 2,
"word" : "thor",
"count" : 4
}
}
]
},
"aggregations" : {
"users" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 4,
"words" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "food",
"doc_count" : 2,
"word_count" : {
"value" : 2
}
},
{
"key" : "thor",
"doc_count" : 2,
"word_count" : {
"value" : 2
}
}
]
}
},
{
"key" : 2,
"doc_count" : 2,
"words" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "thor",
"doc_count" : 2,
"word_count" : {
"value" : 2
}
}
]
}
}
]
}
}
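In the response above, `word_count` equals `doc_count` in every bucket, because `value_count` counts field values rather than adding up the `count` field. One possible variation, sketched here and not part of the original answer, groups by word and combines a `sum` on `count`, a `cardinality` on `user_id`, and a `bucket_selector` that keeps only words used by every requested user. The helper names `build_common_words_query` and `parse_common_words` are hypothetical, and the sketch assumes the `word.keyword` sub-field and numeric `user_id` and `count` fields seen in the query and result above.

```python
import json


def build_common_words_query(user_ids, max_words=10):
    """Build a search body: total count per word, limited to words shared by all users."""
    expected_users = len(set(user_ids))
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {"terms": {"user_id": list(user_ids)}},
        "aggs": {
            "words": {
                "terms": {
                    "field": "word.keyword",
                    "size": max_words,
                    "order": {"total_count": "desc"},
                },
                "aggs": {
                    # Adds up the `count` field instead of counting documents
                    "total_count": {"sum": {"field": "count"}},
                    # Number of distinct users who used this word
                    "user_count": {"cardinality": {"field": "user_id"}},
                    # Drops word buckets not used by every requested user
                    "used_by_all": {
                        "bucket_selector": {
                            "buckets_path": {"users": "user_count"},
                            "script": f"params.users == {expected_users}",
                        }
                    },
                },
            }
        },
    }


def parse_common_words(response):
    """Turn the `words` buckets of a search response into the desired output shape."""
    buckets = response["aggregations"]["words"]["buckets"]
    return [
        {"word": b["key"], "total_count": int(b["total_count"]["value"])}
        for b in buckets
    ]


body = build_common_words_query([1, 2])
print(json.dumps(body, indent=2))
```

The `bucket_selector` runs after the terms aggregation has already picked its top `size` buckets, so a generous `max_words` may be needed to avoid losing shared words that rank low by total. `cardinality` is an approximate metric, although it is generally accurate for small numbers of distinct values.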
Regarding "elasticsearch - Elasticsearch intersection query", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60562308/