elasticsearch - Select distinct with Elasticsearch

I have a collection of documents that belong to a few authors:

[
  { id: 1, author_id: 'mark', content: [...] },
  { id: 2, author_id: 'pierre', content: [...] },
  { id: 3, author_id: 'pierre', content: [...] },
  { id: 4, author_id: 'mark', content: [...] },
  { id: 5, author_id: 'william', content: [...] },
  ...
]

I want to retrieve and paginate a distinct selection of the best-matching documents, one per author id:

[
  { id: 1, author_id: 'mark', content: [...], _score: 100 },
  { id: 3, author_id: 'pierre', content: [...], _score: 90 },
  { id: 5, author_id: 'william', content: [...], _score: 80 },
  ...
]

This is what I'm currently doing (pseudo-code):

unique_docs = res.results.to_a.uniq{ |doc| doc.author_id }
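A self-contained version of the one-liner above might look like the sketch below. The `Doc` struct and the `distinct_by_author` helper are hypothetical stand-ins, since the actual result objects depend on the client library in use. The sketch relies on `Array#uniq` keeping the first element for each block value, so sorting by descending score first means the best hit per author survives.

```ruby
# Hypothetical stand-in for a search hit; real hits depend on the client library.
Doc = Struct.new(:id, :author_id, :score)

# Keep only the best-scoring document per author.
# Array#uniq with a block keeps the first element for each distinct block value,
# so we sort by descending score before de-duplicating.
def distinct_by_author(docs)
  docs.sort_by { |doc| -doc.score }.uniq { |doc| doc.author_id }
end

hits = [
  Doc.new(1, 'mark', 100),
  Doc.new(2, 'pierre', 85),
  Doc.new(3, 'pierre', 90),
  Doc.new(4, 'mark', 70),
  Doc.new(5, 'william', 80)
]

unique_docs = distinct_by_author(hits)
puts unique_docs.map(&:id).inspect # prints [1, 3, 5]
```

This only de-duplicates the hits already fetched, so it does not by itself guarantee a full page of distinct authors.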

The problem is pagination: how do I select 20 "distinct" documents?

Some people point to term facets, but I'm not actually building a tag cloud.

Thanks,

Best answer

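As of today, ElasticSearch does not provide a group_by equivalent, so here is my attempt to do it manually.
The ES community is working on a direct solution to this problem (probably a plugin), but in the meantime this basic attempt works for my needs.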

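Assumptions

- I'm looking for relevant content.
- I assume that the first 300 docs are relevant, so I restrict my search to this selection, regardless of how many of them come from the same few authors.
- For my needs I don't "really" need full pagination; a "show more" button updated through ajax is enough.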

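Drawbacks

- Results are not precise: since we fetch 300 docs at a time, we don't know how many unique docs will come out (they could be 300 docs from the same author!). You should check whether this fits your average number of docs per author, and possibly apply a limit.
- You need to do 2 queries (mind the cost of waiting on remote calls):
  - The first query asks for 300 relevant docs with only these fields: id and author_id
  - The second query retrieves the full docs for the paginated ids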
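The two-query flow above might be sketched as follows. This is a minimal illustration, not the code from the answer: `distinct_page_ids`, `distinct_page`, and the `fetch_by_ids` callable are hypothetical names, and both Elasticsearch calls are replaced by in-memory stand-ins so the example runs on its own. It assumes the first query returns lightweight hits already ordered by descending relevance.

```ruby
CANDIDATE_LIMIT = 300 # size of the first query, as assumed in the answer

# Step 1 (after the first query): candidates are lightweight hits such as
# { id: 1, author_id: 'mark' }, already ordered by descending relevance.
# Returns the ids for one page of distinct-author results.
def distinct_page_ids(candidates, page:, per_page: 20)
  distinct = candidates.first(CANDIDATE_LIMIT).uniq { |hit| hit[:author_id] }
  offset = (page - 1) * per_page
  (distinct[offset, per_page] || []).map { |hit| hit[:id] }
end

# Step 2: fetch_by_ids is a hypothetical callable wrapping the second query.
# The result is re-sorted because a lookup by id may not preserve the order.
def distinct_page(candidates, fetch_by_ids, page:, per_page: 20)
  ids = distinct_page_ids(candidates, page: page, per_page: per_page)
  return [] if ids.empty?

  fetch_by_ids.call(ids).sort_by { |doc| ids.index(doc[:id]) }
end

# In-memory stand-ins for the two Elasticsearch calls.
candidates = [
  { id: 1, author_id: 'mark' },
  { id: 3, author_id: 'pierre' },
  { id: 2, author_id: 'pierre' },
  { id: 5, author_id: 'william' },
  { id: 4, author_id: 'mark' }
]
store = candidates.map { |hit| [hit[:id], hit.merge(content: "content #{hit[:id]}")] }.to_h
fetch_by_ids = ->(ids) { store.values_at(*ids.reverse) } # deliberately out of order

page1 = distinct_page(candidates, fetch_by_ids, page: 1, per_page: 2)
puts page1.map { |doc| doc[:id] }.inspect # prints [1, 3]
```

A "show more" button would simply request the next `page`. Once the distinct list is exhausted the helper returns an empty array, which is the point where the 300-document cap shows its imprecision.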

Here is some ruby pseudo-code: https://gist.github.com/saxxi/6495116

Regarding "elasticsearch - Select distinct with Elasticsearch", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17949400/
