elasticsearch - How to get the total number of word occurrences in Elasticsearch?

Tags: elasticsearch elastic-stack elasticsearch-5 elasticsearch-aggregation elasticsearch-dsl

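Is there a way to get the total number of occurrences of the searched string, rather than the number of hits?
The data structure of the nested documents is somewhat complex, so I have added a simplified version of the data below. If someone can help me find an answer, I can adapt it to my version of the code.
The Elasticsearch data: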

[
    {
      "page": 1,
      "text": "Sample PDF Document.\nLorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
    },
    {
      "page": 2,
      "text": "sample PDF sample Document test content"
    },
    {
      "page": 3,
      "text": "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.\n sample content"
    },
    {
      "page": 4,
      "text": "PDF test sample Document lorem ipsum sample.Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. Sample content."
    },
    {
      "page": 5,
      "text": "PDF Document"
    },
    {
      "page": 6,
      "text": "sdsd"
    },
    {
      "page": 7,
      "text": "lorem ipsum"
    }
  ]
I can do a filter aggregation, but for the text "sample PDF sample Document test content" it returns a count of 1, even though the word "sample" appears twice in that same field.
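To make the difference concrete, here is a minimal client-side Python sketch (not an Elasticsearch API) that works on pages already fetched from `_source`, using three of the pages above. `hit_count` counts the pages that contain the word at least once, which is roughly what a filter aggregation's `doc_count` reports; `occurrence_count` counts every occurrence. The tokenization is an assumption: it only approximates the standard analyzer by lowercasing and splitting on non-word characters, whereas Elasticsearch follows Unicode word-segmentation rules, so edge cases such as "sample.Lorem" may be tokenized differently.

```python
import re
from typing import Iterable, List, Mapping

# Approximation of the standard analyzer (assumption): lowercase the text,
# then keep runs of word characters as tokens.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def hit_count(pages: Iterable[Mapping], word: str) -> int:
    """Number of pages containing the word at least once (one per matching page)."""
    w = word.lower()
    return sum(1 for p in pages if w in tokenize(p.get("text", "")))


def occurrence_count(pages: Iterable[Mapping], word: str) -> int:
    """Total number of times the word appears across all pages."""
    w = word.lower()
    return sum(tokenize(p.get("text", "")).count(w) for p in pages)


pages = [
    {"page": 2, "text": "sample PDF sample Document test content"},
    {"page": 5, "text": "PDF Document"},
    {"page": 7, "text": "lorem ipsum"},
]

print(hit_count(pages, "sample"))         # 1
print(occurrence_count(pages, "sample"))  # 2
```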

Best answer

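Check this answer. It can also be refactored to handle nested fields and to count only a given subset of words. Note that it may be slow, because all the word splitting is performed repeatedly.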
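If the counting is done on the client instead, the same idea (split each text into words, walk the nested objects, count only a chosen subset of words) can be sketched in Python. This is a hedged illustration, not the linked answer's code: the nested path `pages` and the field `text` are hypothetical names to adjust to your mapping, and the tokenization only approximates the standard analyzer. Like the approach described above, it splits every text it visits, so its cost grows with the total amount of text.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Mapping

# Approximate tokenizer (assumption): lowercase, keep runs of word characters.
TOKEN_RE = re.compile(r"\w+")


def count_words(docs: Iterable[Mapping], words: Iterable[str],
                nested_path: str = "pages", field: str = "text") -> Dict[str, int]:
    """Total occurrences of each word in `words`, summed over every nested
    object under `nested_path` in every document.

    `nested_path` and `field` are hypothetical defaults, not from the question.
    """
    wanted = {w.lower() for w in words}
    # Start every requested word at 0 so absent words still appear in the result.
    totals: Counter = Counter({w: 0 for w in wanted})
    for doc in docs:
        for inner in doc.get(nested_path, []):
            # Split each text once and keep only the words we care about.
            tokens = TOKEN_RE.findall(inner.get(field, "").lower())
            totals.update(t for t in tokens if t in wanted)
    return dict(totals)


docs = [
    {"pages": [
        {"page": 2, "text": "sample PDF sample Document test content"},
        {"page": 5, "text": "PDF Document"},
        {"page": 7, "text": "lorem ipsum"},
    ]},
]

print(sorted(count_words(docs, ["sample", "PDF", "lorem"]).items()))
# [('lorem', 1), ('pdf', 2), ('sample', 2)]
```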

A similar question about this topic can be found on Stack Overflow: https://stackoverflow.com/questions/63174836/
