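Is there a way to get the total number of occurrences of a searched string, rather than the number of result hits?
The nested document data structure is a bit complex, but I have added a simplified version of the data below. If someone can help me find an answer, I can adapt it to my version of the code.
The Elasticsearch data is: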
[
{
"page": 1,
"text": "Sample PDF Document.\nLorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
},
{
"page": 2,
"text": "sample PDF sample Document test content"
},
{
"page": 3,
"text": "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.\n sample content"
},
{
"page": 4,
"text": "PDF test sample Document lorem ipsum sample.Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. Sample content."
},
{
"page": 5,
"text": "PDF Document"
},
{
"page": 6,
"text": "sdsd"
},
{
"page": 7,
"text": "lorem ipsum"
}
]
I was able to do a filter aggregation, but the text "sample PDF sample Document test content" returns a count of 1, even though the word "sample" occurs twice in that field.
Best answer
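Check this answer. It can also be refactored to handle nested fields and to count only a given subset of words. Note that it may be slow, because all of the word splitting is performed repeatedly.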
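To make the difference between hit count and total occurrences concrete, here is a minimal client-side sketch in Python. It is not the linked answer's method and does not call Elasticsearch at all: it splits each page's text into words and counts only a given subset of them. The function name count_occurrences, the trimmed pages list, and the simple regex tokenizer are assumptions made for illustration; a real Elasticsearch analyzer may split text differently.

```python
import re
from collections import Counter

# Trimmed copy of the sample data from the question (pages 2, 5 and 7 only).
pages = [
    {"page": 2, "text": "sample PDF sample Document test content"},
    {"page": 5, "text": "PDF Document"},
    {"page": 7, "text": "lorem ipsum"},
]


def count_occurrences(docs, words):
    """Return (total occurrences, hit counts) for the given words.

    Hypothetical helper: tokenizes with a plain regex and lowercases,
    which only approximates what an Elasticsearch analyzer would do.
    """
    wanted = {w.lower() for w in words}
    totals = Counter()  # every occurrence of each wanted word
    hits = Counter()    # number of documents containing each wanted word
    for doc in docs:
        tokens = re.findall(r"\w+", doc["text"].lower())
        per_doc = Counter(t for t in tokens if t in wanted)
        totals.update(per_doc)         # adds the per-document counts
        hits.update(per_doc.keys())    # adds 1 per document per word
    return totals, hits


totals, hits = count_occurrences(pages, ["sample", "pdf"])
print(dict(totals))
print(dict(hits))
```

For page 2, "sample" contributes 2 to the totals but only 1 to the hits, which is the gap the question describes. Like the approach in the answer, this re-splits every text on each call, so it can be slow on large data.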
Regarding "elasticsearch - How to get the total word occurrence count in Elasticsearch?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63174836/