lucene - Elasticsearch engine without storing the data

Tags: lucene elasticsearch

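Does Elastic / Lucene really need to store all of the indexed data in a document? Couldn't you just pass the data through it, so that Lucene indexes the words into its hash table, and give each document a single field holding a URL (or whatever pointer makes sense to you) that leads back to the document's source?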

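A simple example would be indexing Wikipedia.org. If I pass each web page to Elastic / Lucene for indexing, and Lucene indexes the main text of each page and has a corresponding URL field to answer searches with, why do I need to save the main text of each page in a field?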

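We are spending a lot of money storing large amounts of redundant data. If Lucene searches its hash table rather than the actual fields we save the data into, why save that data when we don't want it?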

Is there a way to index full-text documents in Elastic without having to save all of the full-text data from those documents?

Best answer

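There are a lot of options for the _source field. This is the field that actually stores the original document. You can disable it entirely, or decide which fields to keep. More information can be found in the docs: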

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html
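As a rough illustration of the two options mentioned above, the sketch below builds the JSON bodies one might send when creating an index. It assumes Elasticsearch 7 or later (typeless mappings); the index name `wiki_pages` and the field names `url` and `body` are hypothetical, chosen to match the Wikipedia example in the question. Note that with _source disabled nothing can be read back from it, so the sketch marks `url` with `"store": true` and asks for it through `stored_fields`.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "wiki_pages"


def mapping_without_source():
    """Index `body` for search, but keep no copy of the original document."""
    return {
        "mappings": {
            "_source": {"enabled": False},
            "properties": {
                # Stored on its own so it can still be returned with each hit.
                "url": {"type": "keyword", "store": True},
                # Indexed for full-text search; the original text is not retained.
                "body": {"type": "text"},
            },
        }
    }


def mapping_excluding_body():
    """Keep _source, but drop the large `body` field from what is saved in it."""
    return {
        "mappings": {
            "_source": {"excludes": ["body"]},
            "properties": {
                "url": {"type": "keyword"},
                "body": {"type": "text"},
            },
        }
    }


def search_request(query_text):
    """Search body that matches on `body` and returns only the stored `url`."""
    return {
        "query": {"match": {"body": query_text}},
        "stored_fields": ["url"],
    }


if __name__ == "__main__":
    # Body for: PUT /wiki_pages
    print(json.dumps(mapping_without_source(), indent=2))
    # Body for: POST /wiki_pages/_search
    print(json.dumps(search_request("inverted index"), indent=2))
```

Either mapping still lets Lucene build its index over `body`. The trade-off, as described in the linked documentation, is that features which rely on the original document (such as partial updates, reindexing, and highlighting) are no longer available for whatever has been left out.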

This question, "lucene - Elasticsearch engine without storing the data", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/31928414/
