elasticsearch - Elasticsearch indexing for n-grams?

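Say I have the sentence "This is a new city".

Does Elasticsearch create index entries for every possible substring of each word? For example, would the word "city" produce the index terms "it", "ty", "ity", "cit", and so on?

Are these index entries created when the document is stored, or at query time?

Are they kept in memory or in a database?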

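Best answer

It depends on your tokenizer. By default, Elasticsearch analyzes text with the standard analyzer, which is built on the Standard Tokenizer. That tokenizer "divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm", and the analyzer then lowercases the terms. This means your sentence is indexed as the terms this, is, a, new, city; substrings such as "it" or "ity" are not generated. If you need them, you can define a custom analyzer, for example one based on the ngram tokenizer.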
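To make the difference concrete, here is a minimal Python sketch that imitates both behaviours outside of Elasticsearch. The helpers `word_tokens` and `ngrams` are hypothetical stand-ins written for illustration, not Elasticsearch APIs, and the regex split only approximates Unicode text segmentation. The `NGRAM_SETTINGS` dictionary shows what index settings for an ngram tokenizer might look like; the names `my_ngram_tokenizer` and `my_ngram_analyzer` are made up.

```python
import re


def word_tokens(text):
    """Rough stand-in for the standard analyzer: split on word characters, then lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def ngrams(token, min_gram, max_gram):
    """Contiguous substrings of length min_gram..max_gram, ordered by start position."""
    out = []
    for start in range(len(token)):
        for n in range(min_gram, max_gram + 1):
            if start + n <= len(token):
                out.append(token[start:start + n])
    return out


# Assumed index settings that would enable n-gram terms (names are hypothetical).
NGRAM_SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "my_ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "my_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_ngram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}

print(word_tokens("This is a new city"))  # ['this', 'is', 'a', 'new', 'city']
print(ngrams("city", 2, 3))               # ['ci', 'cit', 'it', 'ity', 'ty']
```

With word-level tokens only, a search for "ity" has no term to match. With the n-gram analyzer, "ity" is itself an indexed term, at the cost of a considerably larger index.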

Documents are indexed at the moment you put them into Elasticsearch, not at query time.
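As a rough illustration of what "indexed at write time" means, the sketch below builds a toy inverted index in Python. It is a simplification under stated assumptions: `build_inverted_index` is a hypothetical helper, whitespace splitting stands in for real analysis, and the actual Lucene data structures are far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {1: "This is a new city", 2: "A new day"}
index = build_inverted_index(docs)  # all the work happens here, at "index time"

# A query is then just a lookup of terms that already exist.
print(sorted(index.get("new", set())))   # [1, 2]
print(sorted(index.get("city", set())))  # [1]
print(sorted(index.get("it", set())))    # [] because "it" was never stored as a term
```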

The data is stored on the file system: https://www.elastic.co/blog/found-dive-into-elasticsearch-storage

Here is a blog post about the internals: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up

Regarding "elasticsearch - Elasticsearch indexing for n-grams?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45164650/
