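Just wondering why Elasticsearch still uses this simple routing-value approach to decide which shard a document must be stored on. In practice, this approach prevents us from changing the number of shards later. If Elasticsearch used something like consistent hashing (or an even better technique), we would have the option of changing the shard count in the future. Does anyone have an explanation or thoughts on this?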
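To make the concern concrete, here is a minimal sketch of plain modulo routing and what happens when the shard count changes. It is only an illustration: the MD5-based `routing_hash` is a stand-in (Elasticsearch hashes the routing value with Murmur3), and the helper names and the `doc-N` IDs are made up for this example.

```python
import hashlib


def routing_hash(routing: str) -> int:
    # Stand-in hash for illustration only; Elasticsearch uses Murmur3.
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_shard(routing: str, num_shards: int) -> int:
    # Plain modulo routing: shard = hash(routing) % number_of_shards
    return routing_hash(routing) % num_shards


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    # Share of keys whose target shard changes when the shard count changes.
    moved = sum(1 for k in keys if modulo_shard(k, old_n) != modulo_shard(k, new_n))
    return moved / len(keys)


doc_ids = [f"doc-{i}" for i in range(10_000)]  # hypothetical document IDs
moved_5_to_6 = fraction_moved(doc_ids, 5, 6)
print(f"moved when going from 5 to 6 shards: {moved_5_to_6:.1%}")
```

With a uniform hash, a key keeps its shard only when `h % 5 == h % 6`, which holds for 5 out of every 30 residues, so roughly five sixths of the documents would have to be relocated.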
Best answer
As of Elasticsearch 6.1.0, splitting an index is possible. See the release notes: https://www.elastic.co/blog/elasticsearch-6-1-0-released
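A rough sketch of why a split can be done without reshuffling documents across unrelated shards, based on the routing formula given in the Elasticsearch `_routing` field documentation: `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`, with `routing_factor = num_routing_shards / num_primary_shards`. The hash function is again an MD5 stand-in, and the values 30, 5 and 15 are assumptions chosen for illustration.

```python
import hashlib


def routing_hash(routing: str) -> int:
    # Stand-in hash for illustration only; Elasticsearch uses Murmur3.
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_num(routing: str, num_primary_shards: int, num_routing_shards: int) -> int:
    # The number of routing shards must be a multiple of the primary shard count.
    if num_routing_shards % num_primary_shards != 0:
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_factor = num_routing_shards // num_primary_shards
    return (routing_hash(routing) % num_routing_shards) // routing_factor


NUM_ROUTING_SHARDS = 30          # assumed value of index.number_of_routing_shards
OLD_SHARDS, NEW_SHARDS = 5, 15   # assumed split from 5 to 15 primaries
split_factor = NEW_SHARDS // OLD_SHARDS

doc_ids = [f"doc-{i}" for i in range(10_000)]  # hypothetical document IDs

# Every document's new shard is a "child" of its old shard:
# new_shard // split_factor == old_shard.
children_stay_with_parent = all(
    shard_num(d, NEW_SHARDS, NUM_ROUTING_SHARDS) // split_factor
    == shard_num(d, OLD_SHARDS, NUM_ROUTING_SHARDS)
    for d in doc_ids
)
print(children_stay_with_parent)
```

Under these assumptions each original shard is divided only among its own child shards, so no document needs to move between unrelated shards. The trade-off is that the new shard count has to be a multiple of the old one and a factor of the routing shard count.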
The Split Index documentation actually explains in more detail why Elasticsearch does not use consistent hashing:
Consistent hashing only requires 1/N-th of the keys to be relocated when growing the number of shards from N to N+1. However Elasticsearch’s unit of storage, shards, are Lucene indices. Because of their search-oriented data structure, taking a significant portion of a Lucene index, be it only 5% of documents, deleting them and indexing them on another shard typically comes with a much higher cost than with a key-value store.
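The 1/N figure in the quoted passage can be illustrated with a toy hash ring. This is a generic consistent-hashing sketch with virtual nodes, not anything Elasticsearch implements; the `build_ring` and `lookup` helpers, the MD5 hash, and the vnode count of 200 are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    # Any reasonably uniform hash works for the illustration.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def build_ring(num_shards: int, vnodes: int = 200):
    # Place `vnodes` points per shard on the ring, sorted by hash position.
    ring = sorted(
        (ring_hash(f"shard-{s}-vnode-{v}"), s)
        for s in range(num_shards)
        for v in range(vnodes)
    )
    points = [p for p, _ in ring]
    owners = [s for _, s in ring]
    return points, owners


def lookup(ring, key: str) -> int:
    # A key belongs to the first ring point clockwise from its hash.
    points, owners = ring
    i = bisect.bisect_right(points, ring_hash(key)) % len(points)
    return owners[i]


ring5 = build_ring(5)
ring6 = build_ring(6)  # same points as ring5 plus the points of shard 5

doc_ids = [f"doc-{i}" for i in range(10_000)]  # hypothetical document IDs
moved = [
    (lookup(ring5, d), lookup(ring6, d))
    for d in doc_ids
    if lookup(ring5, d) != lookup(ring6, d)
]
moved_fraction = len(moved) / len(doc_ids)
destinations = {new for _, new in moved}
print(f"moved: {moved_fraction:.1%}, destinations: {destinations}")
```

Only about one sixth of the keys should move here, and all of them go to the new shard. As the quoted passage points out, though, the problem for Elasticsearch is not the number of keys but the cost of deleting even a small share of documents from one Lucene index and reindexing them in another.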
Regarding "elasticsearch - Why does Elasticsearch still use a simple modulo of the routing value?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46236029/