High CPU and high IOPS on Elasticsearch 2.2.0 under a stress test

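I recently upgraded ES from version 1.5.2 to 2.2.0 and added Shield. I am running a stress test with Locust that floods the cluster with data (through a Node.js application). Compared with the previous stress test (on 1.5.2), I am getting strange results: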

        1.5.2                   2.2.0
CPU     50% avg, 90% peak       87% avg, 96% peak
IOPS    30 avg, 300 peak        800 avg, 1122 peak

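Why is ES working so hard?

Another strange thing I can't understand, which I think is related to the above, is the output in the head plugin. Previously (1.5.2) I saw the index reported as: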

Index_name

size: 10.3Gi (20.6Gi)

docs: 17,073,010 (17,073,010)

But now (2.2.0) it looks like this:

Index_name

size: 13.7Gi (29.3Gi)

docs: 10,217,220 (20,434,440)

As you can see, in ES 2.2.0 the data itself has doubled. Why does this happen? Is something wrong with my ES 2.2.0 configuration?
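
A quick sanity check on the figures above, as a minimal Python sketch. It only computes the ratio of the second (parenthesized) figure to the first for each version. Reading the parenthesized figure as a total across all shard copies is an assumption here, not something the plugin output itself states.

```python
# Figures copied from the head plugin output quoted above.
# Each entry is (first_figure, parenthesized_figure).
stats = {
    "1.5.2": {"size_gib": (10.3, 20.6), "docs": (17_073_010, 17_073_010)},
    "2.2.0": {"size_gib": (13.7, 29.3), "docs": (10_217_220, 20_434_440)},
}


def ratios(version_stats):
    """Return the parenthesized/first ratio per metric, rounded to 2 places."""
    return {
        metric: round(total / first, 2)
        for metric, (first, total) in version_stats.items()
    }


for version, version_stats in stats.items():
    print(version, ratios(version_stats))
```

In 2.2.0 the document count in parentheses is exactly twice the first figure, while in 1.5.2 the two counts are equal; the size ratio is roughly 2 in both versions.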

Best answer

I got an answer on the Elasticsearch community forum.

Zachary Tong's answer:

Agreeing with the points @rusty raised: Doc values on by default adds some CPU/IO overhead and some more disk space, translog flushes on every action now (instead of every 5s) and the replica issue.
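
The translog point can be made concrete with a small settings sketch. In ES 2.x, `index.translog.durability` defaults to `request` (fsync after every request); setting it to `async` goes back to interval-based syncing, at the cost of possibly losing the most recent writes if a node crashes. The helper name and the index name `my_index` below are hypothetical, and the function only builds the request without sending it. Whether relaxing durability is acceptable depends on the workload.

```python
import json


def translog_settings_request(index_name, durability="async"):
    """Build (method, path, body) for an index settings update.

    Hypothetical helper: it only constructs the request, nothing is sent.
    """
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    body = {"index": {"translog": {"durability": durability}}}
    path = "/{}/_settings".format(index_name)
    return "PUT", path, json.dumps(body, sort_keys=True)


method, path, body = translog_settings_request("my_index")
print(method, path, body)
```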

In addition to that, there was a change at the Lucene layer. Incoming blob of text, but the tl;dr is that Lucene identifies idle resources and utilizes them, making the resource usage look higher when it's really just getting work done faster.

So, in Elasticsearch 1.x, we forcefully throttled Lucene's segment merging process to prevent it from over-saturating your nodes/cluster.

The problem is that a strict threshold is almost never the right answer. If you are indexing heavily, you often want to increase the threshold to let Lucene use all your CPU and Disk IO. If you aren't indexing much, you likely want the threshold lower. But you also want it to be able to "burst" the limit for one-off merges when your cluster is relatively idle.

In Lucene 5.x (used in ES 2.0+), they added a new style of merge throttling that monitors how active the index is, and automatically adjusts the throttle threshold (see https://issues.apache.org/jira/browse/LUCENE-6119, https://github.com/elastic/elasticsearch/pull/9243 and https://github.com/elastic/elasticsearch/pull/9145).

In practice, what this means is that your indexing tends to be faster in ES 2.0+ because segments are allowed to merge as fast as your cluster can handle, without over-saturating your cluster. But it also means that your cluster will happily use any idle resources, which is why you see more resource utilization.

Basically, Lucene identified that those resources weren't being used...so it put them to work to finish the task faster.
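
The adaptive throttling described above can be illustrated with a toy model in Python. This is a sketch of the general idea (raise the merge I/O limit while merges fall behind, lower it while they keep up), not Lucene's actual code; the function names, the step sizes, and the bounds are all assumptions chosen for illustration.

```python
def adjust_throttle(current_mb_per_sec, merges_backlogged,
                    min_rate=5.0, max_rate=10240.0):
    """One adjustment step of a toy adaptive merge throttle.

    Raise the allowed merge I/O rate when merges fall behind indexing,
    lower it when they keep up. The 20% / 10% steps and the bounds are
    illustrative assumptions, not values read from Lucene's source.
    """
    if merges_backlogged:
        new_rate = current_mb_per_sec * 1.20
    else:
        new_rate = current_mb_per_sec * 0.90
    # Clamp so the limit never drops to zero or grows without bound.
    return max(min_rate, min(max_rate, new_rate))


def simulate(backlog_pattern, start_rate=20.0):
    """Apply adjust_throttle over a sequence of backlog observations."""
    rate = start_rate
    history = []
    for backlogged in backlog_pattern:
        rate = adjust_throttle(rate, backlogged)
        history.append(round(rate, 2))
    return history


# Heavy indexing (merges fall behind) followed by an idle period.
print(simulate([True] * 5 + [False] * 5))
```

With a fixed limit, the rate would stay at its starting value through both phases. In the toy model it climbs during the heavy-indexing phase and decays again once merges keep up, which matches the higher resource usage seen under a stress test.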

Source: a similar question about high CPU and high IOPS on Elasticsearch 2.2.0 under a stress test, found on Stack Overflow: https://stackoverflow.com/questions/35748713/
