elasticsearch - How to reduce high CPU usage with Elasticsearch

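I am using Elasticsearch 6.8.10 with Spring Boot 2.2.7 and Spring Data Elasticsearch.
I have streams of statistics and trend data stored in Kafka topics. The topics are read with Spring Kafka, and the data is stored in MongoDB and Elasticsearch for analysis and reporting. The problem is that while the queues are being processed and data is being written to Elasticsearch, Elasticsearch CPU consumption stays at around 250% continuously. This causes sporadic timeout errors across the whole application. I understand that indexing is a heavy operation, but I am trying to understand how to reduce the CPU usage.
Data: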

Approximate number of stats queue items (1.2M)

Stat document size (220 bytes)

VM configuration details:

4 CPUs, 16GB RAM, 20GB disk (SSD)

Running on a VM in Google Cloud Platform.

The VM is used only for Elasticsearch.

Docker Elasticsearch configuration details:

I am currently using a single node.

version: '2.4'
services:

  elasticsearch:
    container_name: elasticsearch
    image: 'docker.elastic.co/elasticsearch/elasticsearch:6.8.10'
    ports:
      - '9200:9200'
      - '9300:9300'
    mem_limit: 16GB
    environment:
      - discovery.type=single-node
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms8g -Xmx8g"      
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - 'esdata1:/usr/share/elasticsearch/data'
    restart: always

volumes:
  esdata1:
    driver: local

Spring Stat document example:

Shards = 1, replicas = 0

@Document(indexName = "stats_test", type = "stat", shards = 1, replicas = 0)
public class EsStat {

    @Id
    @Field(type = FieldType.Keyword)
    private String id;

    @Field(type = FieldType.Keyword)
    private String entityOrRelationshipId;

    @Field(type = FieldType.Keyword)
    private String articleId;

    @Field(type = FieldType.Keyword)
    private String status;

    @Field(type = FieldType.Keyword)
    private String type;

    @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSZ")
    @Field(type = FieldType.Date, format = DateFormat.custom, pattern = "yyyy-MM-dd'T'HH:mm:ss.SSSZ")
    private ZonedDateTime date;

    @JsonProperty("type")
    @Field(type = FieldType.Keyword)
    private String dataSource;

    // getters and setters
}

Stats Spring repository:

Indexing is done through a Spring Data Elasticsearch repository:

public interface StatElasticsearchRepository extends ElasticsearchRepository<EsStat, String> {
}

Stats mapping:

{
  "stats": {
    "mappings": {
      "stat": {
        "properties": {
          "_class": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "articleId": {
            "type": "keyword"
          },
          "dataSource": {
            "type": "keyword"
          },
          "date": {
            "type": "date",
            "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
          },
          "entityOrRelationshipId": {
            "type": "keyword"
          },
          "id": {
            "type": "keyword"
          },
          "status": {
            "type": "keyword"
          },
          "type": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

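How can I determine why the CPU usage is so high, and how can I reduce it?
Any comments or suggestions would be greatly appreciated. I am happy to add more configuration or output if needed.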

Best answer

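Without the logs, the cluster and index settings, and details of how you are indexing, it is hard to guess at the problem. May I suggest you go through these short tips on improving indexing and reindexing performance, and let us know which of the best practices you are missing so that we can keep working from there?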
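One tip that usually appears in indexing-performance guides is to send documents in bulk rather than one at a time. The question does not show how the Kafka listener calls the repository, so the following is only a minimal sketch under that assumption: `BatchPartitioner` and its `partition` method are hypothetical names, and all the helper does is split a list of consumed items into fixed-size batches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring or Elasticsearch.
final class BatchPartitioner {

    private BatchPartitioner() {
    }

    // Splits items into consecutive batches of at most batchSize elements,
    // preserving the original order. The last batch may be smaller.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch could then be handed to `saveAll(batch)` on the `StatElasticsearchRepository` from the question instead of calling `save` once per document. To my understanding `saveAll` goes through a bulk request, but that is worth confirming against the Spring Data Elasticsearch version in use, and the batch size itself is something to tune by measurement.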
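Also, I would suggest you try changing a few of the settings that can be updated dynamically, and let us know how much they improve performance.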
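As an illustration of a dynamic setting, `index.refresh_interval` can be changed on a live index through the index settings API (`PUT /<index>/_settings`). The sketch below assumes Java 11+ for `java.net.http`, a node reachable at a base URL you pass in, and a value such as `30s` chosen purely as an example. `RefreshIntervalUpdater` and its methods are hypothetical names, and whether a longer refresh interval helps in this particular case is something to measure, not a given.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper that wraps the index settings API call.
final class RefreshIntervalUpdater {

    private RefreshIntervalUpdater() {
    }

    // Builds the JSON body, e.g. {"index":{"refresh_interval":"30s"}}.
    static String settingsBody(String refreshInterval) {
        return "{\"index\":{\"refresh_interval\":\"" + refreshInterval + "\"}}";
    }

    // Builds a PUT request against <baseUrl>/<index>/_settings.
    static HttpRequest buildRequest(String baseUrl, String index, String refreshInterval) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(refreshInterval)))
                .build();
    }

    // Sends the request and returns the raw response body for inspection.
    static String apply(String baseUrl, String index, String refreshInterval)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(baseUrl, index, refreshInterval),
                        HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For the setup in the question, a call might look like `RefreshIntervalUpdater.apply("http://localhost:9200", "stats_test", "30s")`, where the host and index name are assumptions taken from the Docker port mapping and the `@Document` annotation. Comparing CPU usage before and after the change shows whether it made a difference, and the setting can be changed back the same way.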

This question on reducing high CPU usage with Elasticsearch comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/62583787/
