Elasticsearch 服务在数据插入 jvm 堆时挂起并终止

标签 elasticsearch garbage-collection jvm heap-memory

我使用的是 elasticsearch 5.6.13 版本,我需要对 elasticsearch 进行一些专家配置。我在同一系统中有 3 个节点(节点 1、节点 2、节点 3),其中节点 1 是主节点,其他节点是 2 个数据节点。我有大约 40 个索引,我使用默认的 5 个主分片创建了所有这些索引,其中一些有 2 个副本。 我现在面临的问题是,我的数据(抓取)每天都在增长,我的一个索引中有 400GB 的数据。同样,其他 3 个索引也非常加载。 从最近几天开始,我在插入数据时遇到了问题,我的 elasticsearch 挂起,然后服务被终止,这影响了我的处理。我尝试了几件事。我正在分享系统规范和当前的 ES 配置 + 日志。请提出一些解决方案。

系统规范: 内存:160GB, CPU:AMD EPYC 7702P 64 核处理器, 硬盘:2TB SSD(安装ES的硬盘还剩500GB)

ES 配置 JVM 选项: -Xms26g, -Xmx26g (我只是尝试这个但不确定我的场景的完美堆大小是多少) 我只是编辑上面的行,文件的其余部分是默认的。我在所有三个节点上编辑这个 jvm.options 文件。

ES 日志

[2021-09-22T12:05:17,983][WARN ][o.e.m.j.JvmGcMonitorService] [sashanode1] [gc][170] overhead, spent [7.1s] collecting in the last [7.2s] [2021-09-22T12:05:21,868][WARN ][o.e.m.j.JvmGcMonitorService] [sashanode1] [gc][171] overhead, spent [3.7s] collecting in the last [1.9s] [2021-09-22T12:05:51,190][WARN ][o.e.m.j.JvmGcMonitorService] [sashanode1] [gc][172] overhead, spent [27.7s] collecting in the last [23.3s] [2021-09-22T12:06:54,629][WARN ][o.e.m.j.JvmGcMonitorService] [cluster_name] [gc][173] overhead, spent [57.5s] collecting in the last [1.1m] [2021-09-22T12:06:56,536][WARN ][o.e.m.j.JvmGcMonitorService] [cluster_name] [gc][174] overhead, spent [1.9s] collecting in the last [1.9s] [2021-09-22T12:07:02,176][WARN ][o.e.m.j.JvmGcMonitorService] [cluster_name] [gc][175] overhead, spent [5.4s] collecting in the last [5.6s] [2021-09-22T12:06:56,546][ERROR][o.e.i.e.Engine ] [cluster_name] [index_name][3] merge failed java.lang.OutOfMemoryError: Java heap space

[2021-09-22T12:06:56,548][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [cluster_name] fatal error in thread [elasticsearch[cluster_name][bulk][T#25]], exiting java.lang.OutOfMemoryError: Java heap space

一些日志

[2021-09-22T12:10:06,526][INFO ][o.e.n.Node ] [cluster_name] initializing ... [2021-09-22T12:10:06,589][INFO ][o.e.e.NodeEnvironment ] [cluster_name] using [1] data paths, mounts [[(D:)]], net usable_space [563.3gb], net total_space [1.7tb], spins? [unknown], types [NTFS] [2021-09-22T12:10:06,589][INFO ][o.e.e.NodeEnvironment ] [cluster_name] heap size [1.9gb], compressed ordinary object pointers [true] [2021-09-22T12:10:07,239][INFO ][o.e.n.Node ] [cluster_name] node name [sashanode1], node ID [2p-ux-OXRKGuxmN0efvF9Q] [2021-09-22T12:10:07,240][INFO ][o.e.n.Node ] [cluster_name] version[5.6.13], pid[57096], build[4d5320b/2018-10-30T19:05:08.237Z], OS[Windows Server 2019/10.0/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_261/25.261-b12] [2021-09-22T12:10:07,240][INFO ][o.e.n.Node ] [cluster_name] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Delasticsearch, -Des.path.home=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1, -Des.default.path.logs=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1\logs, -Des.default.path.data=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1\data, -Des.default.path.conf=D:\Databases\ES\elastic and kibana 5.6.13\es_node_1\config, exit, -Xms2048m, -Xmx2048m, -Xss1024k]

同样在我的 ES 文件夹中有很多随机名称的文件 (java_pid197036.hprof) 可以共享更多详细信息,请建议任何进一步的配置。 谢谢

_cluster/stats?pretty&human 的输出是

{ "_nodes": { "total": 3, "successful": 3, "failed": 0 }, "cluster_name": "cluster_name", "timestamp": 1632375228033, "status": "red", "indices": { "count": 42, "shards": { "total": 508, "primaries": 217, "replication": 1.3410138248847927, "index": { "shards": { "min": 2, "max": 60, "avg": 12.095238095238095 }, "primaries": { "min": 1, "max": 20, "avg": 5.166666666666667 }, "replication": { "min": 1.0, "max": 2.0, "avg": 1.2857142857142858 } } }, "docs": { "count": 107283077, "deleted": 1047418 }, "store": { "size": "530.2gb", "size_in_bytes": 569385384976, "throttle_time": "0s", "throttle_time_in_millis": 0 }, "fielddata": { "memory_size": "0b", "memory_size_in_bytes": 0, "evictions": 0 }, "query_cache": { "memory_size": "0b", "memory_size_in_bytes": 0, "total_count": 0, "hit_count": 0, "miss_count": 0, "cache_size": 0, "cache_count": 0, "evictions": 0 }, "completion": { "size": "0b", "size_in_bytes": 0 }, "segments": { "count": 3781, "memory": "2gb", "memory_in_bytes": 2174286255, "terms_memory": "1.7gb", "terms_memory_in_bytes": 1863786029, "stored_fields_memory": "105.6mb", "stored_fields_memory_in_bytes": 110789048, "term_vectors_memory": "0b", "term_vectors_memory_in_bytes": 0, "norms_memory": "31.9mb", "norms_memory_in_bytes": 33527808, "points_memory": "13.1mb", "points_memory_in_bytes": 13742470, "doc_values_memory": "145.3mb", "doc_values_memory_in_bytes": 152440900, "index_writer_memory": "0b", "index_writer_memory_in_bytes": 0, "version_map_memory": "0b", "version_map_memory_in_bytes": 0, "fixed_bit_set": "0b", "fixed_bit_set_memory_in_bytes": 0, "max_unsafe_auto_id_timestamp": 1632340789677, "file_sizes": { } } }, "nodes": { "count": { "total": 3, "data": 3, "coordinating_only": 0, "master": 1, "ingest": 3 }, "versions": [ "5.6.13" ], "os": { "available_processors": 192, "allocated_processors": 96, "names": [ { "name": "Windows Server 2019", "count": 3 } ], "mem": { "total": "478.4gb", "total_in_bytes": 513717497856, "free": "119.7gb", "free_in_bytes": 128535437312, "used": "358.7gb", "used_in_bytes": 385182060544, "free_percent": 25, "used_percent": 75 } }, "process": { "cpu": { "percent": 5 }, "open_file_descriptors": { "min": -1, "max": -1, "avg": 0 } }, "jvm": { "max_uptime": "1.9d", "max_uptime_in_millis": 167165106, "versions": [ { "version": "1.8.0_261", "vm_name": "Java HotSpot(TM) 64-Bit Server VM", "vm_version": "25.261-b12", "vm_vendor": "Oracle Corporation", "count": 3 } ], "mem": { "heap_used": "5gb", "heap_used_in_bytes": 5460944144, "heap_max": "5.8gb", "heap_max_in_bytes": 6227755008 }, "threads": 835 }, "fs": { "total": "1.7tb", "total_in_bytes": 1920365228032, "free": "499.1gb", "free_in_bytes": 535939969024, "available": "499.1gb", "available_in_bytes": 535939969024 }, "plugins": [ ], "network_types": { "transport_types": { "netty4": 3 }, "http_types": { "netty4": 3 } } } }

jvm.options 文件。

## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms26g
-Xmx26g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# force the server VM (remove on 32-bit client JVMs)
-server

# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}

## GC logging

#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime

# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}

# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M

# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true

和 elasticsearch.yml(主节点)

cluster.name: cluster_name
node.name: node1
node.master : true
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["192.168.11.159", "192.168.11.157"]

最佳答案

我的问题解决了,这是由于堆大小问题,实际上我正在运行 ES 作为服务并且堆大小默认为 2 GB,并且没有反射(reflect)。 我只是使用堆大小为 10 GB 的更新 options.jvm 文件安装新服务,然后运行我的集群。它反射(reflect)了从 2 GB 到 10 GB 的堆大小。 我的问题解决了。感谢您的建议。

要检查您的堆大小,请使用此命令。

http://localhost:9200/_cat/nodes?h=heap*&v

关于Elasticsearch 服务在数据插入 jvm 堆时挂起并终止,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/69280083/

相关文章:

elasticsearch - 在Elasticsearch中排除空数组字段-但包括缺少该字段的文档

c# - .NET 垃圾收集器的麻烦。 block 15-40 分钟

jvm - 为什么 JVM 有 `invokespecial` 和 `invokestatic` 操作码?

regex - 流利的正则表达式:将多个值分组

elasticsearch - Elasticsearch:使用Metric Aggregation的结果来过滤存储桶的元素并运行其他聚合

elasticsearch - 多次比赛不能以恒定分数工作

java - WeakReference 没有收集在大括号中?

java finalize方法的使用

java - Intellij Idea 使用什么 JVM 启动?

java - JVM以编程方式获取堆中最大的对象