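We use a distributed Hazelcast map in our project and recently ran into extremely high get latency. We were calling IMap.get(...), which in some cases took several hours to complete. After the incident we switched to the IMap.getAsync(...) API with a timeout, which solved the problem for us, but I'm curious whether anyone else has run into something similar.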
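For readers wondering what "getAsync with a timeout" can look like, here is a minimal sketch of the bounded-wait pattern. It assumes that in Hazelcast 3.9 IMap.getAsync(key) returns an ICompletableFuture, which is a java.util.concurrent.Future, so a plain timed get(...) applies. The class name BoundedGet, the helper getOrFallback, and the 200 ms timeout are hypothetical, and a CompletableFuture stands in for the Hazelcast call so the example runs without a cluster.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Hazelcast.
final class BoundedGet {

    /** Waits at most timeoutMillis for the future; returns fallback if it is not done by then. */
    static <V> V getOrFallback(Future<V> future, long timeoutMillis, V fallback) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort: this stops the caller from waiting; it may not stop remote work.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("lookup failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-ins for map.getAsync(key); with Hazelcast you would pass that future instead.
        Future<String> fast = CompletableFuture.completedFuture("value");
        Future<String> stuck = new CompletableFuture<>(); // never completes

        System.out.println(getOrFallback(fast, 200, "fallback"));  // prints "value"
        System.out.println(getOrFallback(stuck, 200, "fallback")); // prints "fallback" after ~200 ms
    }
}
```

The point of the pattern is that the caller's wait is capped locally, whatever the remote member is doing. Whether returning a fallback or surfacing the timeout as an error is the better choice depends on the application.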
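Our Hazelcast version is 3.9.0. During the incident we had hazelcast.operation.call.timeout.millis set to 5000, along with async-backup-count="3" and read-backup-data="true". Because of unrelated background processing we also had sporadic CPU usage spikes on some hosts (up to 100% for several minutes), which may have affected Hazelcast.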
The only suspicious thing we found in the logs is that, during the incident, all hosts were complaining about one particular host, like this:
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739863 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739864 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739852 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739870 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739874 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
And in hostY's logs:
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor
WARNING: [hostY]:5702 [dev] [3.9] MonitorInvocationsTask delayed 14294 ms
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor
WARNING: [hostY]:5702 [dev] [3.9] BroadcastOperationControlTask delayed 13544 ms
Any ideas?
Best answer
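Judging by hostY's logs, hostY appears to be suffering from GC pauses. MonitorInvocationsTask is scheduled to run every second, yet it reports that its execution was delayed by 14 seconds. With your configuration (hazelcast.operation.call.timeout.millis / 4 = 1250 ms), BroadcastOperationControlTask should be scheduled almost every second, yet it was similarly delayed by 13 seconds.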
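To make the arithmetic concrete, the sketch below pulls the reported delay out of the two hostY warnings and divides it by the scheduling period given in the answer (1000 ms and 5000 / 4 = 1250 ms). The class DelayedTaskCheck and its methods are hypothetical helpers, and the regular expression is only an assumption based on the two log lines quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-checking helper, not part of Hazelcast.
final class DelayedTaskCheck {

    // Matches e.g. "MonitorInvocationsTask delayed 14294 ms".
    private static final Pattern DELAYED = Pattern.compile("(\\w+) delayed (\\d+) ms");

    /** Returns the reported delay in milliseconds, or -1 if the line is not a delay warning. */
    static long delayMillis(String logLine) {
        Matcher m = DELAYED.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(2)) : -1;
    }

    /** Number of whole scheduling periods that fit into the delay. */
    static long missedPeriods(long delayMillis, long periodMillis) {
        return delayMillis / periodMillis;
    }

    public static void main(String[] args) {
        long callTimeoutMillis = 5000;                      // hazelcast.operation.call.timeout.millis
        long monitorPeriodMillis = 1000;                    // per the answer: once per second
        long broadcastPeriodMillis = callTimeoutMillis / 4; // 1250 ms, per the answer

        String monitor = "WARNING: [hostY]:5702 [dev] [3.9] MonitorInvocationsTask delayed 14294 ms";
        String broadcast = "WARNING: [hostY]:5702 [dev] [3.9] BroadcastOperationControlTask delayed 13544 ms";

        // prints 14
        System.out.println(missedPeriods(delayMillis(monitor), monitorPeriodMillis));
        // prints 10
        System.out.println(missedPeriods(delayMillis(broadcast), broadcastPeriodMillis));
    }
}
```

Both tasks fell behind by ten or more periods at the same timestamp, which is consistent with the whole JVM being stalled rather than one task being slow.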
You can verify this by enabling GC logging. In addition, Hazelcast should periodically print HealthMonitor logs when memory and/or CPU usage exceeds a certain threshold.
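GC logging itself is switched on with JVM flags (on Java 8, for example, -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file>; on Java 9 and later, -Xlog:gc*). As a complementary quick check from inside the process, and not a replacement for GC logs, the sketch below samples the JVM's standard GarbageCollectorMXBean counters over an interval. The class name GcSnapshot and the 5-second window are hypothetical. Note that the reported collection time is an approximate accumulated figure and, for concurrent collectors, may include work that did not pause the application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that uses only the standard java.lang.management API.
final class GcSnapshot {

    /** Approximate total time in ms spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long windowMillis = 5000; // arbitrary sampling window for illustration
        long before = totalGcMillis();
        Thread.sleep(windowMillis);
        long after = totalGcMillis();
        System.out.println("GC time in the last " + windowMillis + " ms: " + (after - before) + " ms");
    }
}
```

If a large share of each window is spent in GC while the "delayed" warnings appear, that supports the GC-pause explanation; the GC log remains the authoritative record of individual pause durations.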
Source: the Stack Overflow question "Extremely high latency for get operations on a Hazelcast distributed map": https://stackoverflow.com/questions/53176843/