ignite - Apache Ignite.NET: some Ignite nodes fail to start up after update to v2.9 - stack smashing detected

Tags: ignite

I am running Apache Ignite.NET in a Kubernetes cluster on Linux nodes.

I recently updated my Ignite 2.8.1 cluster to v2.9. Since the update, some services in the cluster fail to start with the following message:

*** stack smashing detected ***: terminated

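Interestingly, in most cases this happens to the second instance of the same microservice. The first instance usually starts successfully (but sometimes the first instance fails as well). Another observation is that it happens on the nodes that deploy Service Grid services. Sometimes a full cluster recycle (killing all nodes and then starting them again) gets all nodes to start, and sometimes it does not.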

Did I break something during the update? What should I check first?

Below is an excerpt from the Ignite log.

2020-12-08 22:05:25,683 [1] DEBUG  [(null)] - Classpath resolved to: /app/libs/spring-jdbc-4.3.26.RELEASE.jar;/app/libs/spring-messaging-4.3.29.RELEASE.jar;/app/libs/ignite-indexing-2.9.0.jar;/app/libs/opencensus-impl-core-0.22.0.jar;/app/libs/jackson-annotations-2.10.1.jar;/app/libs/lucene-analyzers-common-7.4.0.jar;/app/libs/jackson-dataformat-smile-2.10.1.jar;/app/libs/commons-logging-1.1.1.jar;/app/libs/spring-context-4.3.26.RELEASE.jar;/app/libs/tyrus-standalone-client-1.15.jar;/app/libs/jackson-core-2.10.1.jar;/app/libs/spring-core-4.3.29.RELEASE.jar;/app/libs/control-center-agent-2.9.0.0.jar;/app/libs/commons-codec-1.11.jar;/app/libs/disruptor-3.4.2.jar;/app/libs/javassist-3.21.0-GA.jar;/app/libs/spring-tx-4.3.26.RELEASE.jar;/app/libs/spring-core-4.3.26.RELEASE.jar;/app/libs/commons-logging-1.2.jar;/app/libs/spring-beans-4.3.26.RELEASE.jar;/app/libs/h2-1.4.197.jar;/app/libs/ignite-core-2.9.0.jar;/app/libs/spring-aop-4.3.26.RELEASE.jar;/app/libs/reflections8-0.11.7.jar;/app/libs/cache-api-1.0.0.jar;/app/libs/spring-websocket-4.3.29.RELEASE.jar;/app/libs/lucene-core-7.4.0.jar;/app/libs/jackson-databind-2.10.1.jar;/app/libs/ignite-spring-2.9.0.jar;/app/libs/grpc-context-1.19.0.jar;/app/libs/lucene-queryparser-7.4.0.jar;/app/libs/spring-web-4.3.29.RELEASE.jar;/app/libs/ignite-shmem-1.0.0.jar;/app/libs/guava-26.0-android.jar;/app/libs/spring-expression-4.3.26.RELEASE.jar:/app/libs/spring-jdbc-4.3.26.RELEASE.jar:/app/libs/spring-messaging-4.3.29.RELEASE.jar:/app/libs/ignite-indexing-2.9.0.jar:/app/libs/opencensus-impl-core-0.22.0.jar:/app/libs/jackson-annotations-2.10.1.jar:/app/libs/lucene-analyzers-common-7.4.0.jar:/app/libs/jackson-dataformat-smile-2.10.1.jar:/app/libs/commons-logging-1.1.1.jar:/app/libs/spring-context-4.3.26.RELEASE.jar:/app/libs/tyrus-standalone-client-1.15.jar:/app/libs/jackson-core-2.10.1.jar:/app/libs/spring-core-4.3.29.RELEASE.jar:/app/libs/control-center-agent-2.9.0.0.jar:/app/libs/commons-codec-1.11.jar:/app/libs/disruptor-3.4.2.jar:/app/libs/javassist-3.21.0-GA.jar:/app/libs/spring-tx-4.3.26.RELEASE.jar:/app/libs/spring-core-4.3.26.RELEASE.jar:/app/libs/commons-logging-1.2.jar:/app/libs/spring-beans-4.3.26.RELEASE.jar:/app/libs/h2-1.4.197.jar:/app/libs/ignite-core-2.9.0.jar:/app/libs/spring-aop-4.3.26.RELEASE.jar:/app/libs/reflections8-0.11.7.jar:/app/libs/cache-api-1.0.0.jar:/app/libs/spring-websocket-4.3.29.RELEASE.jar:/app/libs/lucene-core-7.4.0.jar:/app/libs/jackson-databind-2.10.1.jar:/app/libs/ignite-spring-2.9.0.jar:/app/libs/grpc-context-1.19.0.jar:/app/libs/lucene-queryparser-7.4.0.jar:/app/libs/spring-web-4.3.29.RELEASE.jar:/app/libs/ignite-shmem-1.0.0.jar:/app/libs/guava-26.0-android.jar:/app/libs/spring-expression-4.3.26.RELEASE.jar:
2020-12-08 22:05:25,860 [1] DEBUG  [(null)] - JVM started.
[22:05:26,184][INFO][main][XmlBeanDefinitionReader] Loading XML bean definitions from URL [file:/app/./kubernetes.config
...
2020-12-08 22:05:37,936 [70] INFO  org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander [(null)] - Completed rebalance future: RebalanceFuture [state=STARTED, grp=CacheGroupContext [grp=ignite-sys-cache], topVer=AffinityTopologyVersion [topVer=82, minorTopVer=0], rebalanceId=1, routines=4, receivedBytes=1200, receivedKeys=0, partitionsLeft=0, startTime=1607465137846, endTime=-1, lastCancelledTime=-1, next=null]
2020-12-08 22:05:37,936 [70] DEBUG org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander [(null)] - Partitions have been scheduled to resend [reason=Rebalance is done, grp=ignite-sys-cache]
2020-12-08 22:05:37,937 [70] DEBUG org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander [(null)] - Finished rebalancing partition: [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=82, minorTopVer=0], supplier=12ca76f0-3286-4779-a426-408d5d6cf226, p=61]
2020-12-08 22:05:37,937 [70] DEBUG org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander [(null)] - Will not request next demand message [grp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=82, minorTopVer=0], supplier=12ca76f0-3286-4779-a426-408d5d6cf226, rebalanceFuture=RebalanceFuture [state=STARTED, grp=CacheGroupContext [grp=ignite-sys-cache], topVer=AffinityTopologyVersion [topVer=82, minorTopVer=0], rebalanceId=1, routines=4, receivedBytes=1200, receivedKeys=0, partitionsLeft=0, startTime=1607465137846, endTime=1607465137937, lastCancelledTime=-1, next=null]]
2020-12-08 22:05:37,943 [71] DEBUG org.apache.ignite.internal.processors.odbc.ClientListenerProcessor [(null)] - Grid runnable started: nio-acceptor-client-listener
2020-12-08 22:05:37,944 [72] DEBUG org.apache.ignite.internal.processors.odbc.ClientListenerProcessor [(null)] - Grid runnable started: grid-nio-worker-client-listener-0
2020-12-08 22:05:37,944 [1] DEBUG org.apache.ignite.internal.processors.service.IgniteServiceProcessor [(null)] - Started service processor.
2020-12-08 22:05:37,954 [73] DEBUG org.apache.ignite.internal.processors.service.ServiceDeploymentManager [(null)] - Grid runnable started: services-deployment-worker
2020-12-08 22:05:37,955 [73] DEBUG org.apache.ignite.internal.processors.service.ServiceDeploymentTask [(null)] - Started services deployment task init: [depId=ServiceDeploymentProcessId [topVer=AffinityTopologyVersion [topVer=81, minorTopVer=0], reqId=null], locId=c894369e-d55b-4d7b-8e5e-c990d0547121, evt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=c894369e-d55b-4d7b-8e5e-c990d0547121, consistentId=product-service-deployment-7c69d99ff6-vc6nb, addrs=ArrayList [10.0.2.27, 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500, product-service-deployment-7c69d99ff6-vc6nb/10.0.2.27:47500], discPort=47500, order=81, intOrder=44, lastExchangeTime=1607465137554, loc=true, ver=2.9.0#20201015-sha1:70742da8, isClient=false], topVer=81, msgTemplate=null, span=org.apache.ignite.internal.processors.tracing.NoopSpan@3f4cf36, nodeId8=c894369e, msg=null, type=NODE_JOINED, tstamp=1607465136027]]
2020-12-08 22:05:38,017 [73] DEBUG org.apache.ignite.internal.processors.resource.GridResourceProcessor [(null)] - Injecting resources [obj=org.apache.ignite.internal.processors.platform.cluster.PlatformClusterNodeFilterImpl@5d421915]
2020-12-08 22:05:38,038 [1] DEBUG org.apache.ignite.internal.processors.rest.GridRestProcessor [(null)] - REST processor started.
2020-12-08 22:05:38,056 [74] DEBUG org.apache.ignite.internal.processors.rest.GridRestProcessor [(null)] - Grid runnable started: session-timeout-worker
2020-12-08 22:05:38,098 [32] DEBUG org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor [(null)] - Timeout has occurred [obj=CancelableTask [id=d5e43644671-3ea29289-4345-4d80-8eab-97397473a5a9, endTime=1607465138070, period=10000, cancel=false, task=org.apache.ignite.internal.processors.query.h2.ConnectionManager$$Lambda$307/57085696@6197e588], process=true]
2020-12-08 22:05:38,110 [1] DEBUG org.apache.ignite.internal.processors.resource.GridResourceProcessor [(null)] - Injecting resources [obj=org.gridgain.control.agent.processor.lifecycle.ClusterLifecycleProcessor$$Lambda$586/893320639@55cff952]
2020-12-08 22:05:38,142 [75] DEBUG org.apache.ignite.internal.managers.communication.GridIoManager [(null)] - Message set has not been changed: GridCommunicationMessageSet [nodeId=3f89e86c-f636-4324-895b-1a77cec8ed11, endTime=1607465141249, timeoutId=8fe43644671-3ea29289-4345-4d80-8eab-97397473a5a9, topic=TOPIC_COMM_USER, plc=0, msgs=ConcurrentLinkedDeque [], reserved=false, timeout=5000, skipOnTimeout=true, lastTs=1607465136249]
2020-12-08 22:05:38,148 [1] WARN  org.gridgain.control.agent.ControlCenterAgent [(null)] - Current Ignite configuration does not support tracing functionality and Control Center agent will not collect traces (consider adding ignite-opencensus module to classpath).
2020-12-08 22:05:38,152 [1] DEBUG org.apache.ignite.internal.processors.resource.GridResourceProcessor [(null)] - Injecting resources [obj=org.gridgain.control.agent.ControlCenterAgent$$Lambda$591/1985869725@151335cb]
2020-12-08 22:05:38,175 [76] DEBUG org.apache.ignite.internal.managers.communication.GridIoManager [(null)] - Message set has not been changed: GridCommunicationMessageSet [nodeId=3f89e86c-f636-4324-895b-1a77cec8ed11, endTime=1607465141249, timeoutId=8fe43644671-3ea29289-4345-4d80-8eab-97397473a5a9, topic=TOPIC_COMM_USER, plc=0, msgs=ConcurrentLinkedDeque [], reserved=false, timeout=5000, skipOnTimeout=true, lastTs=1607465136249]
2020-12-08 22:05:38,476 [73] DEBUG org.apache.ignite.internal.processors.service.ServiceDeploymentTask [(null)] - Calculated service assignment : [srvcId=56296344671-81118589-d216-4762-a835-3df2230389c5, srvcTop={c894369e-d55b-4d7b-8e5e-c990d0547121=1, 3f89e86c-f636-4324-895b-1a77cec8ed11=1}]
2020-12-08 22:05:38,484 [73] DEBUG org.apache.ignite.internal.processors.resource.GridResourceProcessor [(null)] - Injecting resources [obj=org.apache.ignite.internal.processors.platform.dotnet.PlatformDotNetServiceImpl@20119802]
*** stack smashing detected ***: <unknown> terminated

Thanks!

Best answer

"Stack smashing detected" usually indicates a NullReferenceException in your C# code.

Set the COMPlus_EnableAlternateStackCheck environment variable to 1 before running the app to see the full stack trace (this works on .NET Core 3.0 and later).

https://ignite.apache.org/docs/latest/net-specific/net-troubleshooting#stack-smashing-detected-dotnet-terminated
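A minimal sketch of the step above, assuming the service is started directly with `dotnet` from a Python launcher script. The `build_env` and `launch` helpers and the `MyService.dll` name are hypothetical; only the variable name and the value `1` come from the answer. The point is that the variable is placed in the child process's environment before the app starts.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

# Variable named in the answer (applies to .NET Core 3.0 and later).
ALT_STACK_CHECK_VAR = "COMPlus_EnableAlternateStackCheck"


def build_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: current environment) with the check enabled.

    The input mapping is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env[ALT_STACK_CHECK_VAR] = "1"
    return env


def launch(cmd: Sequence[str]) -> int:
    """Start the service with the variable already set; return its exit code."""
    # Passing env= ensures the variable exists before the app starts,
    # as the answer requires.
    completed = subprocess.run(list(cmd), env=build_env())
    return completed.returncode


# Usage (hypothetical entry point):
#   launch(["dotnet", "MyService.dll"])
```

In a Kubernetes deployment like the one in the question, the equivalent is adding the variable to the container's `env` list in the pod spec, so that every instance starts with it set.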

Original question on Stack Overflow: https://stackoverflow.com/questions/65235845/
