java - 成员重新连接后 JGroups GMS 加入失败

标签 java jgroups

我正在使用 JGroups 4.0.15.Final 来实现集群应用程序。当我启动两个节点时一切正常,它们连接并开始工作。但是当我杀死/重新启动一个节点时,它们将不再连接在一起。

注意:它们是我应用程序的一个专长,因为我正在为 TCP 协议(protocol)使用自定义套接字工厂。该工厂创建具有一些特殊要求的加密套接字。但是连接似乎很好并且套接字可以工作。我唯一注意到的是,在 NODE_2 被验证为已死后,它们仍在通过 TCP 协议(protocol)尝试在其套接字上进行连接。

我已经尝试了几个小时来寻找问题,但就是找不到。有时,节点会在多次连接尝试后再次连接,但这可能需要几分钟到一个多小时。我认为这与轮回问题有关,但我不确定。

我目前正在分析 TCP 协议(protocol)的 TRACE 日志,但仍然无法指出两个节点需要几分钟才能再次合并的原因。

当我重新启动 NODE_2 时,两者(NODE_1 = 1.1.1.1NODE_2 = 1.1.1.2)都已连接并且一切正常美好的。在两边我都看到建立了连接:

NODE_1:

2019-01-17 09:51:55.089 [TQ-Bundl |                  ] TRACE TCP                      - 1.1.1.1:7800: failed connecting to 1.1.1.2:7800: java.net.SocketException: Failed to connect to '/1.1.1.2:7800' with Socket #1041583140. ConnectException: Connection refused (Connection refused)
2019-01-17 09:51:55.089 [TQ-Bundl |                  ] TRACE TCP                      - 1.1.1.1:7800: removed connection to 1.1.1.2:7800 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.089 [TQ-Bundl |                  ] TRACE TCP                      - JGRP000029: NODE_1 (flags=1): failed sending message to 1.1.1.2:7800 (117 bytes): java.net.SocketException: Failed to connect to '/1.1.1.2:7800' with Socket #1041583140. ConnectException: Connection refused (Connection refused), headers: TCPPING: [GET_MBRS_REQ cluster=CLUSTER initial_discovery=false], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
----------------------------
NODE_2 is comes back online
----------------------------
2019-01-17 09:51:55.791 [TcpServe |                  ] TRACE TCP                      - 1.1.1.1:7800: accepted connection from 1.1.1.2:7800 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.798 [jgrp-29, |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: 1.1.1.1:7800, src: NODE_2 (2 headers), size=50 bytes, flags=OOB|DONT_BUNDLE|INTERNAL], headers are TCPPING: [GET_MBRS_REQ cluster=CLUSTER initial_discovery=true], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.798 [jgrp-29, |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to 1.1.1.2:7800, src=NODE_1 (flags=1), headers are MERGE3: INFO: view_id=[NODE_1 (flags=1)|2], logical_name=NODE_1, physical_addr=1.1.1.1:7800, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.798 [jgrp-29, |                  ] TRACE TCPPING                  - NODE_1 (flags=1): received GET_MBRS_REQ from NODE_2, sending response NODE_1 (flags=1), name=NODE_1, addr=1.1.1.1:7800, coord | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.799 [jgrp-29, |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are TCPPING: [GET_MBRS_RSP cluster=null initial_discovery=false], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.852 [jgrp-29, |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NODE_2, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.853 [jgrp-29, |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- DATA(NODE_2: #1, conn_id=0, first) | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:56.207 [jgrp-29, |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) --> ACK(NODE_2: #6) | (Log4J2LogImpl.java:56)  

NODE_2:

2019-01-17 09:51:48.998 [cc-clust |                  ] DEBUG Configurator             - set property TCP.bind_addr to default value /fe80:0:0:0:250:56ff:feb4:3785%eth2 | (Log4J2LogImpl.java:66) 
2019-01-17 09:51:49.000 [cc-clust |                  ] DEBUG Configurator             - set property TCP.diagnostics_addr to default value /ff0e:0:0:0:0:0:75:75 | (Log4J2LogImpl.java:66) 
2019-01-17 09:51:49.004 [cc-clust |                  ] DEBUG TCP                      - thread pool min/max/keep-alive: 0/20/3000 use_fork_join=false, internal pool: 0/4/30000 (2 cores available) | (Log4J2LogImpl.java:71) 
2019-01-17 09:51:49.011 [cc-clust |                  ] TRACE NAKACK2                  - null: set max_xmit_req_size from 0 to 511600 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:49.012 [cc-clust |                  ] TRACE UNICAST3                 - null: set max_xmit_req_size from 0 to 511600 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.535 [cc-clust |                  ] TRACE STABLE                   - NODE_2: stable task started | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.553 [cc-clust |                  ] TRACE TCP                      - joined /ff0e:0:0:0:0:0:75:75:7500 on eth2 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.554 [cc-clust |                  ] TRACE TCP                      - joined /ff0e:0:0:0:0:0:75:75:7500 on eth1 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.555 [cc-clust |                  ] TRACE TCP                      - joined /ff0e:0:0:0:0:0:75:75:7500 on eth0 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.555 [cc-clust |                  ] TRACE TCP                      - joined /ff0e:0:0:0:0:0:75:75:7500 on lo | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.581 [cc-clust |                  ] TRACE TCPPING                  - NODE_2: sending discovery request to 1.1.1.1:7800 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.582 [cc-clust |                  ] TRACE TCP                      - NODE_2: sending msg to 1.1.1.1:7800, src=NODE_2, headers are TCPPING: [GET_MBRS_REQ cluster=CLUSTER initial_discovery=true], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.591 [TQ-Bundl |                  ] TRACE TCP                      - 1.1.1.2:7800: connecting to 1.1.1.1:7800 | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.823 [jgrp-1,s |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: 0c8f19dc-b0da-7fbd-3dae-993890717623 (flags=1) (2 headers), size=52 bytes, flags=OOB|DONT_BUNDLE|INTERNAL], headers are TCPPING: [GET_MBRS_RSP cluster=null initial_discovery=false], TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.826 [jgrp-2,s |                  ] TRACE TCP                      - NODE_2: received [dst: 1.1.1.2:7800, src: 0c8f19dc-b0da-7fbd-3dae-993890717623 (flags=1) (2 headers), size=0 bytes, flags=INTERNAL], headers are MERGE3: INFO: view_id=[0c8f19dc-b0da-7fbd-3dae-993890717623 (flags=1)|2], logical_name=NODE_1, physical_addr=1.1.1.1:7800, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.830 [jgrp-1,s |                  ] TRACE TCPPING                  - NODE_2: received GET_MBRS_RSP from NODE_1 (flags=1): NODE_1 (flags=1), name=NODE_1, addr=1.1.1.1:7800, coord | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.832 [cc-clust |                  ] TRACE GMS                      - NODE_2: discovery took 262 ms, members: 1 rsps (1 coords) [done] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.835 [cc-clust |                  ] DEBUG GMS                      - NODE_2: sending JOIN(NODE_2) to NODE_1 (flags=1) | (Log4J2LogImpl.java:71) 
2019-01-17 09:51:55.844 [cc-clust |                  ] TRACE UNICAST3                 - NODE_2: created sender window for NODE_1 (flags=1) (conn-id=0) | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:55.848 [cc-clust |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_1 (flags=1): #1, conn_id=0, first) | (Log4J2LogImpl.java:46) 
2019-01-17 09:51:55.848 [cc-clust |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NODE_2, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:56.210 [jgrp-1,s |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_1 (flags=1) (2 headers), size=0 bytes, flags=INTERNAL], headers are UNICAST3: ACK, seqno=6, ts=6, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 09:51:56.211 [jgrp-1,s |                  ] TRACE UNICAST3                 - NODE_2 <-- ACK(NODE_1 (flags=1): #6, conn-id=0, ts=6) | (Log4J2LogImpl.java:56) 

在此之后,两者都处于循环中以处理某些事情。当 NODE_2 尝试加入(将 JOIN(NODE_2) 发送到 NODE_1)时,NODE_1 似乎没有注意到它,而且我没有看到 NODE_1 的日志中涉及 GMS 协议(protocol)。我还看到很多消息一遍又一遍地询问第一个序列号 (*UNICAST3 NODE_2 --> SEND_FIRST_SEQNO(NODE_1 (flags=1)) | (Log4J2LogImpl.java:56) *)。这会持续一段时间(时间从几秒到有时几小时不等),直到加入/合并突然起作用:

NODE_1:

2019-01-17 10:11:36.043 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::21, UNICAST3: DATA, seqno=19750, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.544 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) --> XMIT(NODE_2: #19750) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.544 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::21, UNICAST3: DATA, seqno=19750, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.547 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (2 headers), size=0 bytes, flags=OOB], headers are UNICAST3: SEND_FIRST_SEQNO, seqno=0, ts=581, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.547 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- SEND_FIRST_SEQNO(NODE_2) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.547 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are UNICAST3: DATA, seqno=19725, first, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.716 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (2 headers), size=0 bytes, flags=INTERNAL], headers are UNICAST3: ACK, seqno=19725, ts=582, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.716 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- ACK(NODE_2: #19725, conn-id=0, ts=582) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.296 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (2 headers), size=0 bytes, flags=INTERNAL], headers are MERGE3: VIEW_REQ: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.296 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are MERGE3: VIEW_RSP: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.300 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): received [dst: NODE_1 (flags=1), src: NODE_2 (3 headers), size=23 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=32, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.300 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- DATA(NODE_2: #32, conn_id=0) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1): delivering NODE_2#32 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [jgrp-125 |                  ] DEBUG STABLE                   - suspending message garbage collection | (Log4J2LogImpl.java:66) 
2019-01-17 10:11:37.302 [jgrp-125 |                  ] DEBUG STABLE                   - NODE_1 (flags=1): resume task started, max_suspend_time=220000 | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.302 [jgrp-125 |                  ] TRACE GMS                      - NODE_1 (flags=1): got merge request from NODE_2, merge_id=NODE_2::22, mbrs=[NODE_1 (flags=1)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.303 [jgrp-125 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) --> DATA(NODE_2: #19751, conn_id=0) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.303 [jgrp-125 |                  ] TRACE TCP                      - NODE_1 (flags=1): sending msg to NODE_2, src=NODE_1 (flags=1), headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=19751, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.307 [jgrp-127 |                  ] TRACE TCP                      - NODE_1 (flags=1): received message batch of 1 messages from NODE_2 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.310 [jgrp-127 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1) <-- DATA(NODE_2: #33 - #33) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.310 [jgrp-127 |                  ] TRACE UNICAST3                 - NODE_1 (flags=1): delivering #33 - #33 (1 messages) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.313 [jgrp-127 |                  ] TRACE GMS                      - NODE_1 (flags=1): mcasting view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.314 [jgrp-127 |                  ] DEBUG NAKACK2                  - 
[NODE_1 (flags=1) mergeDigest()]
existing digest:  NODE_1 (flags=1): [3 (3)], NODE_2: [0 (0)]
new digest:       NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)]
resulting digest: NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)] | (Log4J2LogImpl.java:66)
2019-01-17 10:11:37.314 [jgrp-127 |                  ] DEBUG GMS                      - NODE_1 (flags=1): installing view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:71) 

NODE_2:

2019-01-17 10:11:36.547 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 --> SEND_FIRST_SEQNO(NODE_1 (flags=1)) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.547 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are UNICAST3: SEND_FIRST_SEQNO, seqno=0, ts=581, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.550 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: received message batch of 1 messages from NODE_1 (flags=1) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.550 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2: created receiver window for NODE_1 (flags=1) at seqno=#19725 for conn-id=0 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.553 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_1 (flags=1): #19725 - #19725) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.553 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2: delivering #19725 - #19725 (1 messages) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:36.713 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 --> ACK(NODE_1 (flags=1): #19725) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:36.713 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are UNICAST3: ACK, seqno=19725, ts=582, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.296 [jgrp-174 |                  ] TRACE MERGE3                   - NODE_2: merge participants are [NODE_2, NODE_1 (flags=1)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.296 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are MERGE3: VIEW_REQ: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.299 [jgrp-177 |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_1 (flags=1) (2 headers), size=67 bytes, flags=INTERNAL], headers are MERGE3: VIEW_RSP: , logical_name=null, physical_addr=null, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.299 [jgrp-174 |                  ] DEBUG GMS                      - NODE_2: I will be the merge leader. Starting the merge task. Views: {NODE_1 (flags=1)=[NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], NODE_2=MergeView::[NODE_2|24] (1) [NODE_2], 1 subgroups: [NODE_2|23] (1) [NODE_2]} | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.300 [MergeTas |                  ] DEBUG STABLE                   - suspending message garbage collection | (Log4J2LogImpl.java:66) 
2019-01-17 10:11:37.300 [MergeTas |                  ] DEBUG STABLE                   - NODE_2: resume task started, max_suspend_time=220000 | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.300 [MergeTas |                  ] DEBUG GMS                      - NODE_2: merge task NODE_2::22 started with 2 participants | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.300 [MergeTas |                  ] TRACE GMS                      - NODE_2: sending MERGE_REQ to [NODE_1 (flags=1), NODE_2] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.300 [MergeTas |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_1 (flags=1): #32, conn_id=0) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.300 [MergeTas |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=32, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [MergeTas |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_2: #64, conn_id=1) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.301 [MergeTas |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_2, src=NODE_2, headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=64, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [MergeTas |                  ] TRACE TCP                      - NODE_2: looping back message [dst: NODE_2, src: NODE_2 (3 headers), size=22 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=64, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.301 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_2 (3 headers), size=22 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_REQ]: merge_id=NODE_2::22, UNICAST3: DATA, seqno=64, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.302 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_2: #64, conn_id=1) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.302 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2: delivering NODE_2#64 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.302 [jgrp-174 |                  ] TRACE GMS                      - NODE_2: got merge request from NODE_2, merge_id=NODE_2::22, mbrs=[NODE_2] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.303 [jgrp-174 |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_2: #65, conn_id=1) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.303 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_2, src=NODE_2, headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=65, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.303 [jgrp-174 |                  ] TRACE TCP                      - NODE_2: looping back message [dst: NODE_2, src: NODE_2 (3 headers), size=51 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=65, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.304 [jgrp-179 |                  ] TRACE UNICAST3                 - NODE_2: delivering #65 - #65 (1 messages) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.304 [jgrp-179 |                  ] TRACE GMS                      - NODE_2: got merge response from NODE_2, merge_id=NODE_2::22, merge data is sender=NODE_2, view=[NODE_2|24] (1) [NODE_2], digest=NODE_2: [21 (21)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.304 [jgrp-177 |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_2 (3 headers), size=51 bytes, flags=OOB|INTERNAL, transient_flags=OOB_DELIVERED], headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=65, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.304 [jgrp-177 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_2: #65, conn_id=1) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.305 [jgrp-177 |                  ] TRACE TCP                      - NODE_2: received [dst: NODE_2, src: NODE_1 (flags=1) (3 headers), size=53 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[MERGE_RSP]merge_id=NODE_2::22, UNICAST3: DATA, seqno=19751, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.305 [jgrp-177 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_1 (flags=1): #19751, conn_id=0) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.305 [jgrp-177 |                  ] TRACE UNICAST3                 - NODE_2: delivering NODE_1 (flags=1)#19751 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.306 [jgrp-177 |                  ] TRACE GMS                      - NODE_2: got merge response from NODE_1 (flags=1), merge_id=NODE_2::22, merge data is sender=NODE_1 (flags=1), view=[NODE_1 (flags=1)|3] (1) [NODE_1 (flags=1)], digest=NODE_1 (flags=1): [3 (3)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.306 [MergeTas |                  ] TRACE GMS                      - NODE_2: collected 2 merge response(s) in 6 ms | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE GMS                      - NODE_2: consolidated view=MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], MergeView::[NODE_2|24] (1) [NODE_2], 1 subgroups: [NODE_2|23] (1) [NODE_2]
consolidated digest=NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.307 [MergeTas |                  ] DEBUG GMS                      - NODE_2: installing merge view [NODE_1 (flags=1)|25] (2 members) in 2 coords | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_1 (flags=1): #33, conn_id=0) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_1 (flags=1), src=NODE_2, headers are GMS: GmsHeader[INSTALL_MERGE_VIEW], UNICAST3: DATA, seqno=33, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE UNICAST3                 - NODE_2 --> DATA(NODE_2: #66, conn_id=1) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.307 [MergeTas |                  ] TRACE TCP                      - NODE_2: sending msg to NODE_2, src=NODE_2, headers are GMS: GmsHeader[INSTALL_MERGE_VIEW], UNICAST3: DATA, seqno=66, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.308 [MergeTas |                  ] TRACE TCP                      - NODE_2: looping back message [dst: NODE_2, src: NODE_2 (3 headers), size=108 bytes], headers are GMS: GmsHeader[INSTALL_MERGE_VIEW], UNICAST3: DATA, seqno=66, conn_id=1, TP: [cluster_name=CLUSTER] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.308 [MergeTas |                  ] DEBUG GMS                      - NODE_2: merge NODE_2::22 took 9 ms | (Log4J2LogImpl.java:71) 
2019-01-17 10:11:37.309 [jgrp-179 |                  ] TRACE TCP                      - NODE_2: received message batch of 1 messages from NODE_2 | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.309 [jgrp-179 |                  ] TRACE UNICAST3                 - NODE_2 <-- DATA(NODE_2: #66 - #66) | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.309 [jgrp-179 |                  ] TRACE UNICAST3                 - NODE_2: delivering #66 - #66 (1 messages) | (Log4J2LogImpl.java:46) 
2019-01-17 10:11:37.310 [jgrp-179 |                  ] TRACE GMS                      - NODE_2: mcasting view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:56) 
2019-01-17 10:11:37.310 [jgrp-179 |                  ] DEBUG NAKACK2                  - 
[NODE_2 mergeDigest()]
existing digest:  NODE_2: [21 (21)]
new digest:       NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)]
resulting digest: NODE_1 (flags=1): [3 (3)], NODE_2: [21 (21)] | (Log4J2LogImpl.java:66) 
2019-01-17 10:11:37.310 [jgrp-179 |                  ] DEBUG GMS                      - NODE_2: installing view MergeView::[NODE_1 (flags=1)|25] (2) [NODE_1 (flags=1), NODE_2], 2 subgroups: [NODE_1 (flags=1)|3] (2) [NODE_1 (flags=1), NODE_2], [NODE_2|24] (1) [NODE_2] | (Log4J2LogImpl.java:71) 

此后一切正常,直到一个节点重新启动或其他。我在这里真的一无所知,需要一些帮助。有谁知道为什么 NODE_1 没有更早开始加入/合并?感谢您的帮助。

这是我当前的配置:

public class JGroupsConfiguration {

    final Protocol[] PROTOCOL_STACK = {
            new TCP()
                    .setValue("bind_port", EngineConfiguration.get().clusterSrcPort().get())
                    .setValue("max_bundle_size", 64000)
                    .setValue("sock_conn_timeout", 300)
                    .setValue("thread_pool_min_threads", 0)
                    .setValue("thread_pool_max_threads", 20)
                    .setValue("thread_pool_keep_alive_time", 3000),
            new TCPPING()
                    .setValue("async_discovery", false)
                    .setValue("port_range", 0)
                    .setValue("send_cache_on_join", true)
                    .setValue("return_entire_cache", true),
            new MERGE3()
                    .setValue("min_interval", 10000)
                    .setValue("max_interval", 30000),
            new FD()
                    .setValue("timeout", 3000)
                    .setValue("max_tries", 3),
            new VERIFY_SUSPECT()
                    .setValue("timeout", 1500),
            new BARRIER(),
            new NAKACK2()
                    .setValue("use_mcast_xmit", false)
                    .setValue("discard_delivered_msgs", true),
            new UNICAST3()
                    // Disables retransmitting of queued messages to reconnected nodes.
                    .setValue("max_retransmit_time", 0),
            new STABLE()
                    .setValue("desired_avg_gossip", 50000)
                    .setValue("max_bytes", 4000000),
            new GMS()
                    .setValue("print_local_addr", true)
                    .setValue("join_timeout", 2000),
            new FRAG3()
                    .setValue("frag_size", 60000),
            new STATE()
    };
}

最佳答案

原来是UNICAST3协议(protocol)的问题。该协议(protocol)尝试将每条消息重新发送给重新连接的集群成员,然后才真正允许它们作为成员合并到集群中。这就是节点之间所有 SEQNO 消息的原因。在我们从堆栈中删除协议(protocol)后,重新连接的成员的合并工作正常。 注意:我们的应用程序不需要 UNICAST3 的特性,因为我们不依赖可靠的消息传递!

关于java - 成员重新连接后 JGroups GMS 加入失败,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54478034/

相关文章:

java - Hibernate Search、Infinispan、jgroups、Wildfly 集群集成配置

java - EhCache + JGroups 给出 "Exception on flushing of replication queue: null"

java - 删除后 Netty SslHandler 握手超时

java - 如何通过http长连接从服务端定时向客户端发送请求

java - 最佳选择 - Spring XML 解析以读取某些元素

java - 如何在 Java 中检测类 Unix 操作系统?

java - 以错误的编码读取属性文件中的属性

linux - Jgroups 使用哪些端口?

JGroups 吞噬内存