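I'm trying to implement a simple cluster use case with Akka (.NET):
- Cluster - for node up/down events.
- Remote - for sending messages to a specific node.
There are two actors: a master node that listens for cluster events, and a slave node that connects to the cluster: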
// Slave node: join the cluster through the master's address
var cluster = Cluster.Get(system);
Address address = new Address("akka.tcp", "ClusterSystem", "master", 8080);
cluster.Join(address);
When the master node receives a ClusterEvent.MemberUp message, I create a link to the slave's actor:
// Master node, on MemberUp: select the slave_0 actor on the newly joined node
ClusterEvent.MemberUp up = message as ClusterEvent.MemberUp;
ActorSelection nodeActor = system.ActorSelection(up.Member.Address + "/user/slave_0");
Sending a message to this actor results in an error:
Association with remote system akka.tcp://ClusterSystem@slave:8090 has failed; address is now gated for 5000 ms. Reason is: [Disassociated]
Master config:
akka {
    actor {
        provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
    }
    remote {
        helios.tcp {
            port = 8080
            hostname = master
            bind-hostname = master
            bind-port = 8080
            send-buffer-size = 512000b
            receive-buffer-size = 512000b
            maximum-frame-size = 1024000b
            tcp-keepalive = on
        }
    }
    cluster {
        failure-detector {
            heartbeat-interval = 10 s
        }
        auto-down-unreachable-after = 10s
        gossip-interval = 5s
    }
    stdout-loglevel = DEBUG
    loglevel = DEBUG
    debug {
        receive = on
        autoreceive = on
        lifecycle = on
        event-stream = on
        unhandled = on
    }
}
Slave config:
akka {
    actor {
        provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
    }
    remote {
        helios.tcp {
            port = 8090
            hostname = slave
            bind-hostname = slave
            bind-port = 8090
            send-buffer-size = 512000b
            receive-buffer-size = 512000b
            maximum-frame-size = 1024000b
            tcp-keepalive = on
        }
    }
    cluster {
        failure-detector {
            heartbeat-interval = 10 s
        }
        auto-down-unreachable-after = 10s
        gossip-interval = 5s
    }
    stdout-loglevel = DEBUG
    loglevel = DEBUG
    debug {
        receive = on
        autoreceive = on
        lifecycle = on
        event-stream = on
        unhandled = on
    }
}
Best answer
Here's your problem:
cluster {
    failure-detector {
        heartbeat-interval = 10 s
    }
    auto-down-unreachable-after = 10s
    gossip-interval = 5s
}
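heartbeat-interval and auto-down-unreachable-after are set to the same duration, so your nodes will almost always auto-disassociate after 10 seconds: you're betting on a race condition that the failure detector can lose.
auto-down-unreachable-after is a dangerous setting; don't use it. You'll end up with a split brain or worse.
And make sure your failure-detector interval is always lower than your auto-down interval.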
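A minimal sketch of the `cluster` section rewritten along these lines, meant to replace the `cluster { ... }` block in both the master and slave configs above. The 1 s heartbeat and 3 s pause are illustrative values, not tuned for any particular network, and the sketch assumes that leaving auto-down-unreachable-after unset falls back to the reference default of `off`:

```hocon
akka.cluster {
  failure-detector {
    # Send heartbeats frequently (illustrative value), well below any downing timeout
    heartbeat-interval = 1 s
    # Tolerated heartbeat gap before a node is marked unreachable (illustrative value)
    acceptable-heartbeat-pause = 3 s
  }
  # auto-down-unreachable-after is deliberately not set, so automatic downing stays off.
  # If you enable it anyway, keep it far above the failure detector's timing, e.g.:
  # auto-down-unreachable-after = 60 s
  gossip-interval = 5s
}
```

With automatic downing off, an unreachable node stays unreachable (rather than being removed) until it recovers or is downed explicitly, which avoids the split-brain risk described above.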
This question, "Akka (.net) cluster with remote nodes : Disassociated exception", can be found on Stack Overflow: https://stackoverflow.com/questions/32351736/