java - Flink TaskManager 未重新连接到新的 Jobmanager

标签 java apache-zookeeper apache-flink flink-streaming

我已经将 Flink 配置为 HA 模式,如上所述 here :

我想测试容错能力,因此我做了以下操作:

  1. 设置具有 2 个 JobManager 和 1 个 TaskManager 的 Flink 集群
  2. 在任务管理器上启动流式作业
  3. 终止 Activity 的作业管理器(以模拟崩溃)
  4. 领导人选举正在按预期进行。
  5. 但注意到任务管理器正在重新连接到新的作业管理器。它只是尝试每 10 秒重新连接到前一个领导者。

将任务管理器日志粘贴到此处:

2018-07-25 19:46:08,508 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration  - Messages have a max timeout of 10000 ms
2018-07-25 19:46:08,515 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
2018-07-25 19:46:08,524 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2018-07-25 19:46:08,525 INFO  org.apache.flink.runtime.taskexecutor.JobLeaderService        - Start job leader service.
2018-07-25 19:46:08,529 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting to ResourceManager akka.tcp://flink@10.10.97.210:46477/user/resourcemanager(b91b9aeb3565be973c9bb47259414e0a).
2018-07-25 19:46:08,574 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /10.10.97.210:46477
2018-07-25 19:46:08,576 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.10.97.210:46477] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@10.10.97.210:46477]] Caused by: [Connection refused: /10.10.97.210:46477]
2018-07-25 19:46:08,579 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager..
2018-07-25 19:46:18,606 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /10.10.97.210:46477
2018-07-25 19:46:18,607 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@10.10.97.210:46477] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@10.10.97.210:46477]] Caused by: [Connection refused: /10.10.97.210:46477]
2018-07-25 19:46:18,607 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager..
  1. 重新启动任务管理器没有帮助
  2. 重新启动集群没有帮助

如有遗漏,请指导我。

最佳答案

查看日志:

连接被拒绝:/10.10.97.210:46477

端口 46477 是否已从防火墙打开/排除?

只需检查您是否在 flink 配置中设置了以下内容:

jobmanager.rpc.port: 6123 
blob.server.port: 50100-50200 

然后解锁这些端口。

关于java - Flink TaskManager 未重新连接到新的 Jobmanager,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51521271/

相关文章:

docker - Solr Cloud无法连接到(随机)Zookeeper节点(完整的Docker设置)

python - 在卡祖笛中获得选举食谱的现任领导者

java - @scr.property nameRef 的未弃用替代品是什么?

java - 微服务通信

java - 在 Apple M1 Silicon 上运行 Apache Flink 1.12 作业

java - Apache Flink WordCount 示例 - 线程 "main"java.lang.NoClassDefFoundError : org/apache/flink/api/common/functions/FlatMapFunction 中的异常

apache-flink - 当 JSON 模式不同时,如何在 PyFlink SQL 中引用嵌套的 JSON?

java - 使用“查看寻呼机”在选项卡中未显示列表

java - 如何在jlist中添加图片

java - 如何获取任意按下的键值