Hadoop ResourceManager HA: Connecting to ResourceManager at /0.0.0.0:8032

Tags: hadoop, high-availability, failover, resourcemanager

This extends an earlier question: Hadoop: Connecting to ResourceManager failed

Hadoop 2.6.1

I have configured ResourceManager HA.

When I kill the "local" ResourceManager (to test the cluster), failover occurs and the ResourceManager on another server becomes active. Unfortunately, when I then try to run a job against the "local" instance, the client does not fail the request over to the active ResourceManager.

yarn@stg-hadoop106:~$ jps
26738 Jps
23463 DataNode
23943 DFSZKFailoverController
24297 NodeManager
25690 ResourceManager
23710 JournalNode
23310 NameNode

#kill and restart the ResourceManager, so that failover occurs
yarn@stg-hadoop106:~$ kill -9 25690
~/hadoop/sbin/yarn-daemon.sh  start resourcemanager

yarn@stg-hadoop106:~$ ~/hadoop/bin/yarn  rmadmin -getServiceState rm1
standby
yarn@stg-hadoop106:~$ ~/hadoop/bin/yarn  rmadmin -getServiceState rm2
active

#run my class:

14:56:51.476 [main] INFO  o.apache.samza.job.yarn.ClientHelper - trying to connect to RM 0.0.0.0:8032
2015-10-29 14:56:51 RMProxy [INFO] Connecting to ResourceManager at /0.0.0.0:8032
14:56:51.572 [main] DEBUG o.a.h.s.a.util.KerberosName - Kerberos krb5 configuration not found, setting default realm to empty
2015-10-29 14:56:51 NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14:56:51.575 [main] DEBUG o.a.hadoop.util.PerformanceAdvisory - Falling back to shell based
2015-10-29 14:56:52 Client [INFO] Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-10-29 14:56:53 Client [INFO] Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

yarn-site.xml

 <property>
     <name>yarn.resourcemanager.ha.enabled</name>
     <value>true</value>
 </property>
 <property>
     <name>yarn.resourcemanager.cluster-id</name>
     <value>clusterstaging</value>
 </property>
 <property>
     <name>yarn.resourcemanager.ha.rm-ids</name>
     <value>rm1,rm2,rm3</value>
 </property>
 <property>
     <name>yarn.resourcemanager.hostname.rm1</name>
     <value>stg-hadoop106</value>
 </property>
 <property>
     <name>yarn.resourcemanager.hostname.rm2</name>
     <value>stg-hadoop107</value>
 </property>
 <property>
     <name>yarn.resourcemanager.hostname.rm3</name>
     <value>stg-hadoop108</value>
 </property>
 <property>
     <name>yarn.resourcemanager.zk-address</name>
     <value>A:2181,B:2181,C:2181</value>
 </property>

I did not configure

<name>yarn.resourcemanager.hostname</name>

because it should work "as is" - please correct me if I'm wrong :)

I also tried

<name>yarn.client.failover-proxy-provider</name>

but without success.

Any ideas? Or am I wrong to expect the client to find the active RM node on its own?

Also, do you know how to switch a node between active and standby when the "auto-failover" option is enabled?

~/hadoop/bin/yarn  rmadmin -failover rm1 rm2
    Exception in thread "main" java.lang.UnsupportedOperationException: RMHAServiceTarget doesn't have a corresponding ZKFC address

~/hadoop/bin/yarn  rmadmin -transitionToActive rm1 rm2
    Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@2b72cb8a
    Refusing to manually manage HA state, since it may cause

Best Answer

If RM HA is enabled in automatic-failover mode, you cannot manually trigger a transition from active to standby or vice versa. You should also set the yarn.client.failover-proxy-provider parameter, which names the class the client uses to fail over to the active RM, and configure the per-RM yarn.resourcemanager.hostname properties so that the RMs (i.e. rm1, rm2) can be identified.
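A minimal client-side sketch of what that could look like in yarn-site.xml, reusing the hostnames from the question (ConfiguredRMFailoverProxyProvider is the default provider class shipped with Hadoop 2.x; the explicit per-RM address is optional, since it is derived from yarn.resourcemanager.hostname.rm1 when that is set):

```xml
<!-- Class the client uses to fail over between the configured RMs -->
<property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<!-- Optional: explicit client address for rm1; normally derived from
     yarn.resourcemanager.hostname.rm1 with the default port 8032 -->
<property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>stg-hadoop106:8032</value>
</property>
```

Note that this configuration must be visible to the client submitting the job (here, the Samza ClientHelper), not only to the cluster daemons; a client that cannot see the HA properties falls back to the default 0.0.0.0:8032, which matches the log output above.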

If automatic failover is not enabled, you can trigger a transition with: yarn rmadmin -transitionToStandby rm1
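For completeness, even with automatic failover enabled, a manual transition can be forced with the --forcemanual flag. This is a sketch only and should be used with care: as the CLI warning in the question indicates, bypassing the automatic coordination can harm the cluster.

```shell
# Force a manual state change despite automatic failover being enabled.
# Dangerous on a live cluster - this bypasses the failover controller.
~/hadoop/bin/yarn rmadmin -transitionToStandby --forcemanual rm1
~/hadoop/bin/yarn rmadmin -transitionToActive --forcemanual rm2
```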

Please make the above changes and report back with the results.

Regarding "Hadoop ResourceManager HA: Connecting to ResourceManager at /0.0.0.0:8032", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33418065/
