hadoop - YARN job never gets past "state: ACCEPTED"

Tags: hadoop apache-spark mapreduce hdfs hadoop-yarn

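Thanks in advance for your help. I am running a YARN job using the examples provided with Hadoop. The job never finishes and stays in the "ACCEPTED" state. Judging from the output, the job seems to be waiting to complete, and the client keeps polling its status.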

Example job (from Hadoop 2.6.0):

spark-submit --master yarn-client --driver-memory 4g --executor-memory 2g --executor-cores 4  --class org.apache.spark.examples.SparkPi /home/john/spark/spark-1.6.1-bin-hadoop2.6/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100

Output:

....
....
 disabled; ui acls disabled; users with view permissions: Set(john); users with modify permissions: Set(jogn)
16/07/27 17:36:09 INFO yarn.Client: Submitting application 1 to ResourceManager
16/07/27 17:36:09 INFO impl.YarnClientImpl: Submitted application application_1469665943738_0001
16/07/27 17:36:10 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:10 INFO yarn.Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1469666169333
         final status: UNDEFINED
         tracking URL: http://cpt-bdx021:8088/proxy/application_1469665943738_0001/
         user: john
16/07/27 17:36:11 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:12 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:13 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:14 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:15 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:16 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:17 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:18 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:19 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:20 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:21 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:22 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
...........
...........
...........

Update (it looks like the job was submitted to the ResourceManager, hence "ACCEPTED", but the ResourceManager cannot "see" any node or Hadoop worker to actually hand the job to):

$ jps
jps
12404 Jps
12211 NameNode
12315 DataNode
11743 ApplicationHistoryServer
11876 ResourceManager
11542 NodeManager

$ yarn node -list
        16/07/27 23:07:53 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.5.55:8032
        Total Nodes:0
                 Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
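
The `Total Nodes:0` line above is the key symptom. A minimal sketch of checking for it automatically is shown here, assuming the output of `yarn node -list` has been captured as text (the helper name `count_registered_nodes` is hypothetical, not part of any Hadoop tooling). As far as I know, `yarn node -list` without `-all` reports only nodes in the RUNNING state, so a count of zero means the ResourceManager has no usable NodeManager at that moment.

```python
import re


def count_registered_nodes(node_list_output: str) -> int:
    """Return the 'Total Nodes' count found in `yarn node -list` output.

    Hypothetical helper for illustration; it only parses text and does
    not talk to the ResourceManager itself.
    """
    match = re.search(r"Total Nodes:\s*(\d+)", node_list_output)
    if match is None:
        raise ValueError("no 'Total Nodes:' line found in the output")
    return int(match.group(1))


# Sample text copied from the output shown in the question.
sample = """\
16/07/27 23:07:53 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.5.55:8032
Total Nodes:0
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
"""

if count_registered_nodes(sample) == 0:
    print("No running NodeManager is registered; applications will wait in ACCEPTED.")
```

In practice the text could come from something like `subprocess.run(["yarn", "node", "-list"], capture_output=True, text=True)`, with stdout and stderr both checked, since the log line and the table may be written to different streams.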

Update (2): I am using the default etc/container-executor.cfg file:

yarn.nodemanager.linux-container-executor.group=#configured value of yarn.nodemanager.linux-container-executor.group
banned.users=#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
allowed.system.users=##comma separated list of system users who CAN run applications

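Also, for what it's worth, I should point out that I do not have a hadoop user or a hadoop user group. I am using the default account that I log in to the system with, in case that matters. Thanks!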


Update (3): NodeManager log

org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at 192.168.0.5.55:8031
2016-07-28 00:23:26,083 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2016-07-28 00:23:26,087 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2016-07-28 00:23:26,233 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -160570002
2016-07-28 00:23:26,236 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -1876215653
2016-07-28 00:23:26,237 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as 192.168.0.5.55:53034 with total resource of <memory:8192, vCores:8>
2016-07-28 00:23:26,237 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests

Best answer

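Your job never finishes because it never moves from the ACCEPTED state to the RUNNING state. A scheduler decides which applications are granted resources, and only an application that has been granted resources becomes RUNNING.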

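Two schedulers are available: the Fair Scheduler and the Capacity Scheduler. You can find the details in the Hadoop YARN documentation. If you can provide your yarn-site.xml, capacity-scheduler.xml, and fair-scheduler.xml files, I can give you better help :).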
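
To make the ACCEPTED to RUNNING step more concrete, here is a rough sketch of the arithmetic involved. It is a simplified model, not the actual scheduler logic: it ignores queue limits, vcores, and per-queue ApplicationMaster caps. The numbers are assumptions: a 512 MB ApplicationMaster with 384 MB overhead (which I believe are the Spark 1.6 defaults in yarn-client mode), a 1024 MB allocation increment (the usual default for `yarn.scheduler.minimum-allocation-mb`), and the 8192 MB node from the NodeManager log in the question. The function names are hypothetical.

```python
import math
from typing import List


def container_mb(requested_mb: int, overhead_mb: int, increment_mb: int) -> int:
    """Round (requested + overhead) up to the next multiple of increment_mb."""
    total = requested_mb + overhead_mb
    return math.ceil(total / increment_mb) * increment_mb


def can_leave_accepted(am_container_mb: int, free_mb_per_node: List[int]) -> bool:
    """Simplified rule: the ApplicationMaster container must fit on one node."""
    return any(free >= am_container_mb for free in free_mb_per_node)


# Assumed values, see the note above; none of these come from the asker's config files.
INCREMENT_MB = 1024
am_mb = container_mb(512, 384, INCREMENT_MB)         # ApplicationMaster container
executor_mb = container_mb(2048, 384, INCREMENT_MB)  # --executor-memory 2g

print("AM container (MB):", am_mb)
print("Executor container (MB):", executor_mb)
print("Zero registered nodes:", can_leave_accepted(am_mb, []))
print("One 8192 MB node:", can_leave_accepted(am_mb, [8192]))
```

Under this model, a cluster with no registered nodes can never place the ApplicationMaster container, which matches an application that stays in ACCEPTED indefinitely. Once a node with enough free memory is registered, the same request fits.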

Source: a similar question, hadoop - YARN job never gets past "state: ACCEPTED", was found on Stack Overflow: https://stackoverflow.com/questions/38619130/
