apache-spark - Spark on YARN container failure

Tags: apache-spark hadoop hadoop-yarn hadoop2

For reference: I solved this issue by adding Netty 4.1.17 in hadoop/share/hadoop/common

No matter which jar I try to run (including the examples from https://spark.apache.org/docs/latest/running-on-yarn.html), I keep getting an error about container failures when running Spark on YARN. I get this error in the command prompt:

Diagnostics: Exception from container-launch.
Container id: container_1530118456145_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
    at org.apache.hadoop.util.Shell.run(Shell.java:482)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

When I look at the logs, I find this error:

Exception in thread "main" java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
    at org.apache.spark.network.util.NettyMemoryMetrics.registerMetrics(NettyMemoryMetrics.java:80)
    at org.apache.spark.network.util.NettyMemoryMetrics.<init>(NettyMemoryMetrics.java:76)
    at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:109)
    at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
    at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:71)
    at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
    at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:530)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:347)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1758)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:869)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)

Any idea why this is happening? This is running on a pseudo-distributed cluster set up according to this tutorial: https://wiki.apache.org/hadoop/Hadoop2OnWindows. Spark runs fine locally, and since this jar ships with Spark, I doubt the problem is inside the jar itself. (In any case, I added a Netty dependency to another jar, but I still get the same error.)
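For context, "adding a Netty dependency to another jar" would normally amount to something like the sbt line below (the version is an assumption, chosen to match what Spark 2.3.x bundles). It is unlikely to help here, because the NoSuchMethodError above is thrown inside the YARN ApplicationMaster/ExecutorLauncher while Spark's RPC layer is being set up, i.e. before any user code from the application jar runs, so only the classpath that YARN assembles from Hadoop's own jars and spark.yarn.jars matters:

    // sbt build definition: bundle a Netty 4.1.x with the application jar
    // (shown for clarity only; it did not fix the error described above)
    libraryDependencies += "io.netty" % "netty-all" % "4.1.17.Final"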

The only thing set in my spark-defaults.conf is spark.yarn.jars, which points to the HDFS directory where I uploaded all of Spark's jars. io.netty.buffer.PooledByteBufAllocator is included in those jars.
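For reference, that amounts to a single line in spark-defaults.conf, e.g. spark.yarn.jars hdfs:///spark-jars/*.jar (the path here is an assumption). A quick way to check which copy of Netty a JVM on the cluster actually loads, and whether it is new enough, is a small reflection probe like the sketch below (a hypothetical check, run e.g. from spark-shell or a trivial job on the same cluster). PooledByteBufAllocator.metric() is a Netty 4.1.x addition, so an older Netty picked up from Hadoop's classpath ahead of Spark's would produce exactly this NoSuchMethodError:

    // Which jar did the JVM actually load PooledByteBufAllocator from?
    val cls = Class.forName("io.netty.buffer.PooledByteBufAllocator")
    println(cls.getProtectionDomain.getCodeSource.getLocation)
    // Does that copy have the metric() method Spark 2.3 calls? false => Netty is too old
    println(cls.getMethods.exists(_.getName == "metric"))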

Spark 2.3.1, Hadoop 2.7.6

Best answer

I ran into exactly the same problem. Previously I was using Hadoop 2.6.5 with a compatible Spark version and everything worked fine. When I switched to Hadoop 2.7.6, the problem appeared. I'm not sure what causes it, but I copied the netty.4.1.17.Final jar file into the Hadoop library folder and the problem went away.
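For anyone reproducing this on a Windows pseudo-distributed setup like the one linked above, a minimal sketch of that fix, assuming Spark's bundled netty-all-4.1.17.Final.jar is the copy being promoted and that SPARK_HOME and HADOOP_HOME point at the respective installs (jar name and paths are assumptions; adjust to your layout):

    copy %SPARK_HOME%\jars\netty-all-4.1.17.Final.jar %HADOOP_HOME%\share\hadoop\common\

This matches the reporter's own note at the top of the question about placing Netty 4.1.17 in hadoop/share/hadoop/common.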

Regarding apache-spark - Spark on YARN container failure, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51068075/
