hadoop - Spark 作业在 Yarn 集群上运行 java.io.FileNotFoundException : File does not exits ,，即使文件存在于主节点上

我对 Spark 还很陌生。我尝试搜索但找不到合适的解决方案。我已经在两个机器上安装了 hadoop 2.7.2(一个主节点和另一个工作节点)我已经通过以下链接 http://javadev.org/docs/hadoop/centos/6/installation/multi-node-installation-on-centos-6-non-sucure-mode/ 设置了集群我以 root 用户身份运行 hadoop 和 Spark 应用程序来测试集群。

我已在主节点上安装了 Spark，并且 Spark 正在启动，没有任何错误。但是，当我使用 Spark Submit 提交作业时，即使该文件存在于错误中同一位置的主节点中，我也会收到 File Not Found 异常。我正在执行 Spark Submit 命令，请在下面找到日志输出命令。

/bin/spark-submit  --class com.test.Engine  --master yarn --deploy-mode      cluster /app/spark-test.jar

16/04/21 19:16:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/21 19:16:13 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/21 19:16:14 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/04/21 19:16:14 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/04/21 19:16:14 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/21 19:16:14 INFO Client: Setting up container launch context for our AM
16/04/21 19:16:14 INFO Client: Setting up the launch environment for our AM container
16/04/21 19:16:14 INFO Client: Preparing resources for our AM container
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/app/spark-test.jar
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-120aeddc-0f87-4411-9400-22ba01096249/__spark_conf__5619348744221830008.zip
16/04/21 19:16:14 INFO SecurityManager: Changing view acls to: root
16/04/21 19:16:14 INFO SecurityManager: Changing modify acls to: root
16/04/21 19:16:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/04/21 19:16:15 INFO Client: Submitting application 1 to ResourceManager
16/04/21 19:16:15 INFO YarnClientImpl: Submitted application application_1461246306015_0001
16/04/21 19:16:16 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:16 INFO Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461246375622
     final status: UNDEFINEDsparkcluster01.testing.com
     tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461246306015_0001/
     user: root
16/04/21 19:16:17 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:18 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:19 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:20 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:21 INFO Client: Application report for application_1461246306015_0001 (state: FAILED)
16/04/21 19:16:21 INFO Client: 
     client token: N/A
     diagnostics: Application application_1461246306015_0001 failed 2 times due to AM Container for appattempt_1461246306015_0001_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001Then, click on links to logs of each attempt.
Diagnostics: java.io.FileNotFoundException: File file:/app/spark-test.jar does not exist
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461246375622
     final status: FAILED
     tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001
     user: root
Exception in thread "main" org.ap/app/spark-test.jarache.spark.SparkException: Application application_1461246306015_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I even tried running the spark on HDFS file system by placing my application on HDFS and giving the HDFS path in the Spark Submit command. Even then its throwing File Not Found Exception on some Spark Conf file. I am executing below Spark Submit command and please find the logs output below the command.

 ./bin/spark-submit  --class com.test.Engine  --master yarn --deploy-mode cluster hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar

16/04/21 18:11:45 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/21 18:11:46 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/04/21 18:11:46 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/04/21 18:11:46 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/21 18:11:46 INFO Client: Setting up container launch context for our AM
16/04/21 18:11:46 INFO Client: Setting up the launch environment for our AM container
16/04/21 18:11:46 INFO Client: Preparing resources for our AM container
16/04/21 18:11:46 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
16/04/21 18:11:47 INFO Client: Uploading resource hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar -> file:/root/.sparkStaging/application_1461234217994_0017/spark-test.jar
16/04/21 18:11:49 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip
16/04/21 18:11:50 INFO SecurityManager: Changing view acls to: root
16/04/21 18:11:50 INFO SecurityManager: Changing modify acls to: root
16/04/21 18:11:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/04/21 18:11:50 INFO Client: Submitting application 17 to ResourceManager
16/04/21 18:11:50 INFO YarnClientImpl: Submitted application application_1461234217994_0017
16/04/21 18:11:51 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:51 INFO Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461242510849
     final status: UNDEFINED
     tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461234217994_0017/
     user: root
16/04/21 18:11:52 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:53 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:54 INFO Client: Application report for application_1461234217994_0017 (state: FAILED)
16/04/21 18:11:54 INFO Client: 
     client token: N/A
     diagnostics: Application application_1461234217994_0017 failed 2 times due to AM Container for appattempt_1461234217994_0017_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1461242510849
     final status: FAILED
     tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017
     user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1461234217994_0017 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/04/21 18:11:55 INFO ShutdownHookManager: Shutdown hook called
16/04/21 18:11:55 INFO ShutdownHookManager: Deleting directory /tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21

最佳答案

spark 配置未指向正确的 hadoop 配置目录。 2.7.2 的 hadoop 配置位于文件路径 hadoop 2.7.2./etc/hadoop/而不是/root/hadoop2.7.2/conf。当我在spark-env.sh下指向HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop/时，spark提交开始工作并且文件未找到异常消失了。早些时候它指向/root/hadoop2.7.2/conf (它不存在)。如果 Spark 没有指向正确的 hadoop 配置目录，则可能会导致类似的错误。我认为这可能是 Spark 中的一个错误，它应该优雅地处理它，而不是抛出模棱两可的错误消息。

关于hadoop - Spark 作业在 Yarn 集群上运行 java.io.FileNotFoundException : File does not exits ,，即使文件存在于主节点上，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/36753546/

hadoop - Spark 作业在 Yarn 集群上运行 java.io.FileNotFoundException : File does not exits ,，即使文件存在于主节点上

上一篇：actionscript-3 - 在as3中写入16位字节数组

下一篇：pic - 使用 microchip c18 编译器在 pic18f 上创建大缓冲区