apache-spark - Spark error and Hadoop error

Tags: apache-spark hadoop

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hp/nm-local-dir/usercache/hp/filecache/28/__spark_libs__5301477595013800425.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hp/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/04/06 21:28:08 WARN SparkConf: spark.master yarn-cluster is deprecated in Spark 2.0+, please instead use "yarn" with specified deploy mode.
java.io.FileNotFoundException: /home/hp/data/gTree.txt (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at com.exsparkbasic.ExSparkBasic.kAnonymity_spark.loadGenTree(kAnonymity_spark.java:50)
    at com.exsparkbasic.ExSparkBasic.kAnonymity_spark.run(kAnonymity_spark.java:391)
    at com.exsparkbasic.ExSparkBasic.kAnonymity_spark.main(kAnonymity_spark.java:427)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
java.io.FileNotFoundException: /home/hp/data/t1_resizingBy_10000.txt (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at com.exsparkbasic.ExSparkBasic.kAnonymity_spark.loadData(kAnonymity_spark.java:149)
    at com.exsparkbasic.ExSparkBasic.kAnonymity_spark.run(kAnonymity_spark.java:392)
    at com.exsparkbasic.ExSparkBasic.kAnonymity_spark.main(kAnonymity_spark.java:427)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
18/04/06 21:28:39 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
    at com.exsparkbasic.ExSparkBasic.kAnonymity_spark.performAnonymity(kAnonymity_spark.java:365)
    at com.exsparkbasic.ExSparkBasic.kAnonymity_spark.run(kAnonymity_spark.java:394)
    at com.exsparkbasic.ExSparkBasic.kAnonymity_spark.main(kAnonymity_spark.java:427)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)

/home/hp/data/gTree.txt exists and has also been added to the Hadoop file system, but I still get the errors above.

Java code in the jar file:

// Opens the file from the local file system of whichever node runs this code
FileInputStream stream = new FileInputStream("/home/hp/data/gTree.txt");
InputStreamReader reader = new InputStreamReader(stream);
BufferedReader buffer = new BufferedReader(reader);

This is the part that throws the error.

When running on a Hadoop YARN cluster, do I need to set the file path differently?

hp@master:~$ ls -al /home/hp/data/gTree.txt
-rw-rw-r-- 1 hp hp 419 11월 16 16:17 /home/hp/data/gTree.txt

hp@master:~$ hadoop fs -ls /home/hp/data/gTree.txt
-rw-r--r--   3 hp supergroup        419 2018-04-06 21:06 /home/hp/data/gTree.txt

Best answer

The problem occurs because you open /home/hp/data/gTree.txt with FileInputStream, which reads from the local file system, not from HDFS.

Your Spark application code runs on the cluster's data nodes (in your stack trace, inside the YARN ApplicationMaster), so it tries to open this path on that node's local file system, where the file does not exist; hence the FileNotFoundException. The `hadoop fs -ls` output only shows that the file exists in HDFS, which FileInputStream never touches.

Depending on your use case, you may have to refer to the file as hdfs://<NN:port>/<File Name>. Most likely what you want is SparkContext.textFile(). Refer to this example.
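As a minimal sketch of the fix, the bare local path can be turned into a fully qualified HDFS URI and then opened through the Hadoop FileSystem API (or passed to SparkContext.textFile()) instead of FileInputStream. The namenode address `master:9000` below is an assumption; substitute the value of `fs.defaultFS` from your core-site.xml.

```java
import java.net.URI;

public class HdfsPathExample {

    // Prefix a bare path with the HDFS scheme and namenode authority.
    // "namenode" is assumed to be host:port of your HDFS namenode.
    static String toHdfsUri(String namenode, String path) {
        return URI.create("hdfs://" + namenode + path).toString();
    }

    public static void main(String[] args) {
        String uri = toHdfsUri("master:9000", "/home/hp/data/gTree.txt");
        System.out.println(uri);

        // On the cluster you would then read it via the Hadoop FileSystem API
        // rather than FileInputStream (requires hadoop-client on the classpath):
        //
        //   Configuration conf = new Configuration();
        //   FileSystem fs = FileSystem.get(URI.create(uri), conf);
        //   BufferedReader buffer = new BufferedReader(
        //           new InputStreamReader(fs.open(new Path(uri))));
        //
        // or, for an RDD of lines:
        //
        //   JavaRDD<String> lines = sc.textFile(uri);
    }
}
```

Either variant resolves the path against HDFS, so it works no matter which node the ApplicationMaster or executor happens to run on.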

Regarding "apache-spark - Spark error and Hadoop error", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49693038/
