java - Apache Spark : Importing jars

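I'm using Apache Spark on a Windows machine. I'm fairly new to it, and I'm working locally before uploading my code to the cluster.

I wrote a very simple Scala program, and everything works fine: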

println("creating Dataframe from json")
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val rawData = sqlContext.read.json("test_data.txt")
println("this is the test data table")
rawData.show()
println("finished running")

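The program executes correctly. I now want to add some processing that calls a few simple Java functions I have pre-packaged in a JAR file. I'm running the Scala shell. As described on the getting started page, I launch the shell with: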

c:\Users\eshalev\Desktop\spark-1.4.1-bin-hadoop2.6\bin\spark-shell --master local[4] --jars myjar-1.0-SNAPSHOT.jar

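Important fact: I don't have Hadoop installed on my local machine. Since I'm only parsing a text file, this shouldn't matter, and it didn't matter until I used --jars.

I now run the same Scala program again. Nothing references the jar file yet... This time I get: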

...some SPARK debug code here and then...
    15/09/08 14:27:37 INFO Executor: Fetching http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar with timestamp 1441715239626
    15/09/08 14:27:37 INFO Utils: Fetching http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar-1.0 to C:\Users\eshalev\AppData\Local\Temp\spark-dd9eb37f-4033-4c37-bdbf-5df309b5eace\userFiles-ebe63c02-8161-4162-9dc0-74e3df6f7356\fetchFileTemp2982091960655942774.tmp
    15/09/08 14:27:37 INFO Executor: Fetching http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar with timestamp 1441715239626
    15/09/08 14:27:37 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
    java.lang.NullPointerException
            at java.lang.ProcessBuilder.start(Unknown Source)
            at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
            at org.apache.hadoop.util.Shell.run(Shell.java:455)
            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
            at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
            at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
            at org.apache.spark.util.Utils$.fetchFile(Utils.scala:465)
... plenty more Spark debug messages here, and then ...
this is the test data table
<console>:20: error: not found: value rawData
              rawData.show()
              ^
finished running

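I double-checked http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar-1.0-SNAPSHOT.jar , and I can download it just fine. Then again, nothing in the code references the jar yet. If I start the shell without --jars, everything works fine.

Best answer

I tried this on another cluster, which runs Spark 1.3.1 and has Hadoop installed. It worked perfectly.

The number of times Hadoop is mentioned in the stack trace from my single-node setup leads me to believe that Hadoop actually needs to be installed in order to use the --jars flag.

The other possibility is a problem with my Spark 1.4 setup, which had worked perfectly until then.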
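
One thing the stack trace points at, offered as a possible explanation rather than a confirmed diagnosis: the NullPointerException is thrown from FileUtil.chmod via Shell.runCommand, and on Windows Hadoop's Shell class runs chmod through a winutils.exe helper that it looks for under the bin directory of the Hadoop home (the hadoop.home.dir JVM property, falling back to the HADOOP_HOME environment variable). The sketch below only checks whether that helper can be located. WinutilsCheck and its method names are made up for illustration; they are not part of Spark or Hadoop.

```scala
import java.io.File

// Hypothetical diagnostic helper, not part of Spark or Hadoop.
object WinutilsCheck {

  // Resolve the Hadoop home: the hadoop.home.dir JVM property first,
  // then the HADOOP_HOME environment variable.
  def hadoopHome(props: Map[String, String], env: Map[String, String]): Option[String] =
    props.get("hadoop.home.dir").orElse(env.get("HADOOP_HOME"))

  // Expected location of the Windows helper binary under the Hadoop home.
  def winutilsPath(home: String): File =
    new File(new File(home, "bin"), "winutils.exe")

  // Summarize what was found in a single line.
  def diagnose(props: Map[String, String], env: Map[String, String]): String =
    hadoopHome(props, env) match {
      case None =>
        "No Hadoop home configured (hadoop.home.dir and HADOOP_HOME are both unset)"
      case Some(home) if winutilsPath(home).isFile =>
        "Found " + winutilsPath(home).getPath
      case Some(home) =>
        "Hadoop home is " + home + " but " + winutilsPath(home).getPath + " is missing"
    }

  def main(args: Array[String]): Unit =
    println(diagnose(sys.props.toMap, sys.env))
}
```

Pasted into spark-shell, WinutilsCheck.main(Array()) prints the result for the current JVM. If it reports that no Hadoop home is configured, that would be consistent with the failing chmod call, though it does not prove it is the cause.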

Regarding java - Apache Spark : Importing jars, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32458368/
