I've been using Apache Spark for a while, but I'm now getting an error I've never seen before when running the following example (I just updated to Spark 2.1.1):
./opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/bin/run-example SparkPi
Here is the actual stack trace:
17/07/05 10:50:54 ERROR SparkContext: Failed to add file:/opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-warehouse/ to Spark environment
java.lang.IllegalArgumentException: Directory /opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/examples/jars/spark-warehouse is not allowed for addJar
at org.apache.spark.SparkContext.liftedTree1$1(SparkContext.scala:1735)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:1729)
at org.apache.spark.SparkContext$$anonfun$11.apply(SparkContext.scala:466)
at org.apache.spark.SparkContext$$anonfun$11.apply(SparkContext.scala:466)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:466)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2320)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Pi is roughly 3.1433757168785843
I'm not sure whether this is a real error or I'm missing something, because the example completes anyway; as you can see, the "Pi is roughly ..." result still appears at the end.
Here are the configuration lines from spark-env.sh:
export SPARK_MASTER_IP=X.X.X.X
export SPARK_MASTER_WEBUI_PORT=YYYY
export SPARK_WORKER_CORES=4
export SPARK_WORKER_MEMORY=7g
Here are the configuration lines from spark-defaults.conf:
spark.master local[*]
spark.driver.cores 4
spark.driver.memory 2g
spark.executor.cores 4
spark.executor.memory 4g
spark.ui.showConsoleProgress false
spark.driver.extraClassPath /opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/lib/postgresql-9.4.1207.jar
spark.eventLog.enabled true
spark.eventLog.dir file:///opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/logs
spark.history.fs.logDirectory file:///opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/logs
Apache Spark version: 2.1.1
Java version: 1.8.0_91
Python version: 2.7.5
I tried configuring it with this, without success:
spark.sql.warehouse.dir file:///c:/tmp/spark-warehouse
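As a variant of that attempt, the warehouse location can also be overridden for a single run rather than in spark-defaults.conf. This is a sketch, not a verified fix for this error; the target path is hypothetical, and it assumes run-example forwards spark-submit options such as --conf:

```shell
# Sketch: override spark.sql.warehouse.dir for one run of the example.
# The /tmp path is an arbitrary illustration, not a recommended value.
./opt/sparkFiles/spark-2.1.1-bin-hadoop2.7/bin/run-example \
  --conf spark.sql.warehouse.dir=file:///tmp/spark-warehouse \
  SparkPi
```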
The strange thing is that when I compile my script and launch it with spark-submit, I don't get this error. I couldn't find any JIRA ticket or anything else about it.
Best Answer
I had a similar problem with my Java Spark code. Even though your issue is in Python-Spark, this may help you or someone else.
I had to pass some dependency jars to Spark using the --jars option. Initially, I supplied the path to the directory containing all the dependency jars (i.e. --jars <path-to-dependency>/), and got the error above.
The --jars option of spark-submit seems to accept only paths to actual jars (<path-to-directory>/<name>.jar), not bare directory paths (<path-to-directory>/).
The problem was solved for me when I packaged all dependencies into a single dependency jar and passed it to the --jars option, like this:
bash ~/spark/bin/spark-submit --class "<class-name>" --jars '<path-to-dependency-jars>/<dependency-jar>.jar' --master local <dependency-jar>.jar <input-val1> <input-val2>
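If merging everything into one jar isn't convenient, a directory of jars can instead be expanded into the comma-separated list that --jars expects. This is a minimal sketch; the directory and jar names are placeholders created only for illustration:

```shell
# Sketch: expand a directory of jars into the comma-separated
# list that spark-submit's --jars option expects.
DEP_DIR=/tmp/deps_demo
mkdir -p "$DEP_DIR"
touch "$DEP_DIR/a.jar" "$DEP_DIR/b.jar"   # stand-ins for real dependency jars

# Join every *.jar path into one comma-separated string.
JARS=$(ls "$DEP_DIR"/*.jar | paste -sd, -)
echo "$JARS"
# → /tmp/deps_demo/a.jar,/tmp/deps_demo/b.jar

# The list would then be passed along the lines of:
#   spark-submit --class <class-name> --jars "$JARS" --master local <app>.jar
```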
Regarding "apache-spark - ERROR SparkContext failed to add file in Apache Spark 2.1.1", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44929984/