hadoop - Spark 1.4 missing Kafka libraries

Tags: hadoop apache-spark apache-kafka spark-streaming hortonworks-data-platform

I am trying to run a Python Spark script that worked perfectly on Spark 1.3.1. I downloaded Spark 1.4 and tried to run the script, but it keeps saying

Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies with in the spark-submit command as

    $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.4.0 ...
    
  2. Download the JAR of the artifact from Maven Central http://search.maven.org/, Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.4.0. Then, include the jar in the spark-submit command as

    $ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...
    

I have already referenced these jars explicitly in my submit command, adding them as

/opt/spark/spark-1.4.0-bin-hadoop2.6/bin/spark-submit --jars spark-streaming_2.10-1.4.0.jar,spark-core_2.10-1.4.0.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar,kafka_2.10-0.8.2.1.jar,kafka-clients-0.8.2.1.jar,spark-streaming-kafka-assembly_2.10-1.4.0.jar /root/SparkPySQLNew.py

The log even says they were added when the application started, so why can't they be found?

15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-streaming_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming_2.10-1.4.0.jar with timestamp 1436334277792
15/07/08 05:44:37 INFO spark.SparkContext: Added JAR file:/root/spark-core_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-core_2.10-1.4.0.jar with timestamp 1436334277919
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278295
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka_2.10-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka_2.10-0.8.2.1.jar with timestamp 1436334278353
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/kafka-clients-0.8.2.1.jar at http://192.168.134.138:49637/jars/kafka-clients-0.8.2.1.jar with timestamp 1436334278357
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0.jar with timestamp 1436334278665
15/07/08 05:44:38 INFO spark.SparkContext: Added JAR file:/root/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar at http://192.168.134.138:49637/jars/spark-streaming-kafka-assembly_2.10-1.4.0-sources.jar with timestamp 1436334278666               

I know I've added a lot of jars; I started with just one and ended up adding them all.
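As an alternative to listing jars by hand, the first suggestion in the error message above is to let spark-submit resolve the dependency from Maven Central with `--packages`. A minimal sketch of that approach, assuming the Scala 2.10 build of Spark 1.4.0 and the script path from the question, might look like:

```shell
# Let spark-submit pull spark-streaming-kafka and its transitive
# dependencies (kafka_2.10, kafka-clients, ...) from Maven Central,
# instead of listing each jar manually with --jars.
/opt/spark/spark-1.4.0-bin-hadoop2.6/bin/spark-submit \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.4.0 \
  /root/SparkPySQLNew.py
```

Note that spark-streaming and spark-core should not be passed via `--jars` at all; they are already part of the Spark distribution on the classpath.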

Best answer

I suspect the exact answer varies by Spark version, but based on this HCC thread, the following seems to have worked for others:

spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar 

At first glance, the difference is that it lists the spark-streaming-kafka-assembly jar exactly once, whereas your submit command includes it twice.
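Applied to the command in the question, that suggests passing only the assembly jar, and passing it once. This is a sketch under that assumption, not a verified fix; the assembly jar already bundles the Kafka client classes, so the separate kafka and kafka-clients jars should be redundant:

```shell
# The assembly jar bundles spark-streaming-kafka, kafka_2.10 and
# kafka-clients, so it should be the only --jars entry needed,
# listed exactly once.
/opt/spark/spark-1.4.0-bin-hadoop2.6/bin/spark-submit \
  --jars spark-streaming-kafka-assembly_2.10-1.4.0.jar \
  /root/SparkPySQLNew.py
```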

Regarding "hadoop - Spark 1.4 missing Kafka libraries", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31284504/
