apache-spark - Cassandra Spark Connector version conflict with Spark 2.2

Tags: apache-spark cassandra cassandra-3.0 spark-cassandra-connector

I get the following error when running my Spark job. Please suggest the correct versions of Spark and the Cassandra connector.

Below is my build.sbt:

scalaVersion := "2.11.8"

 

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.2.0-cdh6.0.1" % "provided",
  "org.apache.spark" %% "spark-core" % "2.2.0-cdh6.0.1" % "provided", // excludeAll ExclusionRule(organization = "javax.servlet"),
  "org.apache.spark" %% "spark-sql" % "2.2.0-cdh6.0.1" % "provided",
  "org.apache.spark" %% "`enter code here`spark-streaming-kafka-0-10" % "2.2.0-cdh6.0.1",
  "org.apache.hbase" % "hbase-client" % "2.0.0-cdh6.0.1",
  "org.apache.hbase" % "hbase-common" % "2.0.0-cdh6.0.1",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.10",
  "net.liftweb" %% "lift-json" % "3.3.0",
  "com.typesafe" % "config" % "1.2.1"
)

After submitting the job to Spark, I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/configuration/ConfigurationException
    at com.datastax.spark.connector.streaming.DStreamFunctions.saveToCassandra$default$4(DStreamFunctions.scala:47)
    at com.StreamingPrerequisiteLoad$.main(StreamingPrerequisiteLoad.scala:72)
    at com.StreamingPrerequisiteLoad.main(StreamingPrerequisiteLoad.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.ConfigurationException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
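For reference, the failure originates in the connector's DStreamFunctions.saveToCassandra call (line 72 of StreamingPrerequisiteLoad.scala in the trace above). A minimal sketch of the kind of streaming code that hits this path is shown below; the keyspace, table, column names and host are placeholders, not taken from the original job:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._ // adds saveToCassandra to DStreams

object StreamingPrerequisiteLoad {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("StreamingPrerequisiteLoad")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder source; the real job reads from Kafka via spark-streaming-kafka-0-10.
    val lines = ssc.socketTextStream("localhost", 9999)
    val rows = lines.map(line => (line, line.length))

    // This is the call that fails with NoClassDefFoundError when
    // commons-configuration is missing from the runtime classpath.
    rows.saveToCassandra("my_keyspace", "my_table", SomeColumns("key", "value"))

    ssc.start()
    ssc.awaitTermination()
  }
}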

Best Answer

I ran into a similar problem with spark-cassandra-connector, so here is what worked for me: with Spark 2.2 and Scala 2.11.8, spark-cassandra-connector 2.3.0 works as well. Also add the commons-configuration 1.9 jar, since its absence is what throws NoClassDefFoundError: org/apache/commons/configuration/ConfigurationException. Try the following dependencies:

    version := "0.1"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
libraryDependencies += "net.liftweb" %% "lift-json" % "3.0.2"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.0" //% "provided"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.3.0" //% "provided"
libraryDependencies += "commons-configuration" % "commons-configuration" % "1.9" //% "provided"

assemblyMergeStrategy in assembly := {
  case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
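Note that the assemblyMergeStrategy setting assumes the sbt-assembly plugin is enabled in the build; a minimal project/plugins.sbt for it would look like this (the plugin version shown is only an example):

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

Then build the fat jar with sbt assembly and submit it with spark-submit --class com.StreamingPrerequisiteLoad <path-to-assembly-jar>. Since the Spark dependencies are marked "provided", only the connector, Kafka, and commons-configuration classes need to end up inside the assembled jar.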

Regarding apache-spark - Cassandra Spark Connector version conflict with Spark 2.2, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54399343/
