scala - Spark job Cassandra error

Tags: scala cassandra apache-spark

Every time I run my Scala program in Spark using the Cassandra connector, I get this error:

    Exception during preparation of SELECT count(*) FROM "eventtest"."simpletbl" WHERE token("a") > ? AND token("a") <= ?   
    ALLOW FILTERING: class org.joda.time.DateTime in JavaMirror with org.apache.spark.util.MutableURLClassLoader@23041911 of type class org.apache.spark.util.MutableURLClassLoader 
    with classpath 
    [file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./spark-cassandra-connector_2.10-1.4.0-M1.jar
    ,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./cassandra-driver-core-2.1.5.jar,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./cassandra-spark-job_2.10-1.0.jar,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./guava-18.0.jar,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./joda-convert-1.2.jar,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./cassandra-clientutil-2.1.5.jar,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/work/app-20150711142923-0023/0/./google-collections-1.0.jar] and parent being sun.misc.Launcher$AppClassLoader@6132b73b of type class sun.misc.Launcher$AppClassLoader with classpath [file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/conf/,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,file: 
    /home/sysadmin/ApacheSpark/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar] and parent being sun.misc.Launcher$ExtClassLoader@489bb457 of type class sun.misc.Launcher$ExtClassLoader with classpath [file: 
    /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/dnsns.jar,file: 
    /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/sunpkcs11.jar,file: 
    /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/sunjce_provider.jar,file: 
    /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/zipfs.jar,file: 
    /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/libatk-wrapper.so,file: 
    /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/java-atk-wrapper.jar,file: 
    /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/localedata.jar,file: 
    /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext/icedtea-sound.jar] and parent being primordial classloader with boot classpath [/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rhino.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/classes] not found.
        at com.datastax.spark.connector.rdd.CassandraTableScanRDD.createStatement(CassandraTableScanRDD.scala:163)

Here is my program:

    /** CassandraJob.scala **/

    import com.datastax.spark.connector._
    import  org.apache.spark._


    object CassandraJob {
            def main(args: Array[String]) {
                    val conf = new SparkConf(true)
                                    .set("spark.cassandra.connection.host", "172.28.0.164")
                                    .set("spark.cassandra.connection.rpc.port", "9160")

                    val sc = new SparkContext(conf)
                    val rdd = sc.cassandraTable("eventtest", "simpletbl")
                    println("cassandra row count : " + rdd.count + ", cassandra row : " + rdd.first)

            }
    }

I have built the project using sbt compile, then sbt package.

Here is how I am submitting the Spark job:

./bin/spark-submit --jars $(echo /home/sysadmin/ApacheSpark/jar/*.jar | tr ' ' ',')  --class "CassandraJob" --master spark://noi-cs-01:7077 /home/sysadmin/ApacheSparkProj/CassandraJob/target/scala-2.10/cassandra-spark-job_2.10-1.0.jar
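As an aside, the `$(echo ... | tr ' ' ',')` trick in the command above turns the whitespace-separated file list produced by shell globbing into the comma-separated form that `--jars` expects. A minimal illustration with placeholder jar names:

```shell
# globbing yields space-separated names; --jars wants commas between them
jars=$(echo a.jar b.jar c.jar | tr ' ' ',')
echo "$jars"   # a.jar,b.jar,c.jar
```

Any jar not included in this list (or otherwise on the executor classpath) will be missing at runtime on the workers, which is exactly the failure mode in the error above.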

Best answer

My guess is that org.joda.time.DateTime is being used and its jar is missing from the jars you submit. Just add the Joda-Time jar to your dependencies:

    ./bin/spark-submit --jars $(echo /home/sysadmin/ApacheSpark/jar/*.jar | tr ' ' ','),/PATH/TO/DOWNLOADED/JODATIME/JAR --class "CassandraJob" ...

Another approach is to add Joda-Time to your sbt library dependencies and assemble a fat jar with the sbt-assembly plugin, using sbt assembly instead of sbt package.
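For the fat-jar approach, the build definition might look roughly like the sketch below. This is only an illustration: the exact joda-time and sbt-assembly versions shown are assumptions, and you should match joda-time to whatever version the connector pulls in transitively.

```scala
// build.sbt -- a minimal sketch, not a verified build definition
name := "cassandra-spark-job"
version := "1.0"
scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  // "provided": Spark itself is supplied by the cluster, keep it out of the fat jar
  "org.apache.spark" %% "spark-core" % "1.4.0" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0-M1",
  // version is an assumption -- align it with the connector's transitive dependency
  "joda-time" % "joda-time" % "2.8.1"
)
```

In `project/plugins.sbt` (plugin version again an assumption):

```scala
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
```

Then `sbt assembly` produces a single jar containing your classes plus all non-provided dependencies, so nothing needs to be listed in `--jars`.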

Regarding "scala - Spark job Cassandra error", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31355404/
