apache-spark - Unable to start Spark-Shell with Spark-cassandra-connector

Tags: apache-spark

I built a 3-node Cassandra cluster and installed a Spark cluster on top of it.

Starting spark-shell on one of the VMs with the following command fails.

spark-shell -v \
  --master spark://storm.c.gcp20170324.internal:7077 \
  --packages datastax:spark-cassandra-connector:2.0.6-s_2.11 \
  --conf spark.cassandra.connection.host=10.128.0.4 \
  --conf spark.cassandra.read.timeout_ms=2400000 \
  --conf spark.cassandra.query.retry.count=600 \
  --conf spark.cassandra.connection.timeout_ms=50000 \
  --conf spark.cassandra.input.split.size_in_mb=67108864 \
  --conf spark.network.timeout=600s \
  --conf spark.executor.heartbeatInterval=100s

The following error appears:

Ivy Default Cache set to: /home/nmj/.ivy2/cache
The jars for the packages stored in: /home/nmj/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-2.1.2-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found datastax#spark-cassandra-connector;2.0.6-s_2.11 in spark-packages
        found commons-beanutils#commons-beanutils;1.9.3 in local-m2-cache
        found commons-collections#commons-collections;3.2.2 in local-m2-cache
        found org.joda#joda-convert;1.2 in local-m2-cache
        found joda-time#joda-time;2.3 in local-m2-cache
        found io.netty#netty-all;4.0.33.Final in local-m2-cache
        found com.twitter#jsr166e;1.1.0 in local-m2-cache
        found org.scala-lang#scala-reflect;2.11.8 in local-m2-cache
:: resolution report :: resolve 906ms :: artifacts dl 27ms
        :: modules in use:
        com.twitter#jsr166e;1.1.0 from local-m2-cache in [default]
        commons-beanutils#commons-beanutils;1.9.3 from local-m2-cache in [default]
        commons-collections#commons-collections;3.2.2 from local-m2-cache in [default]
        datastax#spark-cassandra-connector;2.0.6-s_2.11 from spark-packages in [default]
        io.netty#netty-all;4.0.33.Final from local-m2-cache in [default]
        joda-time#joda-time;2.3 from local-m2-cache in [default]
        org.joda#joda-convert;1.2 from local-m2-cache in [default]
        org.scala-lang#scala-reflect;2.11.8 from local-m2-cache in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   8   |   0   |   0   |   0   ||   8   |   0   |
        ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
                [NOT FOUND  ] io.netty#netty-all;4.0.33.Final!netty-all.jar (2ms)

        ==== local-m2-cache: tried

          file:/home/nmj/.m2/repository/io/netty/netty-all/4.0.33.Final/netty-all-4.0.33.Final.jar

                ::::::::::::::::::::::::::::::::::::::::::::::

                ::              FAILED DOWNLOADS            ::

                :: ^ see resolution messages for details  ^ ::

                ::::::::::::::::::::::::::::::::::::::::::::::

                :: io.netty#netty-all;4.0.33.Final!netty-all.jar

                ::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [download failed: io.netty#netty-all;4.0.33.Final!netty-all.jar]
        at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1084)
        at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

On the other VMs, spark-shell starts successfully.

The difference in the logs is that the successful VMs resolve the dependencies from central rather than from local-m2-cache.
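Before wiping anything, you can check whether the artifact that Ivy reports as NOT FOUND is actually present in the local Maven repository; the path comes straight from the warning above. A minimal sketch, assuming the same home directory as in the log:

# Check the artifact Ivy marked as NOT FOUND
# (path taken from the "local-m2-cache: tried" line in the log above)
ls -l /home/nmj/.m2/repository/io/netty/netty-all/4.0.33.Final/

# A leftover .pom without its .jar, or a zero-byte jar, is typically enough for Ivy
# to pick the local-m2-cache entry and then fail the download instead of falling
# back to central - which matches the resolution report above.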

Best answer

As @JacekLaskowski said, deleting the directories /home/nmj/.m2 and /home/nmj/.ivy2 works.
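For reference, the cleanup boils down to removing both cache directories so that the next run with --packages re-resolves everything from central; a minimal sketch, assuming the same user and paths as in the log:

# Remove the stale Ivy and local Maven caches named in the answer,
# then rerun the spark-shell command above; Ivy will download the
# dependencies from central instead of the broken local-m2-cache entries.
rm -rf /home/nmj/.ivy2 /home/nmj/.m2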

A similar question about "apache-spark - Unable to start Spark-Shell with Spark-cassandra-connector" can be found on Stack Overflow: https://stackoverflow.com/questions/48112489/
