java - Spark and MongoDB application with Scala 2.10 - Maven build error

Tags: java mongodb scala maven apache-spark

I want to build a Scala application with Maven dependencies on Spark and MongoDB. The Scala version I am using is 2.10. My pom looks like this (irrelevant parts omitted):

<properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.tools.version>2.10</scala.tools.version>
    <!-- Put the Scala version of the cluster -->
    <scala.version>2.10.5</scala.version>
</properties>

<!-- repository to add org.apache.spark -->
<repositories>
    <repository>
        <id>cloudera-repo-releases</id>
        <url>https://repository.cloudera.com/artifactory/repo/</url>
    </repository>
</repositories>

<build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <!-- <pluginManagement>  -->    
        <plugins>
            <plugin>
                <!-- see http://davidb.github.com/scala-maven-plugin -->
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.3</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-make:transitive</arg>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.13</version>
                <configuration>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <!-- If you have classpath issue like NoDefClassError,... -->
                    <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>

            <!-- "package" command plugin -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    <!-- </pluginManagement>  -->       
</build>

<dependencies>  
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.mongodb.spark</groupId>
        <artifactId>mongo-spark-connector_2.10</artifactId>
        <version>1.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.mongodb.scala</groupId>
        <artifactId>mongo-scala-driver_2.11</artifactId>
        <version>1.1.1</version>
    </dependency>
</dependencies>

When I run mvn clean assembly:assembly, I get the following error:

C:\Develop\workspace\SparkApplication>mvn clean assembly:assembly
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building SparkApplication 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ SparkApplication ---
[INFO] Deleting C:\Develop\workspace\SparkApplication\target
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building SparkApplication 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-assembly-plugin:2.4.1:assembly (default-cli) > package @ SparkApplication >>>
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ SparkApplication ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory C:\Develop\workspace\SparkApplication\src\main\resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ SparkApplication ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- scala-maven-plugin:3.1.3:compile (default) @ SparkApplication ---
[WARNING]  Expected all dependencies to require Scala version: 2.10.5
[WARNING]  xx.xxx.xxx:SparkApplication:0.0.1-SNAPSHOT requires scala version: 2.10.5
[WARNING]  com.twitter:chill_2.10:0.5.0 requires scala version: 2.10.4
[WARNING] Multiple versions of scala libraries detected!
[INFO] C:\Develop\workspace\SparkApplication\src\main\scala:-1: info: compiling
[INFO] Compiling 1 source files to C:\Develop\workspace\SparkApplication\target\classes at 1477993255625
[INFO] No known dependencies. Compiling everything
[ERROR] error: bad symbolic reference. A signature in package.class refers to type compileTimeOnly
[INFO] in package scala.annotation which is not available.
[INFO] It may be completely missing from the current classpath, or the version on
[INFO] the classpath might be incompatible with the version used when compiling package.class.
[ERROR] C:\Develop\workspace\SparkApplication\src\main\scala\com\examples\MainExample.scala:33: error: Reference to method intWrapper in class LowPriorityImplicits should not have survived past type checking,
[ERROR] it should have been processed and eliminated during expansion of an enclosing macro.
[ERROR]     val count = sc.parallelize(1 to NUM_SAMPLES).map{i =>
[ERROR]                                ^
[ERROR] two errors found
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.363 s
[INFO] Finished at: 2016-11-01T10:40:58+01:00
[INFO] Final Memory: 20M/353M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.3:compile (default) on project SparkApplication: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1(Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

The error only appears when the mongo-scala-driver_2.11 dependency is added. Without that dependency, the jar builds fine. My code is currently the Pi estimation example from the Spark website:

import org.apache.spark.{SparkConf, SparkContext}

// NUM_SAMPLES is defined elsewhere in the class.
val conf = new SparkConf()
            .setAppName("Cluster Application")
            //.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)

// Monte Carlo estimate of Pi: count random points that land inside the unit circle.
val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

I also tried adding the following tag to each dependency element, because I saw this suggested in a GitHub issue, but it did not help:

    <exclusions>
        <exclusion>
            <!-- make sure wrong scala version is not pulled in -->
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
        </exclusion>
    </exclusions>         
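
Applied to a dependency entry, the exclusion looks like this (shown on the mongo-scala-driver artifact purely as an illustration):

<dependency>
    <groupId>org.mongodb.scala</groupId>
    <artifactId>mongo-scala-driver_2.11</artifactId>
    <version>1.1.1</version>
    <exclusions>
        <exclusion>
            <!-- make sure wrong scala version is not pulled in -->
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
        </exclusion>
    </exclusions>
</dependency>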

How can I solve this? The MongoDB Scala driver appears to be built against Scala 2.11, while Spark needs Scala 2.10.

Best Answer

Remove the Mongo Scala Driver dependency. It is not compiled for Scala 2.10 and is therefore incompatible.
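
In other words, keep only the artifacts built for Scala 2.10. A rough sketch of the resulting dependencies section, reusing the versions from the pom above:

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.mongodb.spark</groupId>
        <artifactId>mongo-spark-connector_2.10</artifactId>
        <version>1.1.0</version>
    </dependency>
    <!-- mongo-scala-driver_2.11 removed: it is built against Scala 2.11 -->
</dependencies>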

The good news is that the MongoDB Spark Connector is a standalone connector. It uses the synchronous Mongo Java Driver, because Spark is designed for CPU-intensive, synchronous tasks. The connector follows Spark idioms and is all you need to connect MongoDB to Spark.
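
As a rough sketch of what reading and writing through the connector alone looks like (the URIs, database, and collection names below are placeholders, and this assumes the MongoSpark helper shipped with mongo-spark-connector 1.1.0):

import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark.MongoSpark
import org.bson.Document

// Placeholder URIs: point these at your own deployment ("database.collection" in the path).
val conf = new SparkConf()
  .setAppName("Cluster Application")
  .set("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll")
  .set("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll")

val sc = new SparkContext(conf)

// Read: load the configured input collection as an RDD of BSON Documents.
val rdd = MongoSpark.load(sc)
println("Documents in collection: " + rdd.count())

// Write: save an RDD[Document] back to the configured output collection.
val docs = sc.parallelize((1 to 10).map(i => Document.parse(s"{ value: $i }")))
MongoSpark.save(docs)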

The Mongo Scala Driver, on the other hand, follows modern Scala conventions: all IO is fully asynchronous. That is great for web applications and for improving scalability on a single machine.

Regarding this Spark and MongoDB Scala 2.10 Maven build error, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40357646/
