scala - 无法在 GCP Dataproc 上运行 Spark Scala JAR

标签 scala maven apache-spark google-cloud-dataproc maven-shade-plugin

我在 Scala 中创建了一个简单的 Spark 应用程序,该应用程序在本地运行良好。我使用 Maven 作为构建工具,并使用shade 插件打包 JAR 文件。目录结构如下所示:

Project Structure

我正在使用以下 pom.xml :

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>cl.aman.fund</groupId>
    <artifactId>sym_data_decryptor</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <encoding>UTF-8</encoding>
        <scala.version>2.11.12</scala.version>
    </properties>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <resources>
            <resource>
                <directory>/src/resources/</directory>
                <filtering>false</filtering>
                <includes>
                    <include>hbase-site.xml</include>
                </includes>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.7</version>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>
            <plugin>
                <!-- see http://davidb.github.com/scala-maven-plugin -->
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>4.3.0</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <!-- <arg>-make:transitive</arg> -->
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.scalatest</groupId>
                <artifactId>scalatest-maven-plugin</artifactId>
                <version>1.0</version>
                <configuration>
                    <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
                    <junitxml>.</junitxml>
                    <filereports>WDF TestSuite.txt</filereports>
                </configuration>
                <executions>
                    <execution>
                        <id>test</id>
                        <goals>
                            <goal>test</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.jacoco</groupId>
                <artifactId>jacoco-maven-plugin</artifactId>
                <version>0.8.5</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>prepare-agent</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>report</id>
                        <phase>test</phase>
                        <goals>
                            <goal>report</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
                            </transformers>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <shadedArtifactAttached>true</shadedArtifactAttached>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <!-- Scala and Spark dependencies -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>com.typesafe.scala-logging</groupId>
            <artifactId>scala-logging_2.11</artifactId>
            <version>3.9.2</version>
        </dependency>
        <!-- Spark avro -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-avro_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>
        <dependency>
            <groupId>com.google.cloud.bigdataoss</groupId>
            <artifactId>gcs-connector</artifactId>
            <version>hadoop2-1.9.17</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>
        <dependency>
            <groupId>com.google.cloud.bigdataoss</groupId>
            <artifactId>bigquery-connector</artifactId>
            <version>hadoop2-0.13.9</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.security</groupId>
            <artifactId>spring-security-core</artifactId>
            <version>5.2.1.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-xml</artifactId>
            <version>2.11.0-M4</version>
        </dependency>
        <dependency>
            <groupId>org.scalactic</groupId>
            <artifactId>scalactic_2.11</artifactId>
            <version>3.1.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.scalamock</groupId>
            <artifactId>scalamock-scalatest-support_2.11</artifactId>
            <version>3.6.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.json4s</groupId>
            <artifactId>json4s-native_2.11</artifactId>
            <version>3.6.6</version>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_2.11</artifactId>
            <version>3.1.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.4</version>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.4.4</version>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.13.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-catalyst_2.11</artifactId>
            <version>2.4.4</version>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api-scala_2.12</artifactId>
            <version>11.0</version>
        </dependency>
        <dependency>
            <groupId>com.typesafe</groupId>
            <artifactId>config</artifactId>
            <version>1.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.scalacheck</groupId>
            <artifactId>scalacheck_2.11</artifactId>
            <version>1.14.2</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-avro_2.11</artifactId>
            <version>2.4.4</version>
            <type>test-jar</type>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

用于在 Dataproc 集群上提交作业的命令:

gcloud dataproc jobs submit spark \
  --cluster <cluster_name> \
  --region <region> \
  --class cl.aman.symphony.commons.DecryptorApplication \
  --jars gs://<bucket_name>/sym_data_decryptor-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  -- \
  "my_Input"

错误:

Exception in thread "main" java.lang.NoSuchMethodError: org.json4s.Serialization$class.read(Lorg/json4s/Serialization;Ljava/lang/String;Lorg/json4s/Formats;Lscala/reflect/Manifest;)Ljava/lang/Object;
        at org.json4s.native.Serialization$.read(Serialization.scala:32)
        at cl.falabella.symphony.commons.DecryptorApplication$.makeRequest(DecryptorApplication.scala:44)
        at cl.falabella.symphony.commons.DecryptorApplication$.main(DecryptorApplication.scala:65)
        at cl.falabella.symphony.commons.DecryptorApplication.main(DecryptorApplication.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:890)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:217)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

有人可以帮我打包 Scala JAR 吗?

最佳答案

您正在使用 json4s-native_2.11 3.6.6 但 Spark 框架的核心 spark-core 2.4.4 使用 json4s在版本 3.5.3 中,这使得 json4s core 不兼容。我建议切换到 json4s 版本 3.5.3

关于scala - 无法在 GCP Dataproc 上运行 Spark Scala JAR,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61159177/

相关文章:

java - 如何在Maven中的资源中添加子文件夹

json - Graphx 中使用 Spark 的最短路径性能

hadoop - 如何在 Hadoop 中创建从 1..N 开始的迭代器?

java - 收到错误 : Route() in Route cannot be applied to String

scala - 'for-comprension' 内元组值的模式匹配?

scala - 从参数列表实例化案例类

scala - 如何在spark sql中执行多行sql

maven - Gradle 相当于 maven-versions-plugin

java - Cobertura 和 jetty

java - 尝试在 scala 中重写 getListCellRenderComponent 方法时出现对象创建不可能错误