spring - Error when submitting a Spring Boot application to Spark

Tags: spring, spring-boot, apache-spark, gradle

My Spark cluster runs in standalone mode.

I am deploying a spring-boot application to the Spark cluster with spark-submit. I have already removed several jars from spark/jars that are incompatible with the spring-boot jars, such as gson and servlet-api, but I still hit this error:

Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.10.10.53, executor 0): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1417)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2293)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)

...


My command:
bin/spark-submit \
--master spark://localhost:7077 \
path_to_jar/xxx.jar
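This particular ClassCastException on scala.collection.immutable.List$SerializationProxy usually means the executors are deserializing RDD classes with a different Spark/Scala build or classloader than the driver (Spring Boot's nested-jar classloader is a common trigger). A hedged variant of the submit command that names the main class and pins the classpath ordering; the flags are standard spark-submit/Spark options, but com.example.MainApp is a placeholder for the application's actual main class:

```shell
# --class and --conf are standard spark-submit options;
# com.example.MainApp is illustrative, not from the original question.
bin/spark-submit \
  --master spark://localhost:7077 \
  --class com.example.MainApp \
  --conf spark.driver.userClassPathFirst=false \
  --conf spark.executor.userClassPathFirst=false \
  path_to_jar/xxx.jar
```

Keeping userClassPathFirst at its default (false) lets the cluster's own Spark/Scala jars win over any copies bundled in the application jar, which is usually what you want when the error is a Scala collection class mismatch.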

My build.gradle:


dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])

    compile('org.springframework.boot:spring-boot-starter-web:2.1.3.RELEASE') {
        exclude module: 'logback-classic'
        exclude module: 'slf4j-log4j12'
    }
    compile('org.springframework.boot:spring-boot-starter-thymeleaf:2.1.3.RELEASE') {
        exclude module: 'logback-classic'
        exclude module: 'slf4j-log4j12'
    }
    compile('org.springframework.boot:spring-boot-configuration-processor:2.1.3.RELEASE')

    compile('com.google.code.gson:gson:2.8.5')
    compileOnly(group: 'org.apache.hadoop', name: 'hadoop-common', version: '2.7.7') {
        exclude module: 'servlet-api'
    }
    compileOnly(group: 'org.apache.spark', name: 'spark-core_2.12', version: '2.4.0')
    compileOnly(group: 'org.apache.spark', name: 'spark-mllib_2.12', version: '2.4.0')
}

The SparkContext is autowired in the spring-boot application.

SparkContextBean.java

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkContextBean {
    @Autowired
    private SparkProperties sparkProperties;

    @Bean
    @ConditionalOnMissingBean(SparkConf.class)
    public SparkConf sparkConf() {
        return new SparkConf().setAppName(sparkProperties.getAppname());
    }

    @Bean
    @ConditionalOnMissingBean(JavaSparkContext.class)
    public JavaSparkContext javaSparkContext() throws Exception {
        return new JavaSparkContext(sparkConf());
    }
}

The Spark code:
// hsidata is a JavaPairRDD<Integer, short[][]>
Tuple2<double[], double[]> mk = hsidata.mapToPair(pair -> {
    short[][] data = pair._2;
    return JTool.CalcMK(data);
}).reduce((right, left) -> {
    double[] mean = right._1;
    int bands = mean.length;
    double[] K = right._2;
    int n = bands * (bands + 1) / 2; // packed upper-triangular size

    for (int i = 0; i < bands; i++)
        mean[i] = mean[i] + left._1[i];

    for (int i = 0; i < n; i++)
        K[i] = K[i] + left._2[i];

    return new Tuple2<>(mean, K);
});
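The reduce above is correct because element-wise addition is associative and commutative, though it mutates right's arrays in place. As a Spark-free sanity check, the same combiner can be sketched in plain Java; the class and method names here are illustrative, not from the original code:

```java
import java.util.Arrays;

public class CombinerSketch {
    // Element-wise combiner matching the reduce above: sums the per-band
    // means and the packed upper-triangular K (bands*(bands+1)/2 entries).
    static double[][] combine(double[] meanR, double[] kR,
                              double[] meanL, double[] kL) {
        double[] mean = meanR.clone(); // clone to avoid mutating the inputs
        double[] k = kR.clone();
        for (int i = 0; i < mean.length; i++) mean[i] += meanL[i];
        for (int i = 0; i < k.length; i++) k[i] += kL[i];
        return new double[][]{mean, k};
    }

    public static void main(String[] args) {
        // For bands = 2, K holds 2*(2+1)/2 = 3 packed entries.
        double[][] out = combine(new double[]{1, 2}, new double[]{1, 1, 1},
                                 new double[]{3, 4}, new double[]{2, 2, 2});
        System.out.println(Arrays.toString(out[0]) + " " + Arrays.toString(out[1]));
        // → [4.0, 6.0] [3.0, 3.0, 3.0]
    }
}
```

Cloning the inputs instead of mutating them also makes the function safe to reuse outside of reduce.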

Best answer

If you want to test the Spark application on your local machine rather than the standalone cluster, set the master to local:

bin/spark-submit --master local path_to_jar/xxx.jar

I suspect the Spark version you are running differs from the one the jar was built against. Make sure the Spark installed on your machine is version 2.4.0.
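Assuming spark-submit is on your path, the installed build can be checked directly; the version banner prints both the Spark version and the Scala version it was compiled against, and both must match the spark-core_2.12:2.4.0 coordinate in build.gradle:

```shell
# Prints the installed Spark version and its Scala build version;
# both should match the 2.4.0 / Scala 2.12 dependency coordinates.
bin/spark-submit --version
```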

If you still hit the problem, please post the code of a minimal sample Spark computation application.

Regarding "spring - Error when submitting a Spring Boot application to Spark", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55812967/
