scala - NoClassDefFoundError: org/apache/spark/sql/SparkSession$ while running Spark source code locally

Tags: scala maven apache-spark intellij-idea

I cloned the Spark project to my local machine and built it with the following command. The build succeeded.

mvn -DskipTests clean package

I imported the Spark project into IntelliJ IDEA as a Maven project and set Scala 2.12.10 as a global library for the project.

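However, when I try to run any example program from the examples module, I get the error below. I suspect it is related to Scala compilation. Could you help me understand what is happening here?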

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
    at org.apache.spark.examples.GroupByTest$.main(GroupByTest.scala:30)
    at org.apache.spark.examples.GroupByTest.main(GroupByTest.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 2 more

Here is the example Spark code I am running:

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package org.apache.spark.examples

import java.util.Random

import org.apache.spark.sql.SparkSession

/**
 * Usage: GroupByTest [numMappers] [numKVPairs] [KeySize] [numReducers]
 */
object GroupByTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("GroupBy Test")
      .getOrCreate()

    val numMappers = if (args.length > 0) args(0).toInt else 2
    val numKVPairs = if (args.length > 1) args(1).toInt else 1000
    val valSize = if (args.length > 2) args(2).toInt else 1000
    val numReducers = if (args.length > 3) args(3).toInt else numMappers

    val pairs1 = spark.sparkContext.parallelize(0 until numMappers, numMappers).flatMap { p =>
      val ranGen = new Random
      val arr1 = new Array[(Int, Array[Byte])](numKVPairs)
      for (i <- 0 until numKVPairs) {
        val byteArr = new Array[Byte](valSize)
        ranGen.nextBytes(byteArr)
        arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
      }
      arr1
    }.cache()
    // Enforce that everything has been calculated and in cache
    pairs1.count()

    println(pairs1.groupByKey(numReducers).count())

    spark.stop()
  }
}
// scalastyle:on println

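Best answer

SparkSession is part of spark-sql. A dependency with provided scope is available at compile time but is left off the runtime classpath, which is why the class cannot be found when the example is launched. You therefore need to change the scope of this library from provided to compile: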

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>compile</scope>
</dependency>
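
To confirm whether the scope change took effect, one option is to probe the runtime classpath before launching the example. The snippet below is a minimal sketch, not part of Spark: the names ClasspathCheck and isOnClasspath are hypothetical, and it assumes it is run with the same run configuration (and therefore the same classpath) as the failing example.

```scala
object ClasspathCheck {
  // Hypothetical helper: returns true if the current class loader can locate
  // the named class. Passing initialize = false avoids running the class's
  // static initializer during the probe.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
    }

  def main(args: Array[String]): Unit = {
    // The companion-object class named in the stack trace.
    val target = "org.apache.spark.sql.SparkSession$"
    val found = isOnClasspath(target)
    println(s"$target on classpath: $found")

    // Printing the effective classpath shows whether the spark-sql
    // artifact (or its module output directory) is listed at all.
    val classPath = System.getProperty("java.class.path")
    println(s"java.class.path = $classPath")
  }
}
```

If the probe reports false while the project still compiles, the dependency is most likely visible only at compile time, which matches the provided-scope explanation above.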

A similar question on Stack Overflow: https://stackoverflow.com/questions/60791902/
