apache-spark - Error running Spark 3.1.1 as the engine for Hive 3.1.2 (java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods)

Tags: apache-spark hadoop hive hadoop-yarn

I am running Spark on YARN, on Ubuntu 20.04.
Cluster versions:

  • Hadoop 3.2.2
  • Hive 3.1.2
  • Spark 3.1.1

  • I have created symbolic links from Spark's jars into Hive's lib directory:
    sudo ln -s $SPARK_HOME/jars/spark-network-common_2.12-3.1.1.jar $HIVE_HOME/lib/spark-network-common_2.12-3.1.1.jar
    sudo ln -s $SPARK_HOME/jars/spark-core_2.12-3.1.1.jar $HIVE_HOME/lib/spark-core_2.12-3.1.1.jar
    sudo ln -s $SPARK_HOME/jars/scala-library-2.12.10.jar $HIVE_HOME/lib/scala-library-2.12.10.jar
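
    To confirm Hive actually picks up those jars, here is a quick sanity check (a minimal sketch, assuming the same $SPARK_HOME / $HIVE_HOME layout as above):

    # Follow each symlink; a broken link reports "No such file or directory"
    ls -lL $HIVE_HOME/lib/spark-*.jar $HIVE_HOME/lib/scala-library-*.jar
    # List every Spark- or Scala-related jar on Hive's lib path
    ls $HIVE_HOME/lib | grep -E 'spark|scala'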
    
    When I run Hive with Spark configured as its execution engine, I get the following error:
    Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
    2021-05-31T12:31:58,949 ERROR [a69d446a-f1a0-45d9-8dbc-c0fccbf718b3 main] spark.SparkTask: Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
    org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34
            at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:221)
            at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:92)
            at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115)
            at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
            at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:115)
            at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
            at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
            at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
            at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
            at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
            at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
            at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
            at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
            at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
            at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
            at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
            at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
            at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
            at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
            at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
    Caused by: java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods
            at org.apache.spark.internal.config.package$.<init>(package.scala:1095)
            at org.apache.spark.internal.config.package$.<clinit>(package.scala)
            at org.apache.spark.SparkConf$.<init>(SparkConf.scala:654)
            at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
            at org.apache.spark.SparkConf.set(SparkConf.scala:94)
            at org.apache.spark.SparkConf.set(SparkConf.scala:83)
            at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:265)
            at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98)
            at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76)
            at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87)
            ... 24 more
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.unsafe.array.ByteArrayMethods
            at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
            ... 34 more
    
    2021-05-31T12:31:58,950 ERROR [a69d446a-f1a0-45d9-8dbc-c0fccbf718b3 main] spark.SparkTask: Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
    org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34
            at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:221) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:92) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:115) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) ~[hive-cli-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) ~[hive-cli-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) ~[hive-cli-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) ~[hive-cli-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) ~[hive-cli-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) ~[hive-cli-3.1.2.jar:3.1.2]
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_292]
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_292]
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_292]
            at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
            at org.apache.hadoop.util.RunJar.run(RunJar.java:323) ~[hadoop-common-3.2.2.jar:?]
            at org.apache.hadoop.util.RunJar.main(RunJar.java:236) ~[hadoop-common-3.2.2.jar:?]
    Caused by: java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods
            at org.apache.spark.internal.config.package$.<init>(package.scala:1095) ~[spark-core_2.12-3.1.1.jar:3.1.1]
            at org.apache.spark.internal.config.package$.<clinit>(package.scala) ~[spark-core_2.12-3.1.1.jar:3.1.1]
            at org.apache.spark.SparkConf$.<init>(SparkConf.scala:654) ~[spark-core_2.12-3.1.1.jar:3.1.1]
            at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala) ~[spark-core_2.12-3.1.1.jar:3.1.1]
            at org.apache.spark.SparkConf.set(SparkConf.scala:94) ~[spark-core_2.12-3.1.1.jar:3.1.1]
            at org.apache.spark.SparkConf.set(SparkConf.scala:83) ~[spark-core_2.12-3.1.1.jar:3.1.1]
            at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:265) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76) ~[hive-exec-3.1.2.jar:3.1.2]
            at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87) ~[hive-exec-3.1.2.jar:3.1.2]
            ... 24 more
    
    I downloaded Spark as the pre-built distribution for Hadoop 3.2.0. Its jars directory contains Hive 2.3.0 jars, while my Hive installation is 3.1.2 and its lib directory contains the 3.1.2 jars.
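
    For reference, this is an easy way to see which Hive jars a prebuilt Spark distribution ships (a sketch assuming the default jars layout; the versions printed depend on the build):

    ls $SPARK_HOME/jars | grep -i hive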

    Best Answer

    Hive on Spark is only tested with specific versions of Spark, so a given Hive release is only guaranteed to work with a specific Spark release. Other Spark versions may happen to work with a given Hive version, but that is not guaranteed. Below is the list of Hive releases and their corresponding compatible Spark releases.
    See the Hive Spark Compatibility Chart for the details.
    Hive Version    Spark Version
    master          2.3.0
    3.0.x           2.3.0
    2.3.x           2.0.0
    2.2.x           1.6.0
    2.1.x           1.6.0
    2.0.x           1.5.0
    1.2.x           1.3.1
    1.1.x           1.2.0
    Please use Spark 2.3.0, and check Hive's pom.xml, which contains the following:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-unsafe_2.11</artifactId>
      <version>2.3.0</version>
      <scope>compile</scope>
    </dependency>
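
    The missing class from the stack trace, org.apache.spark.unsafe.array.ByteArrayMethods, lives in exactly that spark-unsafe artifact, which was never among the jars linked into Hive's lib. You can verify which jar ships the class (a sketch; the jar name assumes the stock Spark 3.1.1 layout):

    unzip -l $SPARK_HOME/jars/spark-unsafe_2.12-3.1.1.jar | grep ByteArrayMethods

    If you switch to Spark 2.3.0 as recommended, recreate the links against the 2.3.0 jars. Spark 2.3.0 is built with Scala 2.11, so the jar names change (the names below assume the stock 2.3.0 distribution):

    sudo ln -s $SPARK_HOME/jars/spark-network-common_2.11-2.3.0.jar $HIVE_HOME/lib/
    sudo ln -s $SPARK_HOME/jars/spark-core_2.11-2.3.0.jar $HIVE_HOME/lib/
    sudo ln -s $SPARK_HOME/jars/scala-library-2.11.8.jar $HIVE_HOME/lib/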
    

    Regarding apache-spark - Error running Spark 3.1.1 as the engine for Hive 3.1.2 (java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67773977/
