我在 ubuntu 20.4 上的 yarn 上运行 Spark
集群版本:
我已经给出了从 spark 的 jar 到 hive 的 lib 的符号链接(symbolic link):
sudo ln -s $SPARK_HOME/jars/spark-network-common_2.12-3.1.1.jar $HIVE_HOME/lib/spark-network-common_2.12-3.1.1.jar
sudo ln -s $SPARK_HOME/jars/spark-core_2.12-3.1.1.jar $HIVE_HOME/lib/spark-core_2.12-3.1.1.jar
sudo ln -s $SPARK_HOME/jars/scala-library-2.12.10.jar $HIVE_HOME/lib/scala-library-2.12.10.jar
当运行 hive 并设置 spark 时,我收到以下错误:Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
2021-05-31T12:31:58,949 ERROR [a69d446a-f1a0-45d9-8dbc-c0fccbf718b3 main] spark.SparkTask: Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:221)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:92)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115)
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:115)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods
at org.apache.spark.internal.config.package$.<init>(package.scala:1095)
at org.apache.spark.internal.config.package$.<clinit>(package.scala)
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:654)
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
at org.apache.spark.SparkConf.set(SparkConf.scala:94)
at org.apache.spark.SparkConf.set(SparkConf.scala:83)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:265)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87)
... 24 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.unsafe.array.ByteArrayMethods
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 34 more
2021-05-31T12:31:58,950 ERROR [a69d446a-f1a0-45d9-8dbc-c0fccbf718b3 main] spark.SparkTask: Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:221) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:92) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:115) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) ~[hive-cli-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683) ~[hive-cli-3.1.2.jar:3.1.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_292]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_292]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_292]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
at org.apache.hadoop.util.RunJar.run(RunJar.java:323) ~[hadoop-common-3.2.2.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:236) ~[hadoop-common-3.2.2.jar:?]
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods
at org.apache.spark.internal.config.package$.<init>(package.scala:1095) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.internal.config.package$.<clinit>(package.scala) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.SparkConf$.<init>(SparkConf.scala:654) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.SparkConf.set(SparkConf.scala:94) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.spark.SparkConf.set(SparkConf.scala:83) ~[spark-core_2.12-3.1.1.jar:3.1.1]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:265) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76) ~[hive-exec-3.1.2.jar:3.1.2]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87) ~[hive-exec-3.1.2.jar:3.1.2]
... 24 more
我已经下载了 spark 作为 hadoop 3.2.0 的预构建版本,后来在其中 spark jars 包含 hive 2.3.0 jars 而 hive 是 3.1.2 其中 hive 的 lib 包含 3.1.2 jars
最佳答案
Spark 上的 Hive 仅使用特定版本的 Spark 进行测试,因此仅保证给定版本的 Hive 可与特定版本的 Spark 一起使用。其他版本的 Spark 可能适用于给定版本的 Hive,但这不能保证。以下是 Hive 版本及其对应的兼容 Spark 版本的列表。
请在 Hive Spark Compatibility Chart 中查找信息.
请使用 spark 2.3.0
并查看 POM.xml,其中包含以下内容。
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-unsafe_2.11</artifactId>
<version>2.3.0</version>
<scope>compile</scope>
</dependency>
关于apache-spark - 将 Spark 3.1.1 作为 hive 的 3.1.2 引擎运行时出错 ( java.lang.NoClassDefFoundError : org/apache/spark/unsafe/array/ByteArrayMethods ),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/67773977/