java - NoSuchMethodError in Shapeless, visible only in Spark

Tags: java scala apache-spark shapeless

I am trying to write a Spark connector to pull AVRO messages off a RabbitMQ message queue. When decoding the AVRO messages I get a NoSuchMethodError, but only when running inside Spark.

I cannot reproduce the Spark code exactly outside of Spark, but I believe the two examples below are very close. I think this is the minimal code that reproduces the scenario.

I removed all the connection parameters because that information is private, and the connection does not appear to be the problem.

Spark code:

package simpleexample

import org.apache.spark.SparkConf
import org.apache.spark.streaming.rabbitmq.distributed.RabbitMQDistributedKey
import org.apache.spark.streaming.rabbitmq.models.ExchangeAndRouting
import org.apache.spark.streaming.rabbitmq.RabbitMQUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.storage.StorageLevel

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

import com.sksamuel.avro4s._

import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import com.rabbitmq.client.QueueingConsumer.Delivery
import java.util.HashMap


case class AttributeTuple(attrName: String, attrValue: String)

// AVRO Schema for Events

case class DeviceEvent(
    tenantName: String, 
    groupName: String, 
    subgroupName: String, 
    eventType: String, 
    eventSource: String,
    deviceTypeName: String,
    deviceId: Int,
    timestamp: Long,
    attribute: AttributeTuple
)

object RabbitMonitor {
  def main(args: Array[String]) {
    println("start")

    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("RabbitMonitor")
    val ssc = new StreamingContext(sparkConf, Seconds(60))

    def parseArrayEvent(delivery: Delivery): Seq[DeviceEvent] = {
        val in = new ByteArrayInputStream(delivery.getBody())
        val input = AvroInputStream.binary[DeviceEvent](in)
        input.iterator.toSeq
    }

    val params: Map[String, String] = Map(
        /* many rabbit connection parameters */
        "maxReceiveTime" -> "60000" // 60s
      )

    val distributedKey = Seq(
        RabbitMQDistributedKey(
          /* queue name */, 
          new ExchangeAndRouting(/* exchange name */, /* routing key */),
          params
        )
    )

    val events = RabbitMQUtils.createDistributedStream[Seq[DeviceEvent]](ssc, distributedKey, params, parseArrayEvent)
    events.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Non-Spark code:

package simpleexample

import com.thenewmotion.akka.rabbitmq._
import akka.actor._
// avoid name collision with rabbitmq channel
import scala.concurrent.{Channel => BasicChannel}
import scala.concurrent.ExecutionContext.Implicits.global

import com.sksamuel.avro4s._
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

object Test extends App {
    implicit val system = ActorSystem()

    val factory = new ConnectionFactory()
    /* Set connection parameters*/
    val exchange: String = /* exchange name */

    val connection: ActorRef = system.actorOf(ConnectionActor.props(factory), "rabbitmq")

    def setupSubscriber(channel: Channel, self: ActorRef) {
        val queue = channel.queueDeclare().getQueue
        channel.queueBind(queue, exchange, /* routing key */)
        val consumer = new DefaultConsumer(channel) {
          override def handleDelivery(consumerTag: String, envelope: Envelope, properties: BasicProperties, body: Array[Byte]) {
            val in = new ByteArrayInputStream(body)
            val input = AvroInputStream.binary[DeviceEvent](in)
            val result = input.iterator.toSeq
            println(result)
          }
        }

        channel.basicConsume(queue, true, consumer)
      }

    connection ! CreateChannel(ChannelActor.props(setupSubscriber), Some("eventSubscriber"))


    scala.concurrent.Future {
        def loop(n: Long) {
            Thread.sleep(1000)
            if (n < 30) {
                loop(n + 1)
            }
        }
        loop(0)
    }
}

Non-Spark output (the last line is a successfully decoded update):

drex@drexThinkPad:~/src/scala/so-repro/connector/target/scala-2.11$ scala project.jar 
[INFO] [03/02/2017 14:11:06.899] [default-akka.actor.default-dispatcher-4] [akka://default/deadLetters] Message [com.thenewmotion.akka.rabbitmq.ChannelCreated] from Actor[akka://default/user/rabbitmq#-889215077] to Actor[akka://default/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [03/02/2017 14:11:07.337] [default-akka.actor.default-dispatcher-3] [akka://default/user/rabbitmq] akka://default/user/rabbitmq connected to amqp://<rabbit info>
[INFO] [03/02/2017 14:11:07.509] [default-akka.actor.default-dispatcher-4] [akka://default/user/rabbitmq/eventSubscriber] akka://default/user/rabbitmq/eventSubscriber connected
Stream(DeviceEvent(int,na,d01,deviceAttrUpdate,device,TestDeviceType,33554434,1488492704421,AttributeTuple(temperature,60)), ?)

Spark output:

drex@drexThinkPad:~/src/scala/so-repro/connector/target/scala-2.11$ spark-submit ./project.jar --class RabbitMonitor
start
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/03/02 14:20:15 INFO SparkContext: Running Spark version 2.1.0
17/03/02 14:20:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/02 14:20:16 WARN Utils: Your hostname, drexThinkPad resolves to a loopback address: 127.0.1.1; using 192.168.1.11 instead (on interface wlp3s0)
17/03/02 14:20:16 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/03/02 14:20:16 INFO SecurityManager: Changing view acls to: drex
17/03/02 14:20:16 INFO SecurityManager: Changing modify acls to: drex
17/03/02 14:20:16 INFO SecurityManager: Changing view acls groups to: 
17/03/02 14:20:16 INFO SecurityManager: Changing modify acls groups to: 
17/03/02 14:20:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(drex); groups with view permissions: Set(); users  with modify permissions: Set(drex); groups with modify permissions: Set()
17/03/02 14:20:16 INFO Utils: Successfully started service 'sparkDriver' on port 34701.
17/03/02 14:20:16 INFO SparkEnv: Registering MapOutputTracker
17/03/02 14:20:16 INFO SparkEnv: Registering BlockManagerMaster
17/03/02 14:20:16 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/03/02 14:20:16 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/03/02 14:20:16 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5cbb13bf-78fe-4227-81b3-1afea40f899a
17/03/02 14:20:16 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/03/02 14:20:16 INFO SparkEnv: Registering OutputCommitCoordinator
17/03/02 14:20:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/03/02 14:20:16 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.11:4040
17/03/02 14:20:16 INFO SparkContext: Added JAR file:/home/drex/src/scala/so-repro/connector/target/scala-2.11/./project.jar at spark://192.168.1.11:34701/jars/project.jar with timestamp 1488493216614
17/03/02 14:20:16 INFO Executor: Starting executor ID driver on host localhost
17/03/02 14:20:16 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33276.
17/03/02 14:20:16 INFO NettyBlockTransferService: Server created on 192.168.1.11:33276
17/03/02 14:20:16 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/03/02 14:20:16 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.11, 33276, None)
17/03/02 14:20:16 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.11:33276 with 366.3 MB RAM, BlockManagerId(driver, 192.168.1.11, 33276, None)
17/03/02 14:20:16 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.11, 33276, None)
17/03/02 14:20:16 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.11, 33276, None)
17/03/02 14:20:17 INFO RabbitMQDStream: Duration for remembering RDDs set to 60000 ms for org.apache.spark.streaming.rabbitmq.distributed.RabbitMQDStream@546621c4
17/03/02 14:20:17 INFO RabbitMQDStream: Slide time = 60000 ms
17/03/02 14:20:17 INFO RabbitMQDStream: Storage level = Memory Deserialized 1x Replicated
17/03/02 14:20:17 INFO RabbitMQDStream: Checkpoint interval = null
17/03/02 14:20:17 INFO RabbitMQDStream: Remember interval = 60000 ms
17/03/02 14:20:17 INFO RabbitMQDStream: Initialized and validated org.apache.spark.streaming.rabbitmq.distributed.RabbitMQDStream@546621c4
17/03/02 14:20:17 INFO ForEachDStream: Slide time = 60000 ms
17/03/02 14:20:17 INFO ForEachDStream: Storage level = Serialized 1x Replicated
17/03/02 14:20:17 INFO ForEachDStream: Checkpoint interval = null
17/03/02 14:20:17 INFO ForEachDStream: Remember interval = 60000 ms
17/03/02 14:20:17 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@49c6ddef
17/03/02 14:20:17 INFO RecurringTimer: Started timer for JobGenerator at time 1488493260000
17/03/02 14:20:17 INFO JobGenerator: Started JobGenerator at 1488493260000 ms
17/03/02 14:20:17 INFO JobScheduler: Started JobScheduler
17/03/02 14:20:17 INFO StreamingContext: StreamingContext started
17/03/02 14:21:00 INFO JobScheduler: Added jobs for time 1488493260000 ms
17/03/02 14:21:00 INFO JobScheduler: Starting job streaming job 1488493260000 ms.0 from job set of time 1488493260000 ms
17/03/02 14:21:00 INFO SparkContext: Starting job: print at RabbitMonitor.scala:94
17/03/02 14:21:00 INFO DAGScheduler: Got job 0 (print at RabbitMonitor.scala:94) with 1 output partitions
17/03/02 14:21:00 INFO DAGScheduler: Final stage: ResultStage 0 (print at RabbitMonitor.scala:94)
17/03/02 14:21:00 INFO DAGScheduler: Parents of final stage: List()
17/03/02 14:21:00 INFO DAGScheduler: Missing parents: List()
17/03/02 14:21:00 INFO DAGScheduler: Submitting ResultStage 0 (RabbitMQRDD[0] at createDistributedStream at RabbitMonitor.scala:93), which has no missing parents
17/03/02 14:21:00 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.7 KB, free 366.3 MB)
17/03/02 14:21:00 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1752.0 B, free 366.3 MB)
17/03/02 14:21:00 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.11:33276 (size: 1752.0 B, free: 366.3 MB)
17/03/02 14:21:00 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
17/03/02 14:21:00 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (RabbitMQRDD[0] at createDistributedStream at RabbitMonitor.scala:93)
17/03/02 14:21:00 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/03/02 14:21:00 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 7744 bytes)
17/03/02 14:21:00 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/03/02 14:21:00 INFO Executor: Fetching spark://192.168.1.11:34701/jars/project.jar with timestamp 1488493216614
17/03/02 14:21:00 INFO TransportClientFactory: Successfully created connection to /192.168.1.11:34701 after 23 ms (0 ms spent in bootstraps)
17/03/02 14:21:00 INFO Utils: Fetching spark://192.168.1.11:34701/jars/project.jar to /tmp/spark-92b6ff6a-b120-4fd0-ba46-a450eff80636/userFiles-c0a334f3-68fc-495f-8ccd-cfe90e6d0bf8/fetchFileTemp2710654534934784726.tmp
17/03/02 14:21:00 INFO Executor: Adding file:/tmp/spark-92b6ff6a-b120-4fd0-ba46-a450eff80636/userFiles-c0a334f3-68fc-495f-8ccd-cfe90e6d0bf8/project.jar to class loader
<removing rabbit queue connection parameters>
17/03/02 14:21:02 INFO RabbitMQRDD: Receiving data in Partition 0 from  
</removing rabbit queue connection parameters>
17/03/02 14:21:50 WARN BlockManager: Putting block rdd_0_0 failed due to an exception
17/03/02 14:21:50 WARN BlockManager: Block rdd_0_0 could not be removed as it was not found on disk or in memory
17/03/02 14:21:50 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError: shapeless.Lazy.map(Lscala/Function1;)Lshapeless/Lazy;
    at com.sksamuel.avro4s.SchemaFor$.recordBuilder(SchemaFor.scala:447)
    at simpleexample.RabbitMonitor$$anon$3.<init>(RabbitMonitor.scala:70)
    at simpleexample.RabbitMonitor$.simpleexample$RabbitMonitor$$parseArrayEvent$1(RabbitMonitor.scala:70)
    at simpleexample.RabbitMonitor$$anonfun$15.apply(RabbitMonitor.scala:93)
    at simpleexample.RabbitMonitor$$anonfun$15.apply(RabbitMonitor.scala:93)
    at org.apache.spark.streaming.rabbitmq.distributed.RabbitMQRDD$RabbitMQRDDIterator$$anonfun$5.apply(RabbitMQRDD.scala:209)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.streaming.rabbitmq.distributed.RabbitMQRDD$RabbitMQRDDIterator.processDelivery(RabbitMQRDD.scala:209)
    at org.apache.spark.streaming.rabbitmq.distributed.RabbitMQRDD$RabbitMQRDDIterator.getNext(RabbitMQRDD.scala:194)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
17/03/02 14:21:50 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoSuchMethodError: shapeless.Lazy.map(Lscala/Function1;)Lshapeless/Lazy;
    at com.sksamuel.avro4s.SchemaFor$.recordBuilder(SchemaFor.scala:447)
    at simpleexample.RabbitMonitor$$anon$3.<init>(RabbitMonitor.scala:70)
    at simpleexample.RabbitMonitor$.simpleexample$RabbitMonitor$$parseArrayEvent$1(RabbitMonitor.scala:70)
    at simpleexample.RabbitMonitor$$anonfun$15.apply(RabbitMonitor.scala:93)
    at simpleexample.RabbitMonitor$$anonfun$15.apply(RabbitMonitor.scala:93)
    at org.apache.spark.streaming.rabbitmq.distributed.RabbitMQRDD$RabbitMQRDDIterator$$anonfun$5.apply(RabbitMQRDD.scala:209)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.streaming.rabbitmq.distributed.RabbitMQRDD$RabbitMQRDDIterator.processDelivery(RabbitMQRDD.scala:209)
    at org.apache.spark.streaming.rabbitmq.distributed.RabbitMQRDD$RabbitMQRDDIterator.getNext(RabbitMQRDD.scala:194)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

17/03/02 14:21:50 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
17/03/02 14:21:50 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/03/02 14:21:50 INFO TaskSchedulerImpl: Cancelling stage 0

build.sbt:

retrieveManaged := true

lazy val sparkVersion = "2.1.0"

scalaVersion in ThisBuild := "2.11.8"

lazy val rabbit = (project in file("rabbit-plugin")).settings(
    name := "Spark Streaming RabbitMQ Receiver",
    homepage := Some(url("https://github.com/Stratio/RabbitMQ-Receiver")),
    description := "RabbitMQ-Receiver is a library that allows the user to read data with Apache Spark from RabbitMQ.",
    exportJars := true,

    assemblyJarName in assembly := "rabbit.jar",
    test in assembly := {},

    moduleName := "spark-rabbitmq",
    organization := "com.stratio.receive",
    version := "0.6.0",

    libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided", 
        "org.apache.spark" %% "spark-sql" % sparkVersion % "provided", 
        "com.typesafe.akka" %% "akka-actor" % "2.4.11",
        "com.rabbitmq" % "amqp-client" % "3.6.6",
        "joda-time" % "joda-time" % "2.8.2",
        "com.github.sstone" %% "amqp-client" % "1.5" % Test,
        "org.scalatest" %% "scalatest" % "2.2.2" % Test,
        "org.scalacheck" %% "scalacheck" % "1.11.3" % Test,
        "junit" % "junit" % "4.12" % Test, 
        "com.typesafe.akka" %% "akka-testkit" % "2.4.11" % Test
        ) 
)

lazy val root = (project in file("connector")).settings(
    name := "Connector from Rabbit to Kafka queue",
    description := "",
    exportJars := true,

    test in assembly := {},
    assemblyJarName in assembly := "project.jar",

    libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
        "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
        "com.thenewmotion" %% "akka-rabbitmq" % "3.0.0",
        "org.apache.kafka" % "kafka_2.10" % "0.10.1.1",
        "com.sksamuel.avro4s" %% "avro4s-core" % "1.6.4"
        ) 
) dependsOn rabbit

I also put together a "fat jar" for Spark with the assembly plugin (addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.4")), and the jar used in both examples above is produced with the command sbt assembly. I am running Spark 2.1.0.
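For reference, a minimal sketch of the project/plugins.sbt this implies (the plugin's actual id is sbt-assembly):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.4")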

I am relatively new to the Spark/Scala ecosystem, so I hope this is a problem with my build setup. It would not make sense for Shapeless to be unusable in Spark.

Best answer

I had the same problem myself. I am just adding more details for anyone else who runs into it.

The error

Everything worked fine until I deployed to the cluster. Then I got:

Exception in thread "main" java.lang.NoSuchMethodError: 'shapeless.DefaultSymbolicLabelling shapeless.DefaultSymbolicLabelling$.instance(shapeless.HList)'

Root cause

From the stack trace I could tell it was related to the circe library. So I ran the dependency tree (make sure you have addDependencyTreePlugin in your ~/.sbt/1.0/plugins/plugins.sbt file).
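For reference, on sbt 1.4+ the dependency-tree plugin ships with sbt itself, so that global plugins file only needs one line:

// ~/.sbt/1.0/plugins/plugins.sbt
addDependencyTreePlugin

With that in place, whatDependsOn is available: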

❯ sbt "whatDependsOn com.chuusai shapeless_2.12"
[info] welcome to sbt 1.6.2 (Amazon.com Inc. Java 1.8.0_332)
[info] com.chuusai:shapeless_2.12:2.3.7 [S]
[info]   +-io.circe:circe-generic_2.12:0.14.1 [S]
[info]     +-***

But if I run the same query in the provided scope, I get:

❯ sbt provided:"whatDependsOn com.chuusai shapeless_2.12"
[info] welcome to sbt 1.6.2 (Amazon.com Inc. Java 1.8.0_332)
[info] com.chuusai:shapeless_2.12:2.3.3 [S]
[info]   +-org.scalanlp:breeze_2.12:1.0 [S]
[info]     +-org.apache.spark:spark-mllib-local_2.12:3.1.3
[info]     | +-org.apache.spark:spark-graphx_2.12:3.1.3
[info]     | | +-org.apache.spark:spark-mllib_2.12:3.1.3
[info]     | |   +-***
[info]     | |
[info]     | +-org.apache.spark:spark-mllib_2.12:3.1.3
[info]     |   +-***
[info]     |
[info]     +-org.apache.spark:spark-mllib_2.12:3.1.3
[info]       +-***

As you can see, in the provided scope Spark ends up with shapeless 2.3.3 (via breeze), and the instance method that code compiled against 2.3.7 links to does not exist in 2.3.3 (it was added in 2.3.5).
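As a quick run-time check (a generic JVM diagnostic, not part of the original answer), you can ask the JVM which jar a shapeless class was actually loaded from, for example at the top of the Spark job:

// Prints the location of the jar that provided shapeless.Lazy at run-time,
// which reveals whether Spark's bundled shapeless or your own copy won.
println(classOf[shapeless.Lazy[_]].getProtectionDomain.getCodeSource.getLocation)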

What didn't work

Adding the dependency explicitly did not solve my problem.

val CirceVersion = "0.14.1"
val ShapelessVersion = "2.3.7" // Circe 0.14.1 uses 2.3.7; Spark 3.1.3 uses 2.3.3
val SparkVersion = "3.1.3"

lazy val CirceDeps: Seq[ModuleID] = Seq(
    "io.circe" %% "circe-generic" % CirceVersion,
    /* Shapeless is one of the Spark dependencies. As Spark is provided, it is not included in the uber jar.
     * Adding the dependency explicitly to make sure we have the correct version at run-time
     */
    "com.chuusai" %% "shapeless" % ShapelessVersion
  )

I keep it in my code for documentation purposes only. Presumably Spark's own jars still come first on the cluster classpath, so pinning shapeless 2.3.7 in the fat jar changes nothing.
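Another commonly suggested workaround (sketched here for completeness; I did not go this way) is Spark's experimental userClassPathFirst settings, which make Spark prefer classes from the user jar over its own. They can break other Spark internals, which is why the shading below is the safer route:

❯ spark-submit \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    <other args>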

What worked

The main fix was actually renaming (shading) the Shapeless library (see my comment on the question from which I picked this answer):

/** Shapeless is one of the Spark dependencies. At run-time they clash, and Spark's shapeless package takes
  * precedence. This results in a run-time error because shapeless 2.3.7 and 2.3.3 are not fully compatible.
  * Here we rename the library so the two can co-exist at run-time: Spark uses its own version and Circe
  * uses its own version.
  */
// noinspection SbtDependencyVersionInspection
lazy val shadingRules: Def.Setting[Seq[ShadeRule]] =
  assembly / assemblyShadeRules := Seq(
    ShadeRule
      .rename("shapeless.**" -> "shadeshapless.@1")
      .inLibrary("com.chuusai" % "shapeless_2.12" % Dependencies.ShapelessVersion)
      .inProject,
    ShadeRule
      .rename("shapeless.**" -> "shadeshapless.@1")
      .inLibrary("io.circe" % "circe-generic_2.12" % Dependencies.CirceVersion)
      .inProject
  )
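After re-running sbt assembly, you can confirm the rename took effect by listing the fat jar: with the rules above, the shapeless classes should now appear under shadeshapless/ instead of shapeless/ (the jar path is a placeholder for your own assemblyJarName):

❯ sbt assembly
❯ jar tf <your-assembly>.jar | grep -i shapeless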

Update, 20 August 2022

Per @denis-arnaud's comment, here is a simpler version for pureconfig:

assembly / assemblyShadeRules := Seq(ShadeRule.rename("shapeless.**" -> "new_shapeless.@1").inAll)

I guess the simple version is good for most situations. The more sophisticated one is useful when there are different versions of shapeless on the classpath and you want to rename them to @1, @2, etc.

Original question on Stack Overflow: https://stackoverflow.com/questions/42568010/
