scala - How to pick a specific version of a package added by different SBT library dependencies

标签 scala apache-spark hadoop hive sbt

I'm developing an Apache Spark application on Scala 2.11 with SBT 1.3.10. I work in an IDE on a local machine that has no Spark/Hadoop/Hive installation; instead, I add them as SBT dependencies (Hadoop 3.1.2, Spark 2.4.5, Hive 3.1.2). My build.sbt is as follows:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.4.5",
  "org.apache.hadoop" % "hadoop-client" % "3.1.2",

  "com.fasterxml.jackson.core" % "jackson-core" % "2.9.10",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.10",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.9.10",


  // about these two later in the question
  "org.apache.hive" % "hive-exec" % "3.1.2",
  "org.apache.commons" % "commons-lang3" % "3.6"
)

In my application I read a sample CSV file into a DataFrame using a supplied schema:
val init = spark.read
  .format("csv")
  .option("header", value = false)
  .schema(sampleCsvSchema)
  .load("src/main/resources/sample.csv")

init.show(10, false)

At some point I had to add the org.apache.hive:hive-exec:3.1.2 dependency, and I started getting an exception during execution:
Illegal pattern component: XXX
java.lang.IllegalArgumentException: Illegal pattern component: XXX
    at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
    at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
    at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
    at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
    at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91)
    at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88)
    at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
    at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
    at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:139)
    at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:41)
    ...

It says that org.apache.commons.lang3.time.FastDatePrinter.parsePattern() cannot parse Spark's default timestamp format ("yyyy-MM-dd'T'HH:mm:ss.SSSXXX"), which is set in org.apache.spark.sql.execution.datasources.csv.CSVOptions.timestampFormat. (Note that my sample.csv contains no timestamp data at all, but Spark goes through this code path anyway.)

Initially, org.apache.commons.lang3.time.FastDatePrinter came into the project through the org.apache.commons:commons-lang3:3.6 dependency and worked fine. However, the org.apache.hive:hive-exec:3.1.2 library bundles its own implementation of the same package and class, which cannot parse "XXX" (and it cannot be excluded, because it is built into the library's own jar).
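To see which of the duplicate copies actually wins on the classpath at run time, a small diagnostic can help. This is a sketch; the `WhichJar` object and `locationOf` method are names of my own, not part of any library:

```scala
// Diagnostic sketch: report the jar (or class directory) a class was loaded
// from, so you can tell whether FastDateFormat comes from commons-lang3-3.6
// or from the copy bundled inside hive-exec-3.1.2.
object WhichJar {
  def locationOf(className: String): String =
    Option(Class.forName(className).getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)
      .getOrElse("(bootstrap classloader)") // JDK core classes have no code source
}
```

Calling `WhichJar.locationOf("org.apache.commons.lang3.time.FastDateFormat")` inside the application should print a path ending in either the commons-lang3 jar or the hive-exec jar, depending on classpath ordering.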

So I have a situation where two library dependencies provide two implementations of the same package, and I need to pick one specific implementation to be used during application execution. How can this be done?

P.S. I've found a workaround for this specific "java.lang.IllegalArgumentException: Illegal pattern component: XXX" issue, but I'm more interested in how to resolve such SBT dependency issues in general.

Best Answer

In case of version dependency conflicts, I usually:

  • Exclude certain transitive dependencies of a dependency.
    ref: https://www.scala-sbt.org/1.x/docs/Library-Management.html#Exclude+Transitive+Dependencies
  • For binary conflicts like the one in the question above, I use dependencyOverrides and pin the version I want.
    ref: https://www.scala-sbt.org/1.x/docs/Library-Management.html#Overriding+a+version
  • Rarely, if the problem isn't solved by the two options above, I rebuild my own version of the library against a compatible transitive dependency.

  • P.S. Also look out for other flavors of the hive-exec artifact (if any); they can save you from situations like this.
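A minimal build.sbt sketch of the first two options, using the coordinates from the question (note that exclusion only works when the conflicting classes arrive as a separate artifact; if they are bundled inside hive-exec's own jar, as in the question, only classpath ordering or a different artifact flavor will help):

```scala
// Option 1: exclude a conflicting transitive artifact from hive-exec.
libraryDependencies += "org.apache.hive" % "hive-exec" % "3.1.2" exclude ("org.apache.commons", "commons-lang3")

// Option 2: when the conflict is between two versions of the same module,
// pin a single version for the whole build.
dependencyOverrides += "org.apache.commons" % "commons-lang3" % "3.6"
```

Regarding the last bullet: hive-exec is also published with a "core" classifier for some versions (e.g. `"org.apache.hive" % "hive-exec" % "3.1.2" classifier "core"`), which, if available for your version, ships without the bundled third-party classes.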

Regarding "scala - How to pick a specific version of a package added by different SBT library dependencies", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61935587/
