scala - How to pick a specific version of a package added by different SBT library dependencies

标签 scala apache-spark hadoop hive sbt

I'm developing an Apache Spark application on Scala 2.11 with SBT 1.3.10. I work in an IDE on a local machine that has no Spark/Hadoop/Hive installation; instead, I add them as SBT dependencies (Hadoop 3.1.2, Spark 2.4.5, Hive 3.1.2). My build.sbt is as follows:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.4.5",
  "org.apache.hadoop" % "hadoop-client" % "3.1.2",

  "com.fasterxml.jackson.core" % "jackson-core" % "2.9.10",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.10",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.9.10",


  // about these two later in the question
  "org.apache.hive" % "hive-exec" % "3.1.2",
  "org.apache.commons" % "commons-lang3" % "3.6"
)

In my application I read a sample CSV file into a DataFrame using a supplied schema:
val init = spark.read
  .format("csv")
  .option("header", value = false)
  .schema(sampleCsvSchema)
  .load("src/main/resources/sample.csv")

init.show(10, false)

At some point I had to add the org.apache.hive:hive-exec:3.1.2 dependency, and I started getting an exception during execution:
Illegal pattern component: XXX
java.lang.IllegalArgumentException: Illegal pattern component: XXX
    at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
    at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
    at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
    at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
    at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91)
    at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88)
    at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
    at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
    at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:139)
    at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:41)
    ...

It says that org.apache.commons.lang3.time.FastDatePrinter.parsePattern() cannot parse Spark's default timestamp format ("yyyy-MM-dd'T'HH:mm:ss.SSSXXX"), which is set in org.apache.spark.sql.execution.datasources.csv.CSVOptions.timestampFormat. (Note that my sample.csv contains no timestamp data at all, but Spark goes through this code path anyway.)

Initially, org.apache.commons.lang3.time.FastDatePrinter came into the project through the org.apache.commons:commons-lang3:3.6 dependency and worked fine. However, the org.apache.hive:hive-exec:3.1.2 library bundles its own implementation of the same package and class, which cannot parse "XXX" (and it cannot be excluded, because it is built into the library's own jar).
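To see which of the duplicate copies actually wins on the classpath at run time, a small diagnostic can help. This is a sketch; the `WhichJar` object and `locationOf` method are names of my own, not part of any library:

```scala
// Diagnostic sketch: report the jar (or class directory) a class was loaded
// from, so you can tell whether FastDateFormat comes from commons-lang3-3.6
// or from the copy bundled inside hive-exec-3.1.2.
object WhichJar {
  def locationOf(className: String): String =
    Option(Class.forName(className).getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)
      .getOrElse("(bootstrap classloader)") // JDK core classes have no code source
}
```

Calling `WhichJar.locationOf("org.apache.commons.lang3.time.FastDateFormat")` inside the application should print a path ending in either the commons-lang3 jar or the hive-exec jar, depending on classpath ordering.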

So I have a situation where two library dependencies provide two implementations of the same package, and I need to pick one specific implementation to be used during application execution. How can this be done?

P.S. I've found a workaround for this specific "java.lang.IllegalArgumentException: Illegal pattern component: XXX" issue, but I'm more interested in how to resolve such SBT dependency issues in general.

Best Answer

In case of version dependency conflicts, I usually:

  • Exclude certain transitive dependencies of a dependency.
    ref: https://www.scala-sbt.org/1.x/docs/Library-Management.html#Exclude+Transitive+Dependencies
  • For binary conflicts like the one in the question above, I use dependencyOverrides and pin the version I want.
    ref: https://www.scala-sbt.org/1.x/docs/Library-Management.html#Overriding+a+version
  • Rarely, if the problem isn't solved by the two options above, I rebuild my own version of the library against a compatible transitive dependency.

  • P.S. Also look out for other flavors of the hive-exec artifact (if any); they can save you from situations like this.
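A minimal build.sbt sketch of the first two options, using the coordinates from the question (note that exclusion only works when the conflicting classes arrive as a separate artifact; if they are bundled inside hive-exec's own jar, as in the question, only classpath ordering or a different artifact flavor will help):

```scala
// Option 1: exclude a conflicting transitive artifact from hive-exec.
libraryDependencies += "org.apache.hive" % "hive-exec" % "3.1.2" exclude ("org.apache.commons", "commons-lang3")

// Option 2: when the conflict is between two versions of the same module,
// pin a single version for the whole build.
dependencyOverrides += "org.apache.commons" % "commons-lang3" % "3.6"
```

Regarding the last bullet: hive-exec is also published with a "core" classifier for some versions (e.g. `"org.apache.hive" % "hive-exec" % "3.1.2" classifier "core"`), which, if available for your version, ships without the bundled third-party classes.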

Regarding "scala - How to pick a specific version of a package added by different SBT library dependencies", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61935587/
