I'm developing an Apache Spark application with SBT 1.3.10 on Scala 2.11. I work in an IDE on a local machine that has no Spark/Hadoop/Hive installation; instead I add them as SBT dependencies (Hadoop 3.1.2, Spark 2.4.5, Hive 3.1.2). My build.sbt is as follows:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.4.5",
  "org.apache.hadoop" % "hadoop-client" % "3.1.2",
  "com.fasterxml.jackson.core" % "jackson-core" % "2.9.10",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.10",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.9.10",
  // more about these two later in the question
  "org.apache.hive" % "hive-exec" % "3.1.2",
  "org.apache.commons" % "commons-lang3" % "3.6"
)
In my application, I read a sample CSV file into a DataFrame using a provided schema:
val init = spark.read
  .format("csv")
  .option("header", value = false)
  .schema(sampleCsvSchema)
  .load("src/main/resources/sample.csv")
init.show(10, false)
At some point I had to add the org.apache.hive:hive-exec:3.1.2 dependency, and during execution I ran into the exception "Illegal pattern component: XXX":
java.lang.IllegalArgumentException: Illegal pattern component: XXX
at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91)
at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88)
at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:139)
at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:41)
...
It says that org.apache.commons.lang3.time.FastDatePrinter.parsePattern() fails to parse the Spark timestamp format ("yyyy-MM-dd'T'HH:mm:ss.SSSXXX") set by default in org.apache.spark.sql.execution.datasources.csv.CSVOptions.timestampFormat. (Note that my sample.csv does not contain any timestamp data, but Spark goes through this code path anyway.) Originally, org.apache.commons.lang3.time.FastDatePrinter was brought into the project through the org.apache.commons:commons-lang3:3.6 dependency and worked fine. However, the org.apache.hive:hive-exec:3.1.2 library bundles its own implementation of the same package and class, which cannot parse "XXX" (and it cannot be excluded, because the implementation sits inside the library jar itself). So I have a situation where two library dependencies provide two implementations of the same package, and I need to pick one specific implementation at application run time. How can I do that?
P.S. I've found a workaround for this specific "java.lang.IllegalArgumentException: Illegal pattern component: XXX" issue, but I'm more interested in how to resolve such SBT dependencies issues in general.
Best Answer
In case of version dependency conflicts, I usually:
1. Exclude certain transitive dependencies of a dependency.
ref - https://www.scala-sbt.org/1.x/docs/Library-Management.html#Exclude+Transitive+Dependencies
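The exclusion approach can be sketched in build.sbt as follows (using commons-lang3 purely for illustration; note that an exclusion only removes separate transitive artifacts and cannot strip classes bundled inside the hive-exec jar itself):

```scala
// Sketch: exclude a transitive dependency at the declaration site.
libraryDependencies += "org.apache.hive" % "hive-exec" % "3.1.2" exclude("org.apache.commons", "commons-lang3")

// Or exclude by organization/name across all configurations:
libraryDependencies += ("org.apache.hive" % "hive-exec" % "3.1.2")
  .excludeAll(ExclusionRule(organization = "org.apache.commons"))
```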
2. For binary conflicts like the one mentioned in the question above, I use dependencyOverrides and pin the version I want.
ref - https://www.scala-sbt.org/1.x/docs/Library-Management.html#Overriding+a+version
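The dependencyOverrides approach can be sketched as follows (version 3.6 is used here because it is the one already declared in the question's build.sbt):

```scala
// Sketch: force one specific version of commons-lang3, regardless of
// which versions other dependencies pull in transitively. This affects
// version resolution only, not classes shaded inside another jar.
dependencyOverrides += "org.apache.commons" % "commons-lang3" % "3.6"
```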
3. Rarely, if the problem isn't solved by the two options above, I rebuild my own version of the library with the compatible transitive dependency.
P.S. Be aware of the other flavors of the hive-exec artifact (if any); they can keep you safe from situations like this.
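The "other flavor" hint above presumably refers to the hive-exec artifact published with the "core" classifier, which, unlike the default artifact, is not a fat jar bundling copies of third-party classes. Treat the classifier name as an assumption to verify against the hive-exec 3.1.2 artifacts on Maven Central before relying on it:

```scala
// Sketch (assumption: a "core"-classified hive-exec artifact exists for
// this version): depend on the slim jar so that commons-lang3 classes
// come only from the explicit commons-lang3 dependency.
libraryDependencies += "org.apache.hive" % "hive-exec" % "3.1.2" classifier "core"
```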
About scala - How to choose a specific version of a package added by different SBT library dependencies, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61935587/