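I am trying to load an XML file using the Databricks library and write the data to a file, but I cannot write the output data (array<string>) to a CSV file.
I get the following error: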
Exception in thread "main" java.lang.UnsupportedOperationException: CSV data source does not support array<string> data type.
When I print the dataset, it prints as follows:
+--------------------+
|             orgname|
+--------------------+
|[Muncy, Geissler,...|
|[Muncy, Geissler,...|
|[Knobbe Martens O...|
|[null, Telekta La...|
|[McAndrews, Held ...|
|[Notaro, Michalos...|
|                null|
|[Cowan, Liebowitz...|
|                null|
|[Kunzler Law Grou...|
|[null, null, Klei...|
|[Knobbe, Martens,...|
|[Merchant & Gould...|
|                null|
|[Culhane Meadows ...|
|[Culhane Meadows ...|
|[Vista IP Law Gro...|
|[Thompson & Knigh...|
|  [Fish & Tsang LLP]|
|                null|
+--------------------+
Best answer
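The exception should be self-explanatory. You cannot write an array to a CSV file: CSV is a flat format, so every column has to hold a single scalar value.
You have to concatenate the array into a single string: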
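import org.apache.spark.sql.functions.concat_ws
// The $"..." column syntax needs the session implicits in scope
// (assuming your SparkSession is named spark): import spark.implicits._
val separator: String = ";" // Choose an appropriate one for your data
// Replace the array<string> column with a single delimited string, then write
df.withColumn("orgname", concat_ws(separator, $"orgname")).write.csv(...)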
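For reference, here is a minimal self-contained sketch of the same approach, in case it helps to try it in isolation. The sample rows, the object name ArrayToCsv, the local master setting, and the output path /tmp/orgname_csv are all assumptions made for illustration; they are not from the question. It uses col("orgname") instead of $"orgname", which should behave the same way here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws}

object ArrayToCsv {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption, not from the question)
    val spark = SparkSession.builder()
      .appName("array-to-csv")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data shaped like the orgname column: array<string>
    val df = Seq(
      Seq("Muncy", "Geissler"),
      Seq("Fish & Tsang LLP")
    ).toDF("orgname")

    // Join the array elements into one string so the column becomes a plain string
    val flattened = df.withColumn("orgname", concat_ws(";", col("orgname")))
    flattened.printSchema()

    // Hypothetical output path; mode("overwrite") lets the sketch be re-run
    flattened.write
      .mode("overwrite")
      .option("header", "true")
      .csv("/tmp/orgname_csv")

    spark.stop()
  }
}
```

Pick a separator that cannot occur inside the values themselves, otherwise the elements cannot be split apart again when the file is read back. Also, as far as I know, concat_ws skips null elements, so an array such as [null, Telekta La...] would lose its null placeholders; if the positions matter, the nulls would need to be replaced with a marker value first.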
Regarding "xml - Spark Dataset write() method returns an error", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48563150/