scala - How to handle null values when concatenating MapType values in Spark

I am trying to concatenate two MapType columns using map_concat(). My problem is that when I concatenate a null with a Map, I get null, whereas I expect to get the non-null Map value.

val DF_concatenated = DF.select(col("_1"), map_concat(col("m2"), col("m3")))
DF_concatenated.show()

I'm trying to get from this DataFrame DF:

+---+----------+----------------+
| _1|        m2|              m3|
+---+----------+----------------+
|  3|[c -> III]|            null|
|  1|  [a -> I]|     [one -> un]|
|  4|      null|[four -> quatre]|
|  2| [b -> II]|   [two -> deux]|
+---+----------+----------------+

to this DataFrame DF_concatenated:

+---+----------------------+
| _1|  map_concat(m2, m3)  |
+---+----------------------+
|  3|           [c -> III] |
|  1| [a -> I, one -> un]  |
|  4|    [four -> quatre]  |
|  2|[b -> II, two -> deux]|
+---+----------------------+

But I end up with this output:

+---+----------------------+
| _1|  map_concat(m2, m3)  |
+---+----------------------+
|  3|                null  |
|  1| [a -> I, one -> un]  |
|  4|                null  |
|  2|[b -> II, two -> deux]|
+---+----------------------+

Best answer

map_concat returns null as soon as any one of its arguments is null, even if the other maps are non-null.
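To see this null propagation directly, here is a minimal sketch that rebuilds the question's DF and runs the plain map_concat call. It assumes Spark 3.x with spark-sql on the classpath and a local SparkSession; the object name MapConcatNullDemo, the data value and the concatenated() helper are hypothetical names introduced only for this illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, map_concat}

// Hypothetical demo object; not part of the original question or answer.
object MapConcatNullDemo {
  // Same rows as DF in the question; None is stored by Spark as a null map.
  val data: Seq[(Int, Option[Map[String, String]], Option[Map[String, String]])] = Seq(
    (3, Some(Map("c" -> "III")), None),
    (1, Some(Map("a" -> "I")), Some(Map("one" -> "un"))),
    (4, None, Some(Map("four" -> "quatre"))),
    (2, Some(Map("b" -> "II")), Some(Map("two" -> "deux")))
  )

  // Runs plain map_concat(m2, m3) and returns _1 -> merged map,
  // using None for rows where Spark returned a null map.
  def concatenated(): Map[Int, Option[Map[String, String]]] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-concat-null")
      .getOrCreate()
    import spark.implicits._

    data.toDF("_1", "m2", "m3")
      .select(col("_1"), map_concat(col("m2"), col("m3")).as("merged"))
      .collect()
      .map(r => r.getInt(0) -> Option(r.getMap[String, String](1)).map(_.toMap))
      .toMap
  }

  def main(args: Array[String]): Unit = {
    concatenated().toSeq.sortBy(_._1).foreach(println)
    SparkSession.builder().getOrCreate().stop()
  }
}
```

Rows 3 and 4 come back as None, which matches the null rows in the output the question reports.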

If your columns can be null, you can use coalesce to replace each null with an empty map before concatenating.

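import org.apache.spark.sql.functions.{coalesce, col, map, map_concat}

DF.select(
   col("_1"),
   map_concat(
       // Replace a null map with an empty map() so map_concat never sees null
       coalesce(col("m2"), map()),
       coalesce(col("m3"), map())
   ).as("result")
).show()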
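The coalesce fix can be checked end to end with a small sketch that applies it to the question's data and returns the merged maps for comparison with the expected DF_concatenated. It assumes Spark 3.x with spark-sql on the classpath and a local SparkSession; the object name MapConcatCoalesceDemo, the data value and the concatenated() helper are hypothetical names used only here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, map, map_concat}

// Hypothetical demo object; not part of the original question or answer.
object MapConcatCoalesceDemo {
  // Same rows as DF in the question; None is stored by Spark as a null map.
  val data: Seq[(Int, Option[Map[String, String]], Option[Map[String, String]])] = Seq(
    (3, Some(Map("c" -> "III")), None),
    (1, Some(Map("a" -> "I")), Some(Map("one" -> "un"))),
    (4, None, Some(Map("four" -> "quatre"))),
    (2, Some(Map("b" -> "II")), Some(Map("two" -> "deux")))
  )

  // Applies the answer's coalesce + map_concat and returns _1 -> merged map.
  def concatenated(): Map[Int, Map[String, String]] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-concat-coalesce")
      .getOrCreate()
    import spark.implicits._

    data.toDF("_1", "m2", "m3")
      .select(
        col("_1"),
        // A null m2 or m3 becomes an empty map, so map_concat no longer returns null
        map_concat(coalesce(col("m2"), map()), coalesce(col("m3"), map())).as("result")
      )
      .collect()
      .map(r => r.getInt(0) -> r.getMap[String, String](1).toMap)
      .toMap
  }

  def main(args: Array[String]): Unit = {
    concatenated().toSeq.sortBy(_._1).foreach(println)
    SparkSession.builder().getOrCreate().stop()
  }
}
```

A note on the design choice: in Spark 3.x, map() called with no arguments yields an empty map whose key and value types are NullType, and coalesce relies on Spark's type coercion to widen it to the column's map<string,string>. If you would rather have the empty map carry the exact column type, typedLit(Map.empty[String, String]) is one alternative.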

For more on scala - How to handle null values when concatenating MapType values in Spark, see the similar question on Stack Overflow: https://stackoverflow.com/questions/68316614/
