scala - Spark Scala: How to Replace a Field in Deeply Nested DataFrame

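I have a DataFrame with several nested columns. The schema is not static and can change upstream of my Spark application. Schema evolution is guaranteed to always be backward compatible. An anonymized, shortened version of the schema is pasted below: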

root
 |-- isXPresent: boolean (nullable = true)
 |-- isYPresent: boolean (nullable = true)
 |-- isZPresent: boolean (nullable = true)
 |-- createTime: long (nullable = true)
<snip>
 |-- structX: struct (nullable = true)
 |    |-- hostIPAddress: integer (nullable = true)
 |    |-- uriArguments: string (nullable = true)
<snip>
 |-- structY: struct (nullable = true)
 |    |-- lang: string (nullable = true)
 |    |-- cookies: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |-- element: string (containsNull = true)
<snip>

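The Spark job should convert "structX.uriArguments" from a string to map(string, string). This post describes a similar situation. However, the answer there assumes the schema is static and will not change, so a case class does not work in my case.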

What is the best way to convert structX.uriArguments without hard-coding the entire schema in code? The result should look like this:

root
 |-- isXPresent: boolean (nullable = true)
 |-- isYPresent: boolean (nullable = true)
 |-- isZPresent: boolean (nullable = true)
 |-- createTime: long (nullable = true)
<snip>
 |-- structX: struct (nullable = true)
 |    |-- hostIPAddress: integer (nullable = true)
 |    |-- uriArguments: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
<snip>
 |-- structY: struct (nullable = true)
 |    |-- lang: string (nullable = true)
 |    |-- cookies: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: array (valueContainsNull = true)
 |    |    |    |-- element: string (containsNull = true)
<snip>

Thanks

Best answer

You can try DataFrame.withColumn(). It lets you reference nested fields. You can add a new map column and drop the flat string column. This question shows how to use withColumn with structs.
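A minimal sketch of that idea, reading the struct's field list from the DataFrame's runtime schema so nothing is hard-coded. It rests on assumptions not stated in the question: that uriArguments looks like "a=1&b=2" (so Spark SQL's str_to_map with '&' and '=' as delimiters can parse it), and that structX is a top-level column. The object name, the helper names replaceStructField, convertUriArguments and sample, and the toy data are all hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, struct, when}
import org.apache.spark.sql.types.StructType

object ReplaceNestedField {

  /** Rebuilds the top-level struct column `structName`, using `newValue` for
    * `fieldName` and copying every other field from the runtime schema. */
  def replaceStructField(df: DataFrame, structName: String,
                         fieldName: String, newValue: Column): DataFrame = {
    // The field list comes from the DataFrame itself, so fields added
    // upstream are carried along without any code change.
    val structType = df.schema(structName).dataType.asInstanceOf[StructType]
    val fields = structType.fieldNames.map { name =>
      if (name == fieldName) newValue.as(name)
      else col(structName).getField(name).as(name)
    }
    // Keep a null struct null rather than turning it into a struct of nulls.
    df.withColumn(structName, when(col(structName).isNotNull, struct(fields: _*)))
  }

  /** Assumption: uriArguments is formatted like "a=1&b=2". */
  def convertUriArguments(df: DataFrame): DataFrame =
    replaceStructField(df, "structX", "uriArguments",
      expr("str_to_map(structX.uriArguments, '&', '=')"))

  /** Toy one-row input standing in for the real data. */
  def sample(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT true AS isXPresent,
        |       named_struct('hostIPAddress', 1, 'uriArguments', 'a=1&b=2') AS structX
        |""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("replace-nested-field")
      .getOrCreate()

    val converted = convertUriArguments(sample(spark))
    converted.printSchema()
    converted.show(truncate = false)

    spark.stop()
  }
}
```

Because withColumn replaces an existing column of the same name, structX stays in its original position and all other top-level columns are untouched. On Spark 3.1 or later, Column.withField can replace a struct field directly, for example col("structX").withField("uriArguments", ...), which avoids rebuilding the struct by hand.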

This Q&A on "scala - Spark Scala: How to Replace a Field in Deeply Nested DataFrame" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/50421494/
