python - Accessing WrappedArray elements

I have a Spark DataFrame; here is the schema:

|-- eid: long (nullable = true)
|-- age: long (nullable = true)
|-- sex: long (nullable = true)
|-- father: array (nullable = true)
|    |-- element: array (containsNull = true)
|    |    |-- element: long (containsNull = true)

And a sample of the rows:

df.select(df['father']).show()
+--------------------+
|              father|
+--------------------+
|[WrappedArray(-17...|
|[WrappedArray(-11...|
|[WrappedArray(13,...|
+--------------------+

The type is

DataFrame[father: array<array<bigint>>]

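How can I access each element of the inner array, for example the -17 in the first row? I tried various things, such as df.select(df['father'])(0)(0).show(), but none of them worked.

Best answer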

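If I remember correctly, in Python you index the column itself inside select, rather than the DataFrame that select returns:

df.select(df['father'][0][0]).show()

or

df.select(df['father'].getItem(0).getItem(0)).show()

See some examples here: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=column#pyspark.sql.Column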

This answer comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/44468311/
