scala - Spark DataFrame: How to add an index column (aka Distributed Data Index)

I read data from a CSV file, but it has no index.

I want to add a column numbered from 1 to the number of rows.

How can I do this in Scala? Thanks.

Best answer

With Scala, you can use:

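import org.apache.spark.sql.functions._

// monotonicallyIncreasingId is deprecated since Spark 2.0; use monotonically_increasing_id() instead
df.withColumn("id", monotonically_increasing_id())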

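See this example and the Scala docs. Note that the generated IDs are guaranteed to be unique and monotonically increasing, but not consecutive, so on their own they do not give the exact sequence 1 to n.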
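If the IDs really must run 1, 2, ..., n as the question asks, one common alternative is `RDD.zipWithIndex`, which numbers rows from 0 in partition order. The following is a minimal sketch, assuming Spark 2.x or later; the object `ConsecutiveIndex`, the method `addConsecutiveIndex`, and the sample data are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical helper: appends a column holding 1, 2, ..., n.
object ConsecutiveIndex {
  def addConsecutiveIndex(df: DataFrame, colName: String = "id"): DataFrame = {
    // zipWithIndex pairs each row with a 0-based Long index in partition order,
    // so we add 1 to start the numbering at 1.
    val indexed = df.rdd.zipWithIndex().map { case (row, idx) =>
      Row.fromSeq(row.toSeq :+ (idx + 1))
    }
    // Extend the original schema with the new non-nullable Long column.
    val schema = StructType(df.schema.fields :+ StructField(colName, LongType, nullable = false))
    df.sparkSession.createDataFrame(indexed, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("index-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the DataFrame read from the CSV file.
    val df = Seq("a", "b", "c", "d").toDF("value")
    addConsecutiveIndex(df).show()

    spark.stop()
  }
}
```

Going through the RDD API keeps the numbering gap-free without moving all rows to one partition, at the cost of an extra Spark job: `zipWithIndex` has to count the rows in each partition first when there is more than one.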

With PySpark, you can use:

from pyspark.sql.functions import monotonically_increasing_id 

df_index = df.select("*").withColumn("id", monotonically_increasing_id())
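Another way to turn the non-consecutive IDs into 1 to n, shown here in Scala to match the question, is to rank them with `row_number` over a window ordered by the `monotonically_increasing_id()` column. This is a sketch under the assumption that the data is small enough for one partition, because a `Window` without `partitionBy` moves every row into a single partition; `RowNumberIndex`, `addRowNumberIndex`, and the temporary column `__mono_id` are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

// Hypothetical helper: converts unique-but-gappy IDs into a 1..n sequence.
object RowNumberIndex {
  def addRowNumberIndex(df: DataFrame, colName: String = "id"): DataFrame = {
    val tmp = "__mono_id" // hypothetical temporary column name
    // Unpartitioned window: all rows are ordered together by the increasing ID.
    val w = Window.orderBy(tmp)
    df.withColumn(tmp, monotonically_increasing_id()) // unique, increasing, not consecutive
      .withColumn(colName, row_number().over(w))       // 1, 2, ..., n (IntegerType)
      .drop(tmp)
  }
}
```

For example, `RowNumberIndex.addRowNumberIndex(df)` on a four-row DataFrame adds an `id` column containing 1 to 4. Spark logs a warning about the unpartitioned window, which is why this suits modest data sizes better than the `zipWithIndex` route.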

Regarding scala - Spark DataFrame: How to add an index column (aka Distributed Data Index), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43406887/