from pyspark.sql import functions as F
x = df.withColumn("id_col", F.monotonically_increasing_id())
returns long, random-looking integers instead of sequentially numbered int values.
Best answer
What you are seeing is the expected behavior of this function. From the documentation:
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records
That is why you see long, seemingly random integers. They may not be consecutive, but they are in increasing order and, for all practical purposes, unique.
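To make the bit layout from the documentation quote concrete, here is a minimal sketch in plain Python (no Spark required) that packs and unpacks an ID the way the quote describes. The helper names `compose_id` and `decompose_id` are hypothetical and exist only for illustration; they are not part of the PySpark API, and the layout is assumed to be exactly the "upper 31 bits / lower 33 bits" split quoted above.

```python
from typing import Tuple

# Assumption: layout as described in the quoted documentation,
# partition ID in the upper 31 bits, record number in the lower 33 bits.
RECORD_BITS = 33
RECORD_MASK = (1 << RECORD_BITS) - 1


def compose_id(partition_id: int, record_number: int) -> int:
    """Hypothetical helper: pack a partition ID and a record number into one ID."""
    return (partition_id << RECORD_BITS) | record_number


def decompose_id(generated_id: int) -> Tuple[int, int]:
    """Hypothetical helper: split an ID back into (partition_id, record_number)."""
    return generated_id >> RECORD_BITS, generated_id & RECORD_MASK


if __name__ == "__main__":
    # IDs stay small within partition 0, then jump by 2**33 for each new partition.
    for pid, rec in [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]:
        print(pid, rec, compose_id(pid, rec))
```

Under this layout, the first row of partition 1 gets the ID 8589934592 (that is, 2**33), which is why the values look huge as soon as the DataFrame has more than one partition, even though they still increase.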
Regarding "python - F.monotonically_increasing_id() returns long random numbers", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58623659/