python - Merging a list of lists in a PySpark RDD

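I have some lists of tuples that I want to combine into a single list. Using lambdas and list comprehensions, I have processed the data to the point where I am close to being able to use reduceByKey, but I am not sure how to merge the lists. The current format is...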

[[(0, 14), (0, 24)], [(1, 19), (1, 50)], ...]

I would like it to look like this...

[(0, 14), (0, 24), (1, 19), (1, 50), ...]

The code that got me this far...

test = test.map(lambda x: (x[1], [e * local[x[1]] for e in x[0]]))
test = test.map(lambda x: [(x[0], y) for y in x[1]])

but I am not sure how to merge the lists.

Best answer

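You can use flatMap with an identity function. Note that identity is not a Python built-in, so it is assumed here to be defined first:

identity = lambda x: x  # assumed helper; returns its argument unchanged
test = test.flatMap(identity)

or pass the lambda inline (named xs rather than list, to avoid shadowing the built-in):

test = test.flatMap(lambda xs: xs)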
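
To see why this works, here is a minimal sketch that models flatMap locally in plain Python, without Spark. The helper flat_map and the sample values are assumptions made for illustration; they are not part of the PySpark API or of the original data. The idea it mirrors is that flatMap applies a function to each element and then flattens the results by one level.

```python
from itertools import chain


def flat_map(func, items):
    """Local stand-in for RDD.flatMap (hypothetical helper, not a Spark API):
    apply func to every item, then flatten the results by one level."""
    return list(chain.from_iterable(func(item) for item in items))


# Sample data in the shape shown in the question (values are illustrative).
nested = [[(0, 14), (0, 24)], [(1, 19), (1, 50)]]

# Local equivalent of: test.flatMap(lambda xs: xs)
flattened = flat_map(lambda xs: xs, nested)
print(flattened)

# Assumed (key, values) pairs, i.e. the shape produced by the first map
# in the question. Using flatMap in place of the second map yields the
# flat list directly, so no intermediate list of lists is built.
keyed = [(0, [14, 24]), (1, [19, 50])]
direct = flat_map(lambda x: [(x[0], y) for y in x[1]], keyed)
print(direct)
```

Under the same assumption, the second map in the question could probably be replaced by test.flatMap(lambda x: [(x[0], y) for y in x[1]]), which would make the separate flattening step unnecessary.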

A similar question on merging a list of lists in a PySpark RDD can be found on Stack Overflow: https://stackoverflow.com/questions/46556327/
