I have data like this:
| id | action |
|----|----------|
| 1 | increase |
| 2 | increase |
| 1 | increase |
| 1 | decrease |
| 3 | decrease |
and I want to get this result:
| id | increase | decrease |
|----|----------|----------|
| 1 | 2 | 1 |
| 2 | 1 | 0 |
| 3 | 0 | 1 |
I tried something like the following, even though it is wrong:
val result = data.groupBy($"id").withColumn("increase", data("action").where(" action == 'increase' ").count).withColumn("decrease", data("action").where(" decrease == 'view' ").count)
It fails with:
35: error: value withColumn is not a member of org.apache.spark.sql.GroupedData
Best Answer
You can use groupBy.pivot with count as the aggregation function:
df.groupBy("id").pivot("action").agg(count($"action")).na.fill(0).show
+---+--------+--------+
| id|decrease|increase|
+---+--------+--------+
| 1| 1| 2|
| 3| 1| 0|
| 2| 0| 1|
+---+--------+--------+
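If you want to control the output columns explicitly (for example, to fix their order, or to avoid the extra pass pivot needs to discover the distinct action values), a conditional aggregation with `sum(when(...))` produces the same result. This is a sketch assuming the same DataFrame `df` with `id` and `action` columns:

```scala
import org.apache.spark.sql.functions.{sum, when}

// Each when() emits 1 for a matching row and 0 otherwise,
// so summing it per group counts the rows with that action.
df.groupBy("id")
  .agg(
    sum(when($"action" === "increase", 1).otherwise(0)).as("increase"),
    sum(when($"action" === "decrease", 1).otherwise(0)).as("decrease")
  )
  .show()
```

Unlike pivot, this version always yields exactly the `increase` and `decrease` columns, even if one of the actions never appears in the data, so the `na.fill(0)` step is unnecessary.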
This answers the question "sql - Spark SQL: put the conditional count result into a new column"; we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43671457/