sql - Spark SQL: put the conditional count result into a new column

Tags: sql scala apache-spark apache-spark-sql

I have data like this:

|  id  |  action  |
|   1  | increase |
|   2  | increase |
|   1  | increase |
|   1  | decrease |
|   3  | decrease |

I want to get this result:

|  id  | increase | decrease |
|   1  |     2    |     1    |
|   2  |     1    |     0    |
|   3  |     0    |     1    |

I tried something like the following, but it does not compile:

val result = data.groupBy($"id").withColumn("increase", data("action").where(" action == 'increase' ").count).withColumn("decrease", data("action").where(" decrease == 'view' ").count)

35: error: value withColumn is not a member of org.apache.spark.sql.GroupedData

Best answer

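You can use groupBy.pivot with count as the aggregation function. Pivoting turns each distinct value of action into its own column, and na.fill(0) replaces the nulls left for id/action combinations that never occur: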

df.groupBy("id").pivot("action").agg(count($"action")).na.fill(0).show
+---+--------+--------+
| id|decrease|increase|
+---+--------+--------+
|  1|       1|       2|
|  3|       1|       0|
|  2|       0|       1|
+---+--------+--------+
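
For readers who want to try this outside the shell, here is a minimal self-contained sketch. It assumes Spark 2.x or later (it uses SparkSession), and the object name ConditionalCounts, the helper names pivotCounts and conditionalCounts, and the local master setting are hypothetical choices for illustration, not part of the original answer. It shows the pivot approach with an explicit value list, plus a conditional-aggregation variant built from sum and when that yields the same counts.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, sum, when}

object ConditionalCounts {

  // Pivot approach from the answer. Passing the pivot values explicitly
  // fixes the column order and spares Spark from first computing the
  // distinct values of "action" itself.
  def pivotCounts(df: DataFrame): DataFrame =
    df.groupBy("id")
      .pivot("action", Seq("increase", "decrease"))
      .agg(count(col("action")))
      .na.fill(0)
      .orderBy("id")

  // Alternative (not from the original answer): one conditional sum per
  // action value. Each row contributes 1 when it matches, 0 otherwise.
  def conditionalCounts(df: DataFrame): DataFrame =
    df.groupBy("id")
      .agg(
        sum(when(col("action") === "increase", 1).otherwise(0)).as("increase"),
        sum(when(col("action") === "decrease", 1).otherwise(0)).as("decrease")
      )
      .orderBy("id")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("conditional-counts")
      .getOrCreate()
    import spark.implicits._

    // The sample rows from the question.
    val df = Seq(
      (1, "increase"), (2, "increase"), (1, "increase"),
      (1, "decrease"), (3, "decrease")
    ).toDF("id", "action")

    pivotCounts(df).show()
    conditionalCounts(df).show()

    spark.stop()
  }
}
```

Both helpers should produce the id / increase / decrease layout requested in the question. The explicit value list is worth considering when the set of actions is known in advance; when it is not, the one-argument pivot("action") from the answer discovers the values at the cost of an extra pass over the data.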

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/43671457/
