My data is:
User_id product_id action
1 apple incart
1 apple purchased
1 banana incart
2 banana incart
2 banana purchased
3 carrot incart
I need the output to be the User_id and product_id pairs whose only action is incart, with no matching purchased row.
Best answer
import org.apache.spark.sql.functions.col
// Split the rows by action.
val df1 = df.filter(col("action") === "purchased")
val df2 = df.filter(col("action") === "incart")
// A left anti join keeps only the incart rows that have no matching purchased row.
df2.join(df1, Seq("User_id", "product_id"), "leftanti").drop("action").show()
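Since the question also asks for a query form, the same logic can be sketched in SQL with a correlated NOT EXISTS, which both Spark SQL and recent Hive versions accept. This is a minimal sketch, not the original answerer's code: the view name `events`, the value name `incartOnly`, and the local SparkSession setup are assumptions made for illustration, and the sample rows are copied from the question.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("incart-only").getOrCreate()
import spark.implicits._

// Sample rows copied from the question.
val df = Seq(
  (1, "apple", "incart"),
  (1, "apple", "purchased"),
  (1, "banana", "incart"),
  (2, "banana", "incart"),
  (2, "banana", "purchased"),
  (3, "carrot", "incart")
).toDF("User_id", "product_id", "action")

// `events` is a hypothetical view name chosen for this example.
df.createOrReplaceTempView("events")

// Keep incart rows for which no purchased row exists for the same user and product.
val incartOnly = spark.sql("""
  SELECT i.User_id, i.product_id
  FROM events i
  WHERE i.action = 'incart'
    AND NOT EXISTS (
      SELECT 1
      FROM events p
      WHERE p.action = 'purchased'
        AND p.User_id = i.User_id
        AND p.product_id = i.product_id
    )
""")

incartOnly.show()
```

For the sample data this should leave two pairs, (1, banana) and (3, carrot); row order is not guaranteed. The SELECT statement on its own should also be usable in Hive, provided the data lives in a table with these column names.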
This answer comes from a similar question on Stack Overflow about how to write this case as a Spark and Hive query (sql): https://stackoverflow.com/questions/52179589/