python - Filter the top n rows based on a groupby condition

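I have a data frame with 4 columns: User_id, Transaction_id, product, and datetime. For each user, I need to select their n most recent transactions. Suppose n=2; my data frame looks like this: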

    transaction_id  user_id  product  date
         T1             U1     P1     2019-03-27
         T1             U1     P2     2019-03-27
         T1             U1     P3     2019-03-27
         T2             U1     P2     2019-03-21
         T2             U1     P3     2019-03-21
         T3             U1     P2     2019-03-20

I tried to do this with the help of the question "group by pandas dataframe and select latest in each group".

My expected output is:

   transaction_id   user_id  product  date
        T1            U1       P1     2019-03-27
        T1            U1       P2     2019-03-27
        T1            U1       P3     2019-03-27
        T2            U1       P2     2019-03-21
        T2            U1       P3     2019-03-21

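Best answer

The idea is to first remove duplicate (user_id, date) pairs with DataFrame.drop_duplicates, take the top 2 values per group, and then use DataFrame.merge to join the result back to the original data frame: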

# df is the data frame shown in the question
df = (df.merge(df.drop_duplicates(['user_id', 'date'])    # one row per user and date
                 .sort_values('date', ascending=False)    # newest dates first
                 .groupby('user_id')
                 .head(2)[['user_id', 'date']])           # 2 latest dates per user
      )
print(df)
  transaction_id user_id product       date
0             T1      U1      P1 2019-03-27
1             T1      U1      P2 2019-03-27
2             T1      U1      P3 2019-03-27
3             T2      U1      P2 2019-03-21
4             T2      U1      P3 2019-03-21
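
For readers who want to run this end to end, here is a minimal self-contained sketch of the same approach with n as a parameter. The function name `latest_n_transactions` and the way the sample frame is built are my own assumptions for illustration, not part of the original answer. It also assumes, as in the sample data, that each transaction falls on a single date and that different transactions of a user fall on different dates, so "latest n dates" and "latest n transactions" coincide.

```python
import pandas as pd

# Sample data rebuilt from the question (construction is illustrative).
df = pd.DataFrame({
    "transaction_id": ["T1", "T1", "T1", "T2", "T2", "T3"],
    "user_id": ["U1"] * 6,
    "product": ["P1", "P2", "P3", "P2", "P3", "P2"],
    "date": pd.to_datetime(
        ["2019-03-27"] * 3 + ["2019-03-21"] * 2 + ["2019-03-20"]
    ),
})


def latest_n_transactions(frame, n=2):
    """Keep every row that belongs to each user's n most recent dates.

    Hypothetical helper: assumes one date per transaction and distinct
    dates across a user's transactions.
    """
    latest = (
        frame.drop_duplicates(["user_id", "date"])   # one row per user/date
             .sort_values("date", ascending=False)   # newest first
             .groupby("user_id")
             .head(n)[["user_id", "date"]]           # n latest dates per user
    )
    # Inner join keeps only the rows whose (user_id, date) pair was selected.
    return frame.merge(latest, on=["user_id", "date"])


result = latest_n_transactions(df, n=2)
print(result)
```

Passing `on=["user_id", "date"]` explicitly makes the join keys visible; the original answer relies on merge defaulting to the columns the two frames share, which here are the same two columns.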

Regarding python - Filter the top n rows based on a groupby condition, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55374127/
