Python: partition by

Tags: python group-by timestamp partition

I have the following DataFrame df:

| Staff_ID | Join_Date | Time_Stamp |
|----------|-----------|------------|
| 1        | 3/29/2016 | 4/23/2016  |
| 1        | 3/29/2016 | 3/29/2016  |
| 1        | 3/29/2016 | 6/21/2016  |
| 2        | 5/15/2016 | 4/1/2016   |
| 2        | 5/15/2016 | 5/25/2016  |
| 3        | 7/24/2016 | 6/21/2016  |
| 3        | 7/24/2016 | 6/10/2016  |
| 3        | 7/24/2016 | 4/21/2016  |

I want to get the minimum and maximum "Time_Stamp" dates partitioned by "Staff_ID", so that the resulting DataFrame looks like this:

| Staff_ID | Join_Date | Time_Stamp | Min_Time_Stamp | Max_Time_Stamp |
|----------|-----------|------------|----------------|----------------|
| 1        | 3/29/2016 | 4/23/2016  | 3/29/2016      | 6/21/2016      |
| 1        | 3/29/2016 | 3/29/2016  | 3/29/2016      | 6/21/2016      |
| 1        | 3/29/2016 | 6/21/2016  | 3/29/2016      | 6/21/2016      |
| 2        | 5/15/2016 | 4/1/2016   | 4/1/2016       | 5/25/2016      |
| 2        | 5/15/2016 | 5/25/2016  | 4/1/2016       | 5/25/2016      |
| 3        | 7/24/2016 | 6/21/2016  | 4/21/2016      | 6/21/2016      |
| 3        | 7/24/2016 | 6/10/2016  | 4/21/2016      | 6/21/2016      |
| 3        | 7/24/2016 | 4/21/2016  | 4/21/2016      | 6/21/2016      |

How can I do this in Python?

Best answer

Let's use groupby, transform, and assign together:

# Group Time_Stamp by Staff_ID, then broadcast each group's min/max back onto its rows
g = df.groupby('Staff_ID')['Time_Stamp']
df.assign(Min_Time_Stamp=g.transform('min'), Max_Time_Stamp=g.transform('max'))

Output:

     Staff_ID    Join_Date    Time_Stamp Max_Time_Stamp Min_Time_Stamp
1   1           3/29/2016    4/23/2016      6/21/2016      3/29/2016  
2   1           3/29/2016    3/29/2016      6/21/2016      3/29/2016  
3   1           3/29/2016    6/21/2016      6/21/2016      3/29/2016  
4   2           5/15/2016    4/1/2016       5/25/2016      4/1/2016   
5   2           5/15/2016    5/25/2016      5/25/2016      4/1/2016   
6   3           7/24/2016    6/21/2016      6/21/2016      4/21/2016  
7   3           7/24/2016    6/10/2016      6/21/2016      4/21/2016  
8   3           7/24/2016    4/21/2016      6/21/2016      4/21/2016  
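
For readers who want to run the answer end to end, here is a minimal sketch that rebuilds the question's data and applies the same groupby/transform/assign pattern. It assumes the dates are month/day/year strings and parses them with `pd.to_datetime` first, since min and max on raw strings compare text rather than dates. The variable name `result` is only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Staff_ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Join_Date": ["3/29/2016"] * 3 + ["5/15/2016"] * 2 + ["7/24/2016"] * 3,
    "Time_Stamp": ["4/23/2016", "3/29/2016", "6/21/2016",
                   "4/1/2016", "5/25/2016",
                   "6/21/2016", "6/10/2016", "4/21/2016"],
})

# Assumption: dates are month/day/year strings; parse them so that
# min/max compare chronologically instead of lexicographically.
df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"], format="%m/%d/%Y")

# transform returns one value per original row, so the per-group
# min/max line up with df's index and can be attached as new columns.
g = df.groupby("Staff_ID")["Time_Stamp"]
result = df.assign(
    Min_Time_Stamp=g.transform("min"),
    Max_Time_Stamp=g.transform("max"),
)
print(result)
```

Unlike `agg`, which would collapse each group to a single row, `transform` keeps the original row count, which is what a SQL-style "partition by" needs.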

Timings:

@CarlesMitjans's method:

10 loops, best of 3: 33.3 ms per loop

@ScottBoston's method:

100 loops, best of 3: 5.52 ms per loop
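
The figures above look like IPython `%timeit` output. A rough way to take a comparable measurement in a plain script is sketched below with the standard `timeit` module. It times only the groupby/transform approach from the answer; the other method is not shown in this post, so it is not reproduced here. The helper name `with_transform` and the run count are assumptions for illustration, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Small frame with the same shape as the question's data.
df = pd.DataFrame({
    "Staff_ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Time_Stamp": pd.to_datetime(
        ["4/23/2016", "3/29/2016", "6/21/2016",
         "4/1/2016", "5/25/2016",
         "6/21/2016", "6/10/2016", "4/21/2016"],
        format="%m/%d/%Y",
    ),
})


def with_transform(frame):
    """Hypothetical helper wrapping the groupby/transform/assign approach."""
    g = frame.groupby("Staff_ID")["Time_Stamp"]
    return frame.assign(
        Min_Time_Stamp=g.transform("min"),
        Max_Time_Stamp=g.transform("max"),
    )


runs = 100  # arbitrary repetition count for this sketch
seconds = timeit.timeit(lambda: with_transform(df), number=runs)
print(f"{seconds / runs * 1e3:.2f} ms per loop")
```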

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/47324699/
