I have the following DataFrame df:
| Staff_ID | Join_Date | Time_Stamp |
|----------|-----------|------------|
| 1 | 3/29/2016 | 4/23/2016 |
| 1 | 3/29/2016 | 3/29/2016 |
| 1 | 3/29/2016 | 6/21/2016 |
| 2 | 5/15/2016 | 4/1/2016 |
| 2 | 5/15/2016 | 5/25/2016 |
| 3 | 7/24/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 6/10/2016 |
| 3 | 7/24/2016 | 4/21/2016 |
I want to get the minimum and maximum Time_Stamp within each Staff_ID partition, so that the resulting DataFrame looks like this:
| Staff_ID | Join_Date | Time_Stamp | Min_Time_Stamp | Max_Time_Stamp |
|----------|-----------|------------|----------------|----------------|
| 1 | 3/29/2016 | 4/23/2016 | 3/29/2016 | 6/21/2016 |
| 1 | 3/29/2016 | 3/29/2016 | 3/29/2016 | 6/21/2016 |
| 1 | 3/29/2016 | 6/21/2016 | 3/29/2016 | 6/21/2016 |
| 2 | 5/15/2016 | 4/1/2016 | 4/1/2016 | 5/25/2016 |
| 2 | 5/15/2016 | 5/25/2016 | 4/1/2016 | 5/25/2016 |
| 3 | 7/24/2016 | 6/21/2016 | 4/21/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 6/10/2016 | 4/21/2016 | 6/21/2016 |
| 3 | 7/24/2016 | 4/21/2016 | 4/21/2016 | 6/21/2016 |
How can I do this in Python?
Best answer
Let's use groupby with transform and assign:
g = df.groupby('Staff_ID')['Time_Stamp']
df.assign(Min_Time_Stamp=g.transform('min'), Max_Time_Stamp=g.transform('max'))
Output:
Staff_ID Join_Date Time_Stamp Max_Time_Stamp Min_Time_Stamp
1 1 3/29/2016 4/23/2016 6/21/2016 3/29/2016
2 1 3/29/2016 3/29/2016 6/21/2016 3/29/2016
3 1 3/29/2016 6/21/2016 6/21/2016 3/29/2016
4 2 5/15/2016 4/1/2016 5/25/2016 4/1/2016
5 2 5/15/2016 5/25/2016 5/25/2016 4/1/2016
6 3 7/24/2016 6/21/2016 6/21/2016 4/21/2016
7 3 7/24/2016 6/10/2016 6/21/2016 4/21/2016
8 3 7/24/2016 4/21/2016 6/21/2016 4/21/2016
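For reference, here is a self-contained sketch of the same approach that rebuilds the sample data from the question. It assumes the dates are stored as strings, which the question does not state, so it parses Time_Stamp with pd.to_datetime first. That way min and max compare chronologically rather than character by character. The name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Staff_ID": [1, 1, 1, 2, 2, 3, 3, 3],
    "Join_Date": ["3/29/2016"] * 3 + ["5/15/2016"] * 2 + ["7/24/2016"] * 3,
    "Time_Stamp": ["4/23/2016", "3/29/2016", "6/21/2016", "4/1/2016",
                   "5/25/2016", "6/21/2016", "6/10/2016", "4/21/2016"],
})

# Assumption: the dates arrive as month/day/year strings.
# Parsing them makes min/max chronological instead of lexicographic.
df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"], format="%m/%d/%Y")

# transform returns one value per original row, so the per-group
# min/max is broadcast back onto every row of that group.
g = df.groupby("Staff_ID")["Time_Stamp"]
result = df.assign(
    Min_Time_Stamp=g.transform("min"),
    Max_Time_Stamp=g.transform("max"),
)
print(result)
```

Because transform keeps the original index and row count, no merge back onto df is needed. Note that assign returns a new DataFrame and leaves df unchanged.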
Timings:
@CarlesMitjans's method:
10 loops, best of 3: 33.3 ms per loop
@ScottBoston's method:
100 loops, best of 3: 5.52 ms per loop
Source: a similar question about partition-by in Python on Stack Overflow: https://stackoverflow.com/questions/47324699/