I have a dataframe like this:
id_a | date
12 | 2020-01-01
12 | 2020-01-02
13 | 2020-01-01
13 | 2020-01-03
14 | 2020-01-01
14 | 2020-01-02
14 | 2020-01-06
For each id_a group, I'd like to compute the difference between the maximum and the minimum date, and get something like this:
id_a | date | diff
12 | 2020-01-01 | 1
12 | 2020-01-02 | 1
13 | 2020-01-01 | 2
13 | 2020-01-03 | 2
14 | 2020-01-01 | 5
14 | 2020-01-02 | 5
14 | 2020-01-06 | 5
I'm trying to do it like this:
df['diff'] = df.groupby('id_a').apply(lambda x: max(x['date']) - min(x['date']))
But I'm struggling a bit.
Am I on the right track?
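To see where the attempt goes wrong, here is a minimal sketch on the sample data. It selects the date column before calling apply (a small change from the question's code, to keep the example warning-free). It shows that the per-group result is indexed by id_a, so assigning it to a new column aligns on the labels 12, 13 and 14, which do not exist in the row index. The .map step at the end is one possible workaround, not the answer's method.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13, 14, 14, 14],
    "date": pd.to_datetime([
        "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03",
        "2020-01-01", "2020-01-02", "2020-01-06",
    ]),
})

# apply with a scalar-returning function gives ONE value per group,
# indexed by the group key id_a (12, 13, 14)
per_group = df.groupby("id_a")["date"].apply(lambda s: s.max() - s.min())
print(per_group)

# Column assignment aligns on the index: labels 12/13/14 match none of
# the row labels 0..6, so every row gets NaT
df["diff"] = per_group
print(df["diff"].isna().all())  # True

# One workaround: look up each row's id_a in the per-group result
df["diff"] = df["id_a"].map(per_group)
print(df)
```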
Best answer
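You want transform rather than apply: apply returns one value per group (indexed by id_a), while transform returns a result aligned with the original rows, so it can be assigned straight back as a column. Also, np.ptp ("peak to peak", i.e. max minus min) will do: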
import numpy as np
import pandas as pd

# convert to datetime (harmless if the column is already datetime)
df['date'] = pd.to_datetime(df['date'])
df['date_diff'] = df.groupby('id_a')['date'].transform(np.ptp)
Output:
id_a date date_diff
0 12 2020-01-01 1 days
1 12 2020-01-02 1 days
2 13 2020-01-01 2 days
3 13 2020-01-03 2 days
4 14 2020-01-01 5 days
5 14 2020-01-02 5 days
6 14 2020-01-06 5 days
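If you would rather not rely on np.ptp, a sketch using only pandas' built-in 'max' and 'min' aggregations produces the same Timedelta column. The question's desired output shows plain integers (1, 2, 5) rather than "1 days", so the last step converts with .dt.days into a diff column. That conversion step and the column name are assumptions taken from the question, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id_a": [12, 12, 13, 13, 14, 14, 14],
    "date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-03",
             "2020-01-01", "2020-01-02", "2020-01-06"],
})
df["date"] = pd.to_datetime(df["date"])

# transform broadcasts each group's result back onto every row of that group
groups = df.groupby("id_a")["date"]
df["date_diff"] = groups.transform("max") - groups.transform("min")

# Whole days as integers, matching the question's desired output
df["diff"] = df["date_diff"].dt.days
print(df)
```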
Update: if you want the max from date_a and the min from date_b:
groups = df.groupby('id_a')
# per-row values: earliest date_b and latest date_a within each id_a group
min_dates = groups['date_b'].transform('min')
max_dates = groups['date_a'].transform('max')
df['date_diff'] = max_dates - min_dates
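A quick check of the update on a small frame. The date_a/date_b values below are hypothetical, invented only for illustration; the question's data has a single date column.

```python
import pandas as pd

# Hypothetical data with two date columns, as the update assumes
df = pd.DataFrame({
    "id_a": [12, 12, 13, 13],
    "date_a": pd.to_datetime(["2020-01-03", "2020-01-05", "2020-01-04", "2020-01-02"]),
    "date_b": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01"]),
})

groups = df.groupby("id_a")
min_dates = groups["date_b"].transform("min")  # earliest date_b per id_a, repeated per row
max_dates = groups["date_a"].transform("max")  # latest date_a per id_a, repeated per row
df["date_diff"] = max_dates - min_dates
print(df)
```

For id_a 12 this is 2020-01-05 minus 2020-01-01 (4 days), and for id_a 13 it is 2020-01-04 minus 2020-01-01 (3 days).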
Regarding "python - Pandas dataframe groupby function to calculate date differences", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60605291/