I have a dataset like this:
user-id date-time msg
1 2016-12-09 10:25:00 1
2 2016-12-09 10:26:00 0
3 2016-12-09 10:26:00 1
2 2016-12-09 10:27:00 1
1 2016-12-09 10:28:00 2
2 2016-12-09 10:28:00 1
3 2016-12-09 10:29:00 2
2 2016-12-09 10:29:00 1
1 2016-12-09 10:30:00 3
I want a new column that gives, for each record, the time elapsed since the first record with the same msg value. Like this:
user-id date-time msg time-difference
1 2016-12-09 10:25:00 1 00:00
2 2016-12-09 10:26:00 0 00:00
3 2016-12-09 10:26:00 1 01:00
2 2016-12-09 10:27:00 1 02:00
1 2016-12-09 10:28:00 2 00:00
2 2016-12-09 10:28:00 1 03:00
3 2016-12-09 10:29:00 2 01:00
2 2016-12-09 10:29:00 1 04:00
1 2016-12-09 10:30:00 3 00:00
I have found solutions that only consider the date-time, or that use loc or iloc, but they do not apply here.
Best answer
Option #1
Use groupby and iloc:
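# group_keys=False keeps the original row index on the result; pandas 2.0+ otherwise prepends the group key to it
df['time-difference'] = df.groupby('msg', group_keys=False)['date-time'].apply(lambda x: x - x.iloc[0])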
Output:
user-id date-time msg time-difference
0 1 2016-12-09 10:25:00 1 00:00:00
1 2 2016-12-09 10:26:00 0 00:00:00
2 3 2016-12-09 10:26:00 1 00:01:00
3 2 2016-12-09 10:27:00 1 00:02:00
4 1 2016-12-09 10:28:00 2 00:00:00
5 2 2016-12-09 10:28:00 1 00:03:00
6 3 2016-12-09 10:29:00 2 00:01:00
7 2 2016-12-09 10:29:00 1 00:04:00
8 1 2016-12-09 10:30:00 3 00:00:00
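For anyone who wants to reproduce Option #1, here is a minimal self-contained sketch. It assumes the date-time column has already been parsed into a datetime dtype (done here with pd.to_datetime); if the column holds plain strings, the subtraction will fail. The variable name minutes is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user-id": [1, 2, 3, 2, 1, 2, 3, 2, 1],
    "date-time": pd.to_datetime([
        "2016-12-09 10:25:00", "2016-12-09 10:26:00", "2016-12-09 10:26:00",
        "2016-12-09 10:27:00", "2016-12-09 10:28:00", "2016-12-09 10:28:00",
        "2016-12-09 10:29:00", "2016-12-09 10:29:00", "2016-12-09 10:30:00",
    ]),
    "msg": [1, 0, 1, 1, 2, 1, 2, 1, 3],
})

# For each msg group, subtract the timestamp of the group's first row.
# group_keys=False keeps the original row index on the result, so the
# assignment lines up row by row.
df["time-difference"] = (
    df.groupby("msg", group_keys=False)["date-time"]
      .apply(lambda s: s - s.iloc[0])
)

# Whole minutes per row, easier to eyeball than raw timedeltas.
minutes = (df["time-difference"].dt.total_seconds() // 60).astype(int).tolist()
print(minutes)  # [0, 0, 1, 2, 0, 3, 1, 4, 0]
```

Note that x.iloc[0] is the first row of each group in the current row order, so this relies on the frame being sorted by date-time.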
Option #2
Use groupby with transform and first or min:
df['time-difference'] = df['date-time'] - df.groupby('msg')['date-time'].transform('first')
Output:
user-id date-time msg time-difference
0 1 2016-12-09 10:25:00 1 00:00:00
1 2 2016-12-09 10:26:00 0 00:00:00
2 3 2016-12-09 10:26:00 1 00:01:00
3 2 2016-12-09 10:27:00 1 00:02:00
4 1 2016-12-09 10:28:00 2 00:00:00
5 2 2016-12-09 10:28:00 1 00:03:00
6 3 2016-12-09 10:29:00 2 00:01:00
7 2 2016-12-09 10:29:00 1 00:04:00
8 1 2016-12-09 10:30:00 3 00:00:00
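The choice between first and min matters when the rows are not in chronological order: first takes each group's first row as it currently appears, while min takes the earliest timestamp regardless of order. The sketch below illustrates this on the question's data, again assuming a datetime dtype for date-time. The names first_seen, earliest, and rev are made up for the example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user-id": [1, 2, 3, 2, 1, 2, 3, 2, 1],
    "date-time": pd.to_datetime([
        "2016-12-09 10:25:00", "2016-12-09 10:26:00", "2016-12-09 10:26:00",
        "2016-12-09 10:27:00", "2016-12-09 10:28:00", "2016-12-09 10:28:00",
        "2016-12-09 10:29:00", "2016-12-09 10:29:00", "2016-12-09 10:30:00",
    ]),
    "msg": [1, 0, 1, 1, 2, 1, 2, 1, 3],
})

# transform broadcasts one value per group back onto every row of that group.
first_seen = df.groupby("msg")["date-time"].transform("first")
earliest = df.groupby("msg")["date-time"].transform("min")

df["time-difference"] = df["date-time"] - first_seen

# The sample is already sorted by time, so both variants agree here.
same_when_sorted = first_seen.equals(earliest)

# Reverse the row order: "first" now picks each group's latest timestamp,
# so the differences turn negative, while "min" is unaffected.
rev = df.iloc[::-1]
rev_first = rev["date-time"] - rev.groupby("msg")["date-time"].transform("first")
rev_min = rev["date-time"] - rev.groupby("msg")["date-time"].transform("min")

print(same_when_sorted)
print(rev_first.sort_index().tolist())
print(rev_min.sort_index().tolist())
```

If the data might arrive unsorted, min is the safer choice; alternatively, sort by date-time before using first.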
For more on "python - how to find the time difference based on another column", see the related question on Stack Overflow: https://stackoverflow.com/questions/49594784/