python - Calculate the difference between consecutive dates in one column, grouped by another column, in pandas?

Tags: python pandas

I have a pandas DataFrame:

import numpy as np
import pandas as pd

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z'],
                     ['Train','2019-01-06T19:44:09Z'],
                     ['Train','2019-01-02T19:44:09Z'],
                     ['Car','2019-01-08T06:44:09Z'],
                     ['Car','2019-01-06T18:44:09Z'],
                     ['Train','2019-01-04T19:44:09Z'],
                     ['Car','2019-01-05T16:34:09Z'],
                     ['Train','2019-01-08T19:44:09Z'],
                     ['Car','2019-01-07T14:44:09Z'],
                     ['Car','2019-01-06T11:44:09Z'],
                     ['Train','2019-01-10T19:44:09Z'],
                     ], 
                    columns=['Type', 'Date'])

After sorting by date, I need to find the difference between consecutive dates for each Type.

The final data should look like this:

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z',1],
                     ['Train','2019-01-06T19:44:09Z',4],
                     ['Train','2019-01-02T19:44:09Z',0],
                     ['Car','2019-01-08T06:44:09Z',3],
                     ['Car','2019-01-06T18:44:09Z',1],
                     ['Train','2019-01-04T19:44:09Z',2],
                     ['Car','2019-01-05T16:34:09Z',0],
                     ['Train','2019-01-08T19:44:09Z',6],
                     ['Car','2019-01-07T14:44:09Z',2],
                     ['Car','2019-01-06T11:44:09Z',1],
                     ['Train','2019-01-10T19:44:09Z',8],
                     ], 
                    columns=['Type', 'Date','diff'])

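Here, for Type Car, min(Date) is 2019-01-05T16:34:09Z, so the diff starts at 0. The next dates are 2019-01-06T18:44:09Z and 2019-01-06T11:44:09Z, so the diff is 1 day (I am not sure whether the time can be included here), and so on. For Type Train, min(Date) is 2019-01-02T19:44:09Z, so the diff is 0; next comes 2019-01-04T19:44:09Z, so the diff is 2 days.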

I tried groupby, but I am not sure how to include the sorting by date:

data['diff'] = data.groupby('Type')['Date'].diff() / np.timedelta64(1, 'D')
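For reference, here is a minimal sketch of how date sorting could be combined with the attempt above. It assumes the Date strings are first parsed with pd.to_datetime, and it writes to a hypothetical column named step_days rather than diff. Note that it computes the gap to the previous row within each Type, with the time of day included, which is not the same quantity as the expected diff column shown earlier (days since the first date of each Type).

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'],
     ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'],
     ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'],
     ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'],
     ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'])

# Parse the ISO 8601 strings; the trailing "Z" yields timezone-aware UTC values
data['Date'] = pd.to_datetime(data['Date'])

# Sort so that diff() compares each row with the previous date of the same Type
ordered = data.sort_values(['Type', 'Date'])

# Gap to the previous row within each Type (NaT for the first row of a group)
step = ordered.groupby('Type')['Date'].diff()

# Assignment aligns on the original index, so the row order of data is unchanged.
# Dividing by one day turns the timedeltas into fractional days.
data['step_days'] = step / pd.Timedelta(days=1)

print(data)
```

The earliest row of each Type gets NaN here because it has no predecessor; chaining .fillna(0) is one option if a 0 is preferred.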

Best answer

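Use pandas.DataFrame.groupby together with dt.date (here df is the question's DataFrame, and the .dt accessor requires that Date has already been converted with pd.to_datetime):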

df['diff'] = df.groupby('Type')['Date'].apply(lambda x: x.dt.date - x.min().date())

Output:

     Type                      Date   diff
0     Car 2019-01-06 21:44:09+00:00 1 days
1   Train 2019-01-06 19:44:09+00:00 4 days
2   Train 2019-01-02 19:44:09+00:00 0 days
3     Car 2019-01-08 06:44:09+00:00 3 days
4     Car 2019-01-06 18:44:09+00:00 1 days
5   Train 2019-01-04 19:44:09+00:00 2 days
6     Car 2019-01-05 16:34:09+00:00 0 days
7   Train 2019-01-08 19:44:09+00:00 6 days
8     Car 2019-01-07 14:44:09+00:00 2 days
9     Car 2019-01-06 11:44:09+00:00 1 days
10  Train 2019-01-10 19:44:09+00:00 8 days

If you want them as int, add dt.days:

df['diff'] = df.groupby('Type')['Date'].apply(lambda x: x.dt.date - x.min().date()).dt.days

Output:

     Type                      Date  diff
0     Car 2019-01-06 21:44:09+00:00     1
1   Train 2019-01-06 19:44:09+00:00     4
2   Train 2019-01-02 19:44:09+00:00     0
3     Car 2019-01-08 06:44:09+00:00     3
4     Car 2019-01-06 18:44:09+00:00     1
5   Train 2019-01-04 19:44:09+00:00     2
6     Car 2019-01-05 16:34:09+00:00     0
7   Train 2019-01-08 19:44:09+00:00     6
8     Car 2019-01-07 14:44:09+00:00     2
9     Car 2019-01-06 11:44:09+00:00     1
10  Train 2019-01-10 19:44:09+00:00     8
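Depending on the pandas version, groupby(...).apply with a lambda that returns a like-indexed Series may come back with the group key added to the index, in which case assigning the result to a column can fail. A possible alternative, sketched here under the assumption that Date is parsed with pd.to_datetime, uses transform('min') together with dt.normalize() instead of dt.date. This is a substitute technique, not the code from the answer above, though it should produce the same integer column for this data.

```python
import pandas as pd

df = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'],
     ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'],
     ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'],
     ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'],
     ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'])

# Parse the ISO 8601 strings into timezone-aware (UTC) timestamps
df['Date'] = pd.to_datetime(df['Date'])

# Truncate every timestamp to midnight so that only the calendar date counts
day = df['Date'].dt.normalize()

# Earliest calendar day of each Type, broadcast back onto every row
first_day = day.groupby(df['Type']).transform('min')

# Whole days elapsed since the first date of the same Type
df['diff'] = (day - first_day).dt.days

print(df)
```

Because transform returns a result aligned to the original index, the assignment does not depend on how apply labels its output.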

This answer comes from a similar question on Stack Overflow about calculating the difference between consecutive dates in one column, grouped by another column, in pandas: https://stackoverflow.com/questions/59834527/
