Here is the sample dataset I am working with:
maint id
datetime
2015-01-01 1.0 a
2015-01-02 NaN a
2015-01-03 NaN a
2015-01-04 1.0 a
2015-01-05 NaN a
2015-01-06 NaN a
2015-01-07 NaN a
2015-01-01 NaN b
2015-01-02 NaN b
2015-01-03 1.0 b
2015-01-04 1.0 b
2015-01-05 NaN b
2015-01-06 NaN b
2015-01-07 NaN b
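For anyone who wants to reproduce the frame above, here is a minimal sketch of how it could be built. The construction itself is an assumption (the question only shows the printed output); the column names `maint` and `id` and the index name `datetime` are taken from that output.

```python
import numpy as np
import pandas as pd

# Seven consecutive days, repeated once for each id
dates = pd.date_range("2015-01-01", periods=7, freq="D")

df = pd.DataFrame(
    {
        # 1.0 marks a maintenance day, NaN means no maintenance
        "maint": [1, np.nan, np.nan, 1, np.nan, np.nan, np.nan,
                  np.nan, np.nan, 1, 1, np.nan, np.nan, np.nan],
        "id": ["a"] * 7 + ["b"] * 7,
    },
    # The index holds the dates and is named "datetime", as in the printout
    index=dates.append(dates).rename("datetime"),
)
print(df)
```

Note that the index contains duplicate dates (one per id), which is why the per-id grouping matters later on.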
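What I want is the number of days elapsed since df['maint'] was last 1, computed separately for each id (rows before an id's first maintenance get 0):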
maint id days
datetime
2015-01-01 1.0 a 0
2015-01-02 NaN a 1
2015-01-03 NaN a 2
2015-01-04 1.0 a 0
2015-01-05 NaN a 1
2015-01-06 NaN a 2
2015-01-07 NaN a 3
2015-01-01 NaN b 0
2015-01-02 NaN b 0
2015-01-03 1.0 b 0
2015-01-04 1.0 b 0
2015-01-05 NaN b 1
2015-01-06 NaN b 2
2015-01-07 NaN b 3
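To pin down the intended semantics, the expected output can be read as the following plain-Python loop. This is only a reference sketch, not the efficient solution being asked for; the function name `days_since_last_maint` and the tuple layout are hypothetical.

```python
from datetime import date, timedelta

def days_since_last_maint(rows):
    """rows: (day, maint, id) tuples, sorted by day within each id."""
    last_seen = {}  # id -> date of the most recent maintenance
    out = []
    for day, maint, key in rows:
        if maint == 1:  # NaN compares unequal to 1, so it is skipped
            last_seen[key] = day
        prev = last_seen.get(key)
        # No maintenance seen yet for this id: report 0
        out.append(0 if prev is None else (day - prev).days)
    return out

# Rows for id "b" from the sample data
start = date(2015, 1, 1)
nan = float("nan")
maint_b = [nan, nan, 1.0, 1.0, nan, nan, nan]
rows_b = [(start + timedelta(days=i), m, "b") for i, m in enumerate(maint_b)]
result_b = days_since_last_maint(rows_b)
print(result_b)  # [0, 0, 0, 0, 1, 2, 3]
```

A row-by-row loop like this would be slow for thousands of ids with years of records, which is the motivation for a vectorized approach.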
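I have several thousand distinct ids, each with several years of maintenance records, so I am looking for an efficient way to compute this day difference.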
Best answer
Use:
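import pandas as pd
# Keep the date where maint == 1 (NaT elsewhere), then measure from the last such date per id
df['days'] = df.index.where(df['maint'].eq(1))
df['days'] = (df.index - df.groupby('id')['days'].ffill()).fillna(pd.Timedelta(0)).dt.days
print(df)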
maint id days
datetime
2015-01-01 1.0 a 0
2015-01-02 NaN a 1
2015-01-03 NaN a 2
2015-01-04 1.0 a 0
2015-01-05 NaN a 1
2015-01-06 NaN a 2
2015-01-07 NaN a 3
2015-01-01 NaN b 0
2015-01-02 NaN b 0
2015-01-03 1.0 b 0
2015-01-04 1.0 b 0
2015-01-05 NaN b 1
2015-01-06 NaN b 2
2015-01-07 NaN b 3
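Explanation:
- First, create a new column days that holds the value of df.index where maint is 1 and NaT everywhere else.
- Forward-fill that column within each id using GroupBy.ffill, subtract the resulting Series from the index, replace the missing values with a zero timedelta, and finally convert the result to whole days with Series.dt.days.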
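The two steps above can be sketched with the intermediate result kept visible. This variant is an assumption-laden illustration rather than the answer's exact code: it stores the dates in an ordinary `datetime` column instead of the index (to sidestep the duplicate index labels), and the names `last_maint` and `days_since_maint` are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2015-01-01", periods=7, freq="D")
df = pd.DataFrame({
    "datetime": list(dates) * 2,
    "maint": [1, np.nan, np.nan, 1, np.nan, np.nan, np.nan,
              np.nan, np.nan, 1, 1, np.nan, np.nan, np.nan],
    "id": ["a"] * 7 + ["b"] * 7,
})

def days_since_maint(frame):
    # Step 1: keep the date on maintenance rows, NaT everywhere else
    last_maint = frame["datetime"].where(frame["maint"].eq(1))
    # Step 2: carry the last maintenance date forward within each id
    last_maint = last_maint.groupby(frame["id"]).ffill()
    # Rows before the first maintenance are still NaT: treat them as 0 days
    delta = (frame["datetime"] - last_maint).fillna(pd.Timedelta(0))
    return delta.dt.days

df["days"] = days_since_maint(df)
print(df["days"].tolist())  # [0, 1, 2, 0, 1, 2, 3, 0, 0, 0, 0, 1, 2, 3]
```

Grouping by `id` before the forward fill is what keeps the last maintenance date of id a from leaking into the first rows of id b.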
This is based on a similar question on Stack Overflow, "python - What is an efficient way to calculate the date difference since the last maintenance?": https://stackoverflow.com/questions/55085235/