I am trying to compute the difference in days between the rows where one column (One) equals 1 and the rows where another column (Value) is greater than 0.
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017', '02.01.2017', '03.01.2017', '02.12.2017', '03.12.2017', '04.12.2017'],
                   'CustomerId': ['02', '02', '02', '02', '03', '03', '03', '05', '05', '05'],
                   'Value': [0, 0, 10, 100, 0, 10000, 10000, 0, 0, 12312312],
                   'One': [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]})

def dayDiff(groupby):
    # Groups without a One == 1 row or without a Value > 0 row get all zeros.
    if (not (groupby['One'] == 1).any()) or (not (groupby['Value'] > 0).any()):
        return np.zeros(groupby['Date'].count())

    # First date with One == 1 and first date with Value > 0 (both scalars).
    min_date = groupby[groupby['One'] == 1]['Date'].iloc[0]
    max_date = groupby[groupby['Value'] > 0]['Date'].iloc[0]
    delta = max_date - min_date
    return np.where(groupby['Value'] > 0, delta.days, 0)

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
DateDiff = df.groupby('CustomerId').apply(dayDiff).explode().rename('DateDiff').reset_index(drop=True)
df = pd.concat([df, DateDiff], axis=1)
df
The result is:
        Date CustomerId     Value  One  DateDiff
0 2017-01-02         02         0    1         0
1 2017-01-03         02         0    1         0
2 2017-01-04         02        10    0         2
3 2017-01-05         02       100    0         2
4 2017-01-01         03         0    1         0
5 2017-01-02         03     10000    0         1
6 2017-01-03         03     10000    0         1
7 2017-12-02         05         0    1         0
8 2017-12-03         05         0    0         0
9 2017-12-04         05  12312312    0         2
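The problem is that row 2 shows the wrong value. I want it to show 1, and row 6 to show 2, because for each customer I want the difference in days between the last row where One is 1 and each row where Value is greater than zero. It seems that dayDiff() computes the same day difference regardless of the date.
I tried changing the iloc[0] value, but the result was still not quite right.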
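A minimal sketch of why every matching row gets the same number, using only the rows of CustomerId '02' from the sample data. The names group, first_one, first_value and broadcast are illustrative and not part of the original code. Both .iloc[0] lookups return a single Timestamp, so their difference is one scalar that np.where then repeats for every row with Value > 0:

```python
import numpy as np
import pandas as pd

# Rows of CustomerId '02' from the sample data (illustrative subset).
group = pd.DataFrame({
    'Date': pd.to_datetime(['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017'],
                           dayfirst=True),
    'Value': [0, 0, 10, 100],
    'One': [1, 1, 0, 0],
})

# Each lookup is reduced to one Timestamp, so delta is a single scalar Timedelta.
first_one = group.loc[group['One'] == 1, 'Date'].iloc[0]     # 2017-01-02
first_value = group.loc[group['Value'] > 0, 'Date'].iloc[0]  # 2017-01-04
delta = first_value - first_one

# np.where broadcasts that one number to every row where Value > 0.
broadcast = np.where(group['Value'] > 0, delta.days, 0)
print(broadcast)  # [0 0 2 2]
```

To get a different value per row, one side of the subtraction has to stay a Series of dates rather than a single date.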
Expected (note that rows 2 and 6 of DateDiff are now correct):
        Date CustomerId     Value  One  DateDiff
0 2017-01-02         02         0    1         0
1 2017-01-03         02         0    1         0
2 2017-01-04         02        10    0         1
3 2017-01-05         02       100    0         2
4 2017-01-01         03         0    1         0
5 2017-01-02         03     10000    0         1
6 2017-01-03         03     10000    0         2
7 2017-12-02         05         0    1         0
8 2017-12-03         05         0    0         0
9 2017-12-04         05  12312312    0         2
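Edit: Using @jezrael's suggestion, I realized there is a problem when more rows with One equal to 1 come later: the day differences become negative. I want row 2 to show 0, because 2017-01-04 - 2017-01-04 should be zero, since that row is itself the last such date. In other words, the reference date should be the last One == 1 date before the current row, or the same date.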
df = pd.DataFrame({'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017', '02.01.2017', '03.01.2017', '02.12.2017', '03.12.2017', '04.12.2017'],
                   'CustomerId': ['02', '02', '02', '02', '03', '03', '03', '05', '05', '05'],
                   'Value': [0, 0, 10, 100, 0, 10000, 10000, 0, 0, 12312312],
                   'One': [1, 1, 1, 1, 1, 0, 0, 1, 0, 0]})
        Date CustomerId     Value  One  DateDiff
0 2017-01-02         02         0    1         0
1 2017-01-03         02         0    1         0
2 2017-01-04         02        10    1        -1
3 2017-01-05         02       100    1         0
4 2017-01-01         03         0    1         0
5 2017-01-02         03     10000    0         1
6 2017-01-03         03     10000    0         2
7 2017-12-02         05         0    1         0
8 2017-12-03         05         0    0         0
9 2017-12-04         05  12312312    0         2
Best answer
I believe you need the difference between the last Date value where One == 1 and all Date values where Value > 0, per group:
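def dayDiff(groupby):
    # No One == 1 row or no Value > 0 row in this group: nothing to measure.
    if (not (groupby['One'] == 1).any()) or (not (groupby['Value'] > 0).any()):
        groupby['DateDiff'] = 0
        return groupby

    # Last date with One == 1 (a scalar) and all dates with Value > 0 (a Series).
    min_date = groupby.loc[groupby['One'] == 1, 'Date'].iloc[-1]
    max_date = groupby.loc[groupby['Value'] > 0, 'Date']
    delta = max_date - min_date
    # Rows without Value > 0 are filled with 0.
    groupby['DateDiff'] = delta.dt.days.reindex(groupby.index, fill_value=0)
    return groupby

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
# group_keys=False keeps the original row index on pandas versions
# that would otherwise prepend CustomerId to it.
df = df.groupby('CustomerId', group_keys=False).apply(dayDiff)
print(df)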
        Date CustomerId     Value  One  DateDiff
0 2017-01-02         02         0    1         0
1 2017-01-03         02         0    1         0
2 2017-01-04         02        10    0         1
3 2017-01-05         02       100    0         2
4 2017-01-01         03         0    1         0
5 2017-01-02         03     10000    0         1
6 2017-01-03         03     10000    0         2
7 2017-12-02         05         0    1         0
8 2017-12-03         05         0    0         0
9 2017-12-04         05  12312312    0         2
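For comparison, here is a sketch of the same rule without apply. This is an alternative formulation, not the answer's method, and the names one_dates, last_one and diff are illustrative. It takes the last One == 1 date per customer with transform('last'), which skips missing values, and subtracts it from every row's date:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017',
                            '02.01.2017', '03.01.2017', '02.12.2017', '03.12.2017', '04.12.2017'],
                   'CustomerId': ['02', '02', '02', '02', '03', '03', '03', '05', '05', '05'],
                   'Value': [0, 0, 10, 100, 0, 10000, 10000, 0, 0, 12312312],
                   'One': [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]})
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Keep dates only on One == 1 rows (NaT elsewhere), then broadcast the last
# such date of each customer to all of that customer's rows.
one_dates = df['Date'].where(df['One'] == 1)
last_one = one_dates.groupby(df['CustomerId']).transform('last')

# Day difference where Value > 0; every other row (and any NaT result) becomes 0.
diff = (df['Date'] - last_one).dt.days
df['DateDiff'] = diff.where(df['Value'] > 0, 0).fillna(0).astype(int)
print(df)
```

On the sample data this gives the same DateDiff column as the output above. It assumes rows are ordered by date within each customer, as they are in the sample.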
Edit: Another idea is to filter the rows with a mask before groupby, and then append the rows that do not match:
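def dayDiff(groupby):
    # No One == 1 row or no Value > 0 row in this group: nothing to measure.
    if (not (groupby['One'] == 1).any()) or (not (groupby['Value'] > 0).any()):
        groupby['DateDiff'] = 0
        return groupby

    # Last date with One == 1 (a scalar) and all dates with Value > 0 (a Series).
    min_date = groupby.loc[groupby['One'] == 1, 'Date'].iloc[-1]
    max_date = groupby.loc[groupby['Value'] > 0, 'Date']
    delta = max_date - min_date
    groupby['DateDiff'] = delta.dt.days.reindex(groupby.index, fill_value=0)
    return groupby

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
# Keep rows that are either One == 1 with no value, or Value > 0 without One == 1.
m1 = (df['One'] == 1) & (df['Value'] <= 0)
m2 = (df['Value'] > 0) & (df['One'] != 1)
mask = m1 | m2
# DataFrame.append was removed in pandas 2.0, so pd.concat adds the unmatched rows back.
matched = df[mask].groupby('CustomerId', group_keys=False).apply(dayDiff)
df = pd.concat([matched, df[~mask]], sort=False).sort_index()
df['DateDiff'] = df['DateDiff'].fillna(0).astype(int)
print(df)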
        Date CustomerId     Value  One  DateDiff
0 2017-01-02         02         0    1         0
1 2017-01-03         02         0    1         0
2 2017-01-04         02        10    1         0
3 2017-01-05         02       100    1         0
4 2017-01-01         03         0    1         0
5 2017-01-02         03     10000    0         1
6 2017-01-03         03     10000    0         2
7 2017-12-02         05         0    1         0
8 2017-12-03         05         0    0         0
9 2017-12-04         05  12312312    0         2
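The "last date before, or the same date" rule from the question's edit can also be read row by row. The sketch below is one possible interpretation, not the answer's method, and the names one_dates, prev_one and diff are illustrative. It forward-fills the most recent One == 1 date within each customer, so a row that is itself a One == 1 row is measured against its own date:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017',
                            '02.01.2017', '03.01.2017', '02.12.2017', '03.12.2017', '04.12.2017'],
                   'CustomerId': ['02', '02', '02', '02', '03', '03', '03', '05', '05', '05'],
                   'Value': [0, 0, 10, 100, 0, 10000, 10000, 0, 0, 12312312],
                   'One': [1, 1, 1, 1, 1, 0, 0, 1, 0, 0]})
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Assumption: rows are already sorted by Date within each CustomerId.
# Dates survive only on One == 1 rows, then are carried forward per customer.
one_dates = df['Date'].where(df['One'] == 1)
prev_one = one_dates.groupby(df['CustomerId']).ffill()

# Day difference where Value > 0; rows with no earlier One == 1 date fall back to 0.
diff = (df['Date'] - prev_one).dt.days
df['DateDiff'] = diff.where(df['Value'] > 0, 0).fillna(0).astype(int)
print(df)
```

On the edited sample data this produces the same DateDiff column as the output above. The two approaches can differ on other inputs, for example when a Value > 0 row comes before any One == 1 row for a customer.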
Regarding python - computing the difference in days given the values of two columns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57571505/