python - Calculate the difference in days given values in two columns

Tags: python pandas function datetime

I am trying to calculate the difference in days between the row where one column (One) equals 1 and the rows where another column (Value) is greater than 0.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Date':['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017', '02.01.2017', '03.01.2017', '02.12.2017', '03.12.2017', '04.12.2017'],
                   'CustomerId':['02','02','02','02','03','03','03', '05', '05', '05'],
                   'Value':[0, 0, 10, 100, 0, 10000, 10000, 0, 0, 12312312],
                   'One':[1, 1, 0, 0, 1, 0, 0, 1, 0, 0]})

def dayDiff(groupby):
    if (not (groupby['One'] == 1).any()) or (not (groupby['Value'] > 0).any()):
        return np.zeros(groupby['Date'].count())

    min_date = groupby[groupby['One'] == 1]['Date'].iloc[0]
    max_date = groupby[groupby['Value'] > 0]['Date'].iloc[0]
    delta = max_date - min_date
    return np.where(groupby['Value'] > 0 , delta.days, 0)


df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
DateDiff = df.groupby('CustomerId').apply(dayDiff).explode().rename('DateDiff').reset_index(drop=True)
df = pd.concat([df, DateDiff], axis=1)
df

The result is:

          Date  CustomerId     Value    One DateDiff
0   2017-01-02          02         0    1   0
1   2017-01-03          02         0    1   0
2   2017-01-04          02        10    0   2
3   2017-01-05          02       100    0   2
4   2017-01-01          03         0    1   0
5   2017-01-02          03     10000    0   1
6   2017-01-03          03     10000    0   1
7   2017-12-02          05         0    1   0
8   2017-12-03          05         0    0   0
9   2017-12-04          05  12312312    0   2

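The problem is that row 2 shows the wrong value. I want it to show 1, and row 6 to show 2, because for each customer I want the number of days between the last row where One is 1 and each row where Value is greater than zero. It seems that dayDiff() computes the same day difference for every row, regardless of the row's date.

I tried changing the iloc[0] indices, but the result was still not quite right.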
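
The following is a minimal sketch of why every row in a group ends up with the same number. It assumes a hypothetical frame `g` that stands in for the CustomerId '02' group. Because both filtered date columns are reduced with `.iloc[0]`, `delta` is a single Timedelta, and `np.where` broadcasts it to every row where Value > 0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CustomerId '02' group from the question.
g = pd.DataFrame({
    'Date': pd.to_datetime(['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017'],
                           dayfirst=True),
    'Value': [0, 0, 10, 100],
    'One': [1, 1, 0, 0],
})

# Both .iloc[0] calls return a single Timestamp, so delta is one Timedelta.
first_one = g.loc[g['One'] == 1, 'Date'].iloc[0]
first_positive = g.loc[g['Value'] > 0, 'Date'].iloc[0]
delta = first_positive - first_one

# np.where broadcasts that single number to every row with Value > 0.
same_for_all = np.where(g['Value'] > 0, delta.days, 0)
print(same_for_all.tolist())  # [0, 0, 2, 2]
```

To get a different value per row, the subtraction has to keep the dates with Value > 0 as a Series instead of collapsing them to one Timestamp.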

Expected (note that rows 2 and 6 of DateDiff are now correct):

          Date  CustomerId     Value    One DateDiff
0   2017-01-02          02         0    1   0
1   2017-01-03          02         0    1   0
2   2017-01-04          02        10    0   1
3   2017-01-05          02       100    0   2
4   2017-01-01          03         0    1   0
5   2017-01-02          03     10000    0   1
6   2017-01-03          03     10000    0   2
7   2017-12-02          05         0    1   0
8   2017-12-03          05         0    0   0
9   2017-12-04          05  12312312    0   2

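Edit: Using @jezrael's suggestion, I realized that a problem appears when a customer has more than one row with One equal to 1: the day count becomes negative. I want row 2 to show 0, because 2017-01-04 - 2017-01-04 should be zero, since that row's own date is the most recent date where One is 1. In other words, the reference should be the last such date before, or on, the row's own date.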

df = pd.DataFrame({'Date':['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017', '02.01.2017', '03.01.2017', '02.12.2017', '03.12.2017', '04.12.2017'],
                   'CustomerId':['02','02','02','02','03','03','03', '05', '05', '05'],
                   'Value':[0, 0, 10, 100, 0, 10000, 10000, 0, 0, 12312312],
                   'One':[1, 1, 1, 1, 1, 0, 0, 1, 0, 0]})

        Date CustomerId     Value  One  DateDiff
0 2017-01-02         02         0    1         0
1 2017-01-03         02         0    1         0
2 2017-01-04         02        10    1        -1
3 2017-01-05         02       100    1         0
4 2017-01-01         03         0    1         0
5 2017-01-02         03     10000    0         1
6 2017-01-03         03     10000    0         2
7 2017-12-02         05         0    1         0
8 2017-12-03         05         0    0         0
9 2017-12-04         05  12312312    0         2

Best answer

I believe you need the difference between the last Date where One == 1 and every Date where Value > 0, per group:

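def dayDiff(group):
    # Work on a copy so the frame handed in by groupby is not modified in place.
    group = group.copy()
    if (not (group['One'] == 1).any()) or (not (group['Value'] > 0).any()):
        group['DateDiff'] = 0
        return group

    # Last date in the group where One == 1 (a single Timestamp).
    min_date = group.loc[group['One'] == 1, 'Date'].iloc[-1]
    # All dates in the group where Value > 0 (a Series).
    max_date = group.loc[group['Value'] > 0, 'Date']
    delta = max_date - min_date
    # Rows without Value > 0 are filled with 0.
    group['DateDiff'] = delta.dt.days.reindex(group.index, fill_value=0)
    return group

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
# group_keys=False keeps the original row index in the result.
df = df.groupby('CustomerId', group_keys=False).apply(dayDiff)
print(df)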
        Date CustomerId     Value  One  DateDiff
0 2017-01-02         02         0    1         0
1 2017-01-03         02         0    1         0
2 2017-01-04         02        10    0         1
3 2017-01-05         02       100    0         2
4 2017-01-01         03         0    1         0
5 2017-01-02         03     10000    0         1
6 2017-01-03         03     10000    0         2
7 2017-12-02         05         0    1         0
8 2017-12-03         05         0    0         0
9 2017-12-04         05  12312312    0         2
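
The key step in the answer is subtracting a scalar Timestamp from a Series of dates and then reindexing back to the full group. Here is a small sketch of that step in isolation, assuming a hypothetical frame `g` that stands in for the CustomerId '03' group with its original index labels 4 to 6.

```python
import pandas as pd

# Hypothetical stand-in for the CustomerId '03' group, keeping its index labels.
g = pd.DataFrame({
    'Date': pd.to_datetime(['01.01.2017', '02.01.2017', '03.01.2017'], dayfirst=True),
    'Value': [0, 10000, 10000],
    'One': [1, 0, 0],
}, index=[4, 5, 6])

last_one = g.loc[g['One'] == 1, 'Date'].iloc[-1]   # scalar Timestamp
positive_dates = g.loc[g['Value'] > 0, 'Date']     # Series with index 5 and 6
delta = positive_dates - last_one                  # Series of Timedeltas

# reindex restores the rows that were filtered out and fills them with 0.
date_diff = delta.dt.days.reindex(g.index, fill_value=0)
print(date_diff.tolist())  # [0, 1, 2]
```

Because `delta` only has entries for rows with Value > 0, the `reindex` call is what brings the result back to one value per row of the group.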

Edit: Another idea is to filter the rows with a mask before the groupby, then append the non-matching rows:

def dayDiff(group):
    # Work on a copy so the frame handed in by groupby is not modified in place.
    group = group.copy()
    if (not (group['One'] == 1).any()) or (not (group['Value'] > 0).any()):
        group['DateDiff'] = 0
        return group

    min_date = group.loc[group['One'] == 1, 'Date'].iloc[-1]
    max_date = group.loc[group['Value'] > 0, 'Date']
    delta = max_date - min_date
    group['DateDiff'] = delta.dt.days.reindex(group.index, fill_value=0)
    return group

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
# m1: rows with One == 1 and no positive Value; m2: rows with a positive Value and One != 1.
m1 = (df['One'] == 1) & (df['Value'] <= 0)
m2 = (df['Value'] > 0) & (df['One'] != 1)
mask = m1 | m2

# DataFrame.append was removed in pandas 2.0, so pd.concat adds back the rows outside the mask.
matched = df[mask].groupby('CustomerId', group_keys=False).apply(dayDiff)
df = pd.concat([matched, df[~mask]], sort=False).sort_index()
df['DateDiff'] = df['DateDiff'].fillna(0).astype(int)
print(df)
        Date CustomerId     Value  One  DateDiff
0 2017-01-02         02         0    1         0
1 2017-01-03         02         0    1         0
2 2017-01-04         02        10    1         0
3 2017-01-05         02       100    1         0
4 2017-01-01         03         0    1         0
5 2017-01-02         03     10000    0         1
6 2017-01-03         03     10000    0         2
7 2017-12-02         05         0    1         0
8 2017-12-03         05         0    0         0
9 2017-12-04         05  12312312    0         2
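
As an alternative that is not part of the original answer, the "last date before or on the same date" rule from the question's edit can also be sketched without `apply`, by carrying the most recent One == 1 date forward within each customer. This assumes the rows are already sorted by Date within each CustomerId, as they are in the sample data; the names `one_dates`, `last_one` and `days` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['02.01.2017', '03.01.2017', '04.01.2017', '05.01.2017', '01.01.2017',
             '02.01.2017', '03.01.2017', '02.12.2017', '03.12.2017', '04.12.2017'],
    'CustomerId': ['02', '02', '02', '02', '03', '03', '03', '05', '05', '05'],
    'Value': [0, 0, 10, 100, 0, 10000, 10000, 0, 0, 12312312],
    'One': [1, 1, 1, 1, 1, 0, 0, 1, 0, 0],
})
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Keep the date only on rows where One == 1, then carry it forward per customer.
# Assumption: rows are sorted by Date within each CustomerId.
one_dates = df['Date'].where(df['One'] == 1)
last_one = one_dates.groupby(df['CustomerId']).ffill()

# Days since that date, kept only where Value > 0; every other row becomes 0.
days = (df['Date'] - last_one).dt.days
df['DateDiff'] = days.where(df['Value'] > 0, 0).fillna(0).astype(int)
print(df['DateDiff'].tolist())  # [0, 0, 0, 0, 0, 1, 2, 0, 0, 2]
```

On the sample data this produces the same DateDiff column as the masked version above. Rows that have no earlier One == 1 date in their group end up as 0 through the `fillna(0)` call.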

Source: a similar question on Stack Overflow, "python - Calculate the difference in days given values in two columns": https://stackoverflow.com/questions/57571505/
