Python Pandas : How to subtract values in two non-consecutive rows in a specific column of a dataframe from one another

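I am trying to fill a new column in a Pandas df by subtracting the values of two non-consecutive rows in a different column of the same df. I can do it as long as the df has no column containing dates. But if it does have a date column, pandas throws an error.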

Assume the following dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 55, 9], [10, 99, 19], [27, 38, 29], [39, 10, 72]]),
                   columns=['a', 'b', 'c'])
df['Date'] = ['2020-01-02', '2020-01-05', '2020-06-10', '2020-08-05', '2020-09-01', '2020-10-29']
df['Date'] = pd.to_datetime(df['Date'])

df['d'] = ''
df = df[['Date', 'a', 'b', 'c', 'd']]

This gives me a df that looks like this:

    Date        a   b   c   d
0   2020-01-02  1   2   3   
1   2020-01-05  4   5   6   
2   2020-06-10  7   55  9   
3   2020-08-05  10  99  19  
4   2020-09-01  27  38  29  
5   2020-10-29  39  10  72

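I am trying to create a new column 'd' where, for each row, the value is the value in column 'b' two rows below minus the value in column 'b' of that row. For example, the value in row [0], column ['d'] would be calculated as df.loc[2]['b'] - df.loc[0]['b'].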
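Written as a formula (the notation is introduced here only for illustration: $b_i$ is column 'b' in row $i$, $n$ is the number of rows), the rule is:

$$
d_i = b_{i+2} - b_i, \qquad i = 0, 1, \dots, n-3
$$

The last two rows have no $b_{i+2}$, so with $n = 6$ only $d_0$ through $d_3$ are defined; for example $d_0 = b_2 - b_0 = 55 - 2 = 53$.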

What I am trying (it does not work) is:

for i in range(len(df)-2):
    df.loc[i]['d'] = df.loc[i+2]['b'] - df.loc[i]['b']

If I don't have the dates in the df, I can get it to work. But when I add the column with dates, it throws this error message:

A value is trying to be set on a copy of a slice from a DataFrame

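I don't understand why the date column would stop the df from doing math on columns that contain only int64 data. I tried searching the site but couldn't solve the problem. Any help would be greatly appreciated.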
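The message comes from chained indexing: `df.loc[i]['d'] = ...` first builds the row `df.loc[i]` and then assigns into that row. When the columns have different dtypes (here datetime64, int64 and the object column created by `df['d'] = ''`), pandas has to assemble the row into a new Series, so the assignment lands on a copy and `df` is never updated. The sketch below, using the question's sample data, shows that copy and the usual fix of passing the row and the column to a single `.loc` call. Initialising 'd' with `np.nan` instead of `''` is an assumption made here so the column can hold numbers.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 55, 9],
                            [10, 99, 19], [27, 38, 29], [39, 10, 72]]),
                  columns=['a', 'b', 'c'])
df['Date'] = pd.to_datetime(['2020-01-02', '2020-01-05', '2020-06-10',
                             '2020-08-05', '2020-09-01', '2020-10-29'])
# Assumption: start 'd' as NaN (float) rather than '' so it can store differences
df['d'] = np.nan

# Chained indexing: df.loc[0] is a new Series built from mixed dtypes,
# so writing into it leaves df untouched (pandas may also warn here)
row = df.loc[0]
row['d'] = -1
print(pd.isna(df.loc[0, 'd']))  # True: df itself was not modified

# Fix: one .loc call with (row label, column label) writes into df directly
for i in range(len(df) - 2):
    df.loc[i, 'd'] = df.loc[i + 2, 'b'] - df.loc[i, 'b']

print(df[['Date', 'b', 'd']])
```

The loop version works, but it still does one Python-level lookup per row, which is why a vectorized expression is usually preferred.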

Best answer

You can do this in vectorized form with shift (which is much faster than using a loop):

df['d'] = df['b'].shift(-2) - df['b']
df

Output:

        Date   a   b   c     d
0 2020-01-02   1   2   3  53.0
1 2020-01-05   4   5   6  94.0
2 2020-06-10   7  55   9 -17.0
3 2020-08-05  10  99  19 -89.0
4 2020-09-01  27  38  29   NaN
5 2020-10-29  39  10  72   NaN
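The same column can also be written with `Series.diff`, which computes `s - s.shift(periods)`. With `periods=-2` that gives `b[i] - b[i+2]`, so negating it reproduces the answer's result. In this sketch the helper name `forward_diff` and its `k` parameter are hypothetical, introduced only to show how the row gap generalises.

```python
import pandas as pd


def forward_diff(s: pd.Series, k: int = 2) -> pd.Series:
    """Hypothetical helper: value k rows below minus the current value.

    The last k rows have no partner below them, so they come out as NaN.
    """
    return s.shift(-k) - s


b = pd.Series([2, 5, 55, 99, 38, 10], name='b')  # column 'b' from the question

via_shift = forward_diff(b, 2)
via_diff = -b.diff(-2)  # diff(-2) is b - b.shift(-2), so negate it

print(via_shift.tolist())  # [53.0, 94.0, -17.0, -89.0, nan, nan]
```

Both forms return float64, because the NaN introduced by the shift cannot be stored in an int64 column.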

A similar question about "Python Pandas : How to subtract values in two non-consecutive rows in a specific column of a dataframe from one another" was found on Stack Overflow: https://stackoverflow.com/questions/66481302/
