python - Pandas apply() custom function using multiple columns as "input"

Perhaps this simple example will help you understand what I am trying to do:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})


def _custom_function(X):    
    # whatever... just for the purpose of the example
    # but I need X to be the actual df and not a series

    Y = sum((X['A'] / X['B']) + (0.2 * X['B']))   
    return Y


df['C'] = df.rolling(2).apply(_custom_function, axis=0)

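When the custom function is called, X is a Series that contains only the first column of df. Is it possible to pass the whole df through apply?

Edit: rolling().apply() can be used on a single column: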

import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})


def _custom_function(X):    
    # whatever... just for the purpose of the example
    Y = sum(0.2 * X)    
    return Y


df['C'] = df['A'].rolling(2).apply(_custom_function)

Second edit: a list comprehension over the rolling windows does not behave as expected

for x in df.rolling(3):
    print(x)

As you can see in the example below, the two approaches do not give the same output:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})
df['C'] = 0.2


def _custom_function_df(X):    
    # whatever... just for the purpose of the example
    # but I need X to be the actual df and not a series
    Y = sum(X['C'] * X['B'])
    return Y

def _custom_function_series(X):    
    # whatever... just for the purpose of the example
    # but I need X to be the actual df and not a series
    Y = sum(0.2 * X)
    return Y


df['result'] = df['B'].rolling(3).apply(_custom_function_series)

df['result2'] = [x.pipe(_custom_function_df) for x in df.rolling(3, min_periods=3)]

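The list comprehension produces values for the first rows (without the expected NaN) and only starts rolling correctly once the window reaches len(x) = 3.

Thanks in advance!

Best answer

Pass the DataFrame to the function: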

df['C'] = _custom_function(df)

Or use DataFrame.pipe:

df['C'] = df.pipe(_custom_function)

print (df)
    A   B         C
0  10  20  4.500000
1  20  30  6.666667
2  30  10  5.000000
3  50  15  6.333333
4  70  20  7.500000
5  40  30  7.333333
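
The values printed above are per-row results, which is what a function returning a Series produces; with the outer sum() from the question, the function returns a single number that would be repeated in every row. A minimal sketch, assuming a hypothetical `_row_wise` variant of the example function (the name is not from the original post), showing that a direct call and DataFrame.pipe give the same result:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})


def _row_wise(X):
    # Hypothetical variant of the example function without the outer sum():
    # it returns one value per row instead of a single scalar.
    return X["A"] / X["B"] + 0.2 * X["B"]


direct = _row_wise(df)      # plain function call
piped = df.pipe(_row_wise)  # same call, written as a method chain

same = direct.equals(piped)  # both spellings produce an identical Series
df["C"] = piped
print(same)
print(df)
```

pipe is mainly a readability choice: it lets the function sit inside a chain of DataFrame methods.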

Edit: Rolling.apply processes each column separately, so it cannot be used here.
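
To see the per-column behaviour, the following sketch records what the function receives on each call. The `_probe` function and the `seen_types` list are illustrative names, not part of the original post:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

seen_types = []  # type name of the object passed in on every call


def _probe(window):
    # With raw=False (the default), each window arrives as a Series
    # holding values from one column only, never as a DataFrame.
    seen_types.append(type(window).__name__)
    return window.sum()


result = df.rolling(2).apply(_probe)

print(set(seen_types))
print(result)
```

Because each call only sees one column, an expression such as X['A'] / X['B'] has nothing to index into inside the function.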

Possible solution:

df['C'] = [x.pipe(_custom_function) for x in df.rolling(2)]
print (df)
    A   B          C
0  10  20   4.500000
1  20  30  11.166667
2  30  10  11.666667
3  50  15  11.333333
4  70  20  13.833333
5  40  30  14.833333
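
For this particular example function, which sums a per-row expression, an alternative is to compute the per-row term once and take a rolling sum of it. This is a sketch of a different technique, not the approach from the answer, and it only applies when the custom function can be decomposed this way; `row_term` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

# Per-row term of the example function, computed for all rows at once.
row_term = df["A"] / df["B"] + 0.2 * df["B"]

# min_periods=1 mirrors the output above, where the first row is computed
# from a one-row window; omit it to get NaN for that row instead.
df["C"] = row_term.rolling(2, min_periods=1).sum()
print(df)
```

This avoids calling a Python function once per window, which tends to matter on larger frames.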

Edit: This looks like a bug: when iterated over, rolling behaves as if min_periods=1 by default.
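
A quick way to check this on your own pandas version is to look at the length of each window that iteration yields. This is only a diagnostic sketch; `lengths` is an illustrative name, and the behaviour may differ between pandas releases:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

# Iterating a Rolling object yields one DataFrame per row of df.
# If min_periods is not applied during iteration, the leading
# windows are shorter than the requested window size of 3.
lengths = [len(window) for window in df.rolling(3, min_periods=3)]
print(lengths)
```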

Here is a workaround (a hack):

import numpy as np

df['result'] = df['B'].rolling(3).apply(_custom_function_series)

# Emit NaN for the incomplete leading windows that iteration still yields
df['result2'] = [x.pipe(_custom_function_df) if len(x) == 3 else np.nan for x in df.rolling(3)]

print (df)
    A   B    C  result  result2
0  10  20  0.2     NaN      NaN
1  20  30  0.2     NaN      NaN
2  30  10  0.2    12.0     12.0
3  50  15  0.2    11.0     11.0
4  70  20  0.2     9.0      9.0
5  40  30  0.2    13.0     13.0
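
Another commonly used workaround, sketched here as an alternative to the list comprehension, is to roll over a single column and use each window's index to select the matching rows of the full DataFrame. It assumes the index labels are unique; the `result3` column name is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})
df["C"] = 0.2


def _custom_function_df(X):
    # Same example function as in the question: needs columns B and C.
    return sum(X["C"] * X["B"])


# raw=False passes each window as a Series, so its index is available.
# df.loc[w.index] turns that window back into the matching DataFrame rows.
df["result3"] = df["B"].rolling(3).apply(
    lambda w: _custom_function_df(df.loc[w.index]), raw=False
)
print(df)
```

Because Rolling.apply handles min_periods itself, the incomplete leading windows come out as NaN without a len(x) check.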

Regarding python - Pandas apply() custom function using multiple columns as "input", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66762002/
