python - Group by year-month and find the N columns with the smallest values

Based on the output dataframe from this link:

import pandas as pd
import numpy as np

np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])

# all your models
models = df.columns[df.columns.str.endswith('_values')]

# function to calculate mape
def mape(y_true, y_pred):
    y_pred = np.array(y_pred)
    return np.mean(np.abs(y_true - y_pred) / np.clip(np.abs(y_true), 1, np.inf),
                   axis=0)*100

errors = (df.groupby(pd.Grouper(freq='M'))
            .apply(lambda x: mape(x[models], x[['target']]))
         )
res = pd.merge_asof(df[['target']], errors, 
                             left_index=True, 
                             right_index=True,
                             direction='forward'
                            )
print(res)

Output:

              target    A_values    B_values    C_values    D_values   E_values
2013-02-26  1.281624   48.759348   77.023855  325.376455   74.422508  60.602101
2013-02-27  0.585713   48.759348   77.023855  325.376455   74.422508  60.602101
2013-02-28  9.638430   48.759348   77.023855  325.376455   74.422508  60.602101
2013-03-01  1.950960   98.909249  143.760594   90.051465  138.059241  93.461361
2013-03-02  0.690563   98.909249  143.760594   90.051465  138.059241  93.461361
             ...         ...         ...         ...         ...        ...
2013-05-22  5.554824  122.272490  139.420056  133.658101   62.368310  94.334362
2013-05-23  8.440801  122.272490  139.420056  133.658101   62.368310  94.334362
2013-05-24  0.968086  122.272490  139.420056  133.658101   62.368310  94.334362
2013-05-25  0.672555  122.272490  139.420056  133.658101   62.368310  94.334362
2013-05-26  5.273122  122.272490  139.420056  133.658101   62.368310  94.334362
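The way the monthly errors end up repeated on every daily row may be easier to see on a tiny example. The sketch below uses hypothetical toy frames (`daily`, `monthly`) and made-up numbers, not the data above; it only illustrates how `pd.merge_asof` with `direction='forward'` attaches each day to the month-end label that follows it.

```python
import pandas as pd

# Hypothetical toy data: five daily rows spanning two months.
daily = pd.DataFrame(
    {"target": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.to_datetime(
        ["2013-02-27", "2013-02-28", "2013-03-01", "2013-03-15", "2013-03-31"]
    ),
)

# One row per month, labelled with the month-end date
# (the same labelling a monthly Grouper produces).
monthly = pd.DataFrame(
    {"A_values": [48.8, 98.9]},
    index=pd.to_datetime(["2013-02-28", "2013-03-31"]),
)

# direction='forward' matches each daily row to the first monthly label
# on or after its date, i.e. the end of its own month.
joined = pd.merge_asof(
    daily, monthly, left_index=True, right_index=True, direction="forward"
)
print(joined)
```

Both frames must be sorted by their index for `merge_asof` to work, which is already the case for a date range and for grouped monthly output.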

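How can I group by year-month and find the N columns with the smallest values?

For example, if I set N=3, the expected result would be:

Thanks in advance for your help.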

Best answer

Here is one approach using argsort:

errors = (df.groupby(pd.Grouper(freq='M'))
            .apply(lambda x: mape(x[models], x[['target']]))
         )

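k = 2  # number of lowest-error models to keep per month

# Boolean mask meant to flag the k lowest-error models in each month.
# Caution: argsort returns sort positions rather than ranks, so this
# comparison does not always select the k smallest errors.
sorted_args = np.argsort(errors, axis=1) < k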

res = pd.merge_asof(df[['target']], sorted_args, 
                             left_index=True, 
                             right_index=True,
                             direction='forward'
                            )

topk = df[models].where(res[models])

Then topk looks like:

            A_values  B_values  C_values  D_values  E_values
2013-02-26  6.059783       NaN       NaN  3.126731       NaN
2013-02-27  1.789931       NaN       NaN  7.843101       NaN
2013-02-28  9.623960       NaN       NaN  5.612724       NaN
2013-03-01       NaN       NaN  4.521452       NaN  5.693051
2013-03-02       NaN       NaN  5.178144       NaN  7.322250
...              ...       ...       ...       ...       ...
2013-05-22       NaN       NaN  0.427136       NaN  6.803052
2013-05-23       NaN       NaN  2.225667       NaN  2.756443
2013-05-24       NaN       NaN  7.212742       NaN  0.430184
2013-05-25       NaN       NaN  5.384490       NaN  5.461017
2013-05-26       NaN       NaN  9.823048       NaN  6.312104
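One thing worth checking in the result above: for February the printed errors are smallest for A_values (48.76) and E_values (60.60), yet the mask keeps A_values and D_values. This is because `np.argsort` returns the column positions in sorted order, not each column's rank, so `argsort < k` matches the k smallest only by coincidence. The following is a minimal sketch of a rank-based alternative, assuming the monthly errors printed in the question (rounded, with April omitted because it is not shown); `rank_mask` and `lowest` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Monthly errors copied from the printed output in the question
# (rounded to two decimals; April is not shown there, so it is omitted).
errors = pd.DataFrame(
    {
        "A_values": [48.76, 98.91, 122.27],
        "B_values": [77.02, 143.76, 139.42],
        "C_values": [325.38, 90.05, 133.66],
        "D_values": [74.42, 138.06, 62.37],
        "E_values": [60.60, 93.46, 94.33],
    },
    index=pd.to_datetime(["2013-02-28", "2013-03-31", "2013-05-31"]),
)

k = 2

# argsort lists column positions in ascending order of error,
# so comparing it with k does not test "is among the k smallest".
order = np.argsort(errors.to_numpy(), axis=1)
argsort_mask = pd.DataFrame(order < k, index=errors.index, columns=errors.columns)

# rank gives each column its position within the row (1 = smallest error).
rank_mask = errors.rank(axis=1, method="first") <= k

# Names of the k lowest-error columns for each month.
lowest = {date: row.nsmallest(k).index.tolist() for date, row in errors.iterrows()}

print(argsort_mask)
print(rank_mask)
print(lowest)
```

If `rank_mask` is used in place of `sorted_args`, the `merge_asof` and `where` steps from the answer should work unchanged, and the February rows would then keep A_values and E_values.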

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/69937232/