python - Group by year-month and find the top N columns with the smallest values

Tags: python pandas dataframe numpy

Based on the output dataframe from this link:

import pandas as pd
import numpy as np

np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])

# all your models
models = df.columns[df.columns.str.endswith('_values')]

# function to calculate mape
def mape(y_true, y_pred):
    y_pred = np.array(y_pred)
    return np.mean(np.abs(y_true - y_pred) / np.clip(np.abs(y_true), 1, np.inf),
                   axis=0)*100

errors = (df.groupby(pd.Grouper(freq='M'))
            .apply(lambda x: mape(x[models], x[['target']]))
         )
res = pd.merge_asof(df[['target']], errors, 
                             left_index=True, 
                             right_index=True,
                             direction='forward'
                            )
print(res)

Output:

              target    A_values    B_values    C_values    D_values   E_values
2013-02-26  1.281624   48.759348   77.023855  325.376455   74.422508  60.602101
2013-02-27  0.585713   48.759348   77.023855  325.376455   74.422508  60.602101
2013-02-28  9.638430   48.759348   77.023855  325.376455   74.422508  60.602101
2013-03-01  1.950960   98.909249  143.760594   90.051465  138.059241  93.461361
2013-03-02  0.690563   98.909249  143.760594   90.051465  138.059241  93.461361
             ...         ...         ...         ...         ...        ...
2013-05-22  5.554824  122.272490  139.420056  133.658101   62.368310  94.334362
2013-05-23  8.440801  122.272490  139.420056  133.658101   62.368310  94.334362
2013-05-24  0.968086  122.272490  139.420056  133.658101   62.368310  94.334362
2013-05-25  0.672555  122.272490  139.420056  133.658101   62.368310  94.334362
2013-05-26  5.273122  122.272490  139.420056  133.658101   62.368310  94.334362
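The merge step above relies on `pd.merge_asof` with `direction='forward'`, which pairs each daily row with the first monthly row whose index is on or after that day. A minimal sketch of that alignment, using small made-up frames (`daily`, `monthly`, and the `err` column are hypothetical names and values, not taken from the data above):

```python
import pandas as pd

# Hypothetical daily frame: three days spanning a month boundary
daily = pd.DataFrame(
    {"target": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2013-02-27", "2013-02-28", "2013-03-01"]),
)

# Hypothetical monthly frame, labelled by month end like the grouped errors
monthly = pd.DataFrame(
    {"err": [10.0, 20.0]},
    index=pd.to_datetime(["2013-02-28", "2013-03-31"]),
)

# direction='forward' picks the first monthly key >= each daily key
out = pd.merge_asof(
    daily, monthly,
    left_index=True, right_index=True,
    direction="forward",
)
print(out)
```

Both February days pick up the February month-end value and the March day picks up the March one, which is why every row within a month shows the same error values in the output above.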

How can I group by year-month and find the N columns with the smallest values?

For example, if I set N=3, the expected result would be:

[image of the expected result, not included]

Thanks in advance for your help.

Best answer

Here is one approach using argsort:

errors = (df.groupby(pd.Grouper(freq='M'))
            .apply(lambda x: mape(x[models], x[['target']]))
         )

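k = 2  # number of models to keep

# boolean mask intended to flag the top-k models per month
# (note: np.argsort returns column positions in sorted order, not each column's rank)
sorted_args = np.argsort(errors, axis=1) < k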

res = pd.merge_asof(df[['target']], sorted_args, 
                             left_index=True, 
                             right_index=True,
                             direction='forward'
                            )

topk = df[models].where(res[models])

topk then looks like:

            A_values  B_values  C_values  D_values  E_values
2013-02-26  6.059783       NaN       NaN  3.126731       NaN
2013-02-27  1.789931       NaN       NaN  7.843101       NaN
2013-02-28  9.623960       NaN       NaN  5.612724       NaN
2013-03-01       NaN       NaN  4.521452       NaN  5.693051
2013-03-02       NaN       NaN  5.178144       NaN  7.322250
...              ...       ...       ...       ...       ...
2013-05-22       NaN       NaN  0.427136       NaN  6.803052
2013-05-23       NaN       NaN  2.225667       NaN  2.756443
2013-05-24       NaN       NaN  7.212742       NaN  0.430184
2013-05-25       NaN       NaN  5.384490       NaN  5.461017
2013-05-26       NaN       NaN  9.823048       NaN  6.312104
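One caveat on the mask above: `np.argsort` returns the positions that would sort each row, not the rank of each column, so `argsort(...) < k` does not always flag the k smallest columns. In the printed `topk`, the February rows keep A_values and D_values, while the February errors shown in the question are lowest for A_values and E_values. A rank-based mask is one possible alternative. The sketch below is an assumption on my part rather than part of the original answer, and it rebuilds `errors` by hand from the three months visible in the question's output (April is elided there, so it is left out):

```python
import pandas as pd

# Monthly MAPE values copied from the question's printed output
# (only the months visible there; the month-end index labels are assumed).
errors = pd.DataFrame(
    {
        "A_values": [48.759348, 98.909249, 122.272490],
        "B_values": [77.023855, 143.760594, 139.420056],
        "C_values": [325.376455, 90.051465, 133.658101],
        "D_values": [74.422508, 138.059241, 62.368310],
        "E_values": [60.602101, 93.461361, 94.334362],
    },
    index=pd.to_datetime(["2013-02-28", "2013-03-31", "2013-05-31"]),
)

k = 2

# rank(axis=1) gives each column's rank within its row (1 = smallest error),
# so comparing with k flags exactly the k lowest-error models per month
mask = errors.rank(axis=1, method="first") <= k

# names of the k lowest-error models per month, ordered by error
lowest = {ts: row.nsmallest(k).index.tolist() for ts, row in errors.iterrows()}

print(mask)
print(lowest)
```

Passing `mask` in place of `sorted_args` to the `pd.merge_asof` call above should then spread the monthly flags to the daily rows in the same way.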

Regarding python - Group by year-month and find the top N columns with the smallest values, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69937232/
