I am working with a DataFrame that has a large number of columns (505), and I only want to select the top 5 values for each month.
Here is a sample:
Dates 1 2 3 4 5 6
2002-07-31 -31.710916 NaN -5.208684 -29.773404 NaN -7.308558
2002-08-31 -44.941351 NaN 3.665286 -23.987135 NaN 3.134669
2002-09-30 -36.725548 NaN 4.114474 -19.536571 NaN -0.986986
2002-10-31 -25.377286 NaN -0.486158 -5.887594 NaN -0.787117
2002-11-30 19.766328 NaN -5.298877 -10.672174 NaN -21.057946
2002-12-31 1.996514 NaN -7.570497 -9.257122 NaN -19.630112
2003-01-31 -0.366083 NaN -14.124492 -5.434475 NaN -8.053424
2003-02-28 -17.869297 NaN -20.075997 1.009837 NaN -11.616974
How can I do this? I have already tried df.max(axis=1), but I want the 4 next-largest values in addition to the maximum. Thanks for your help.
Best answer
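I am assuming you want the 5 largest column values in each row, since that is how I read your question. The code below selects the 2 largest values per row instead, because the sample input has only 4 non-NaN columns.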
import io
import re
import pandas as pd
# First read in the data you supplied, converting runs of spaces to tabs.
data=io.StringIO(re.sub(" +","\t",
"""Dates 1 2 3 4 5 6
2002-07-31 -31.710916 NaN -5.208684 -29.773404 NaN -7.308558
2002-08-31 -44.941351 NaN 3.665286 -23.987135 NaN 3.134669
2002-09-30 -36.725548 NaN 4.114474 -19.536571 NaN -0.986986
2002-10-31 -25.377286 NaN -0.486158 -5.887594 NaN -0.787117
2002-11-30 19.766328 NaN -5.298877 -10.672174 NaN -21.057946
2002-12-31 1.996514 NaN -7.570497 -9.257122 NaN -19.630112
2003-01-31 -0.366083 NaN -14.124492 -5.434475 NaN -8.053424
2003-02-28 -17.869297 NaN -20.075997 1.009837 NaN -11.616974"""))
df = pd.read_csv(data,sep="\t")
# Then reshape the data from wide format to long format:
# one row per (Dates, Column_name, Value) combination.
df = df.melt(id_vars='Dates',var_name='Column_name',value_name='Value')
# Finally extract the top 2 values for each date. Setting Column_name as the
# index first keeps track of which column each value came from.
print(df.set_index('Column_name').groupby('Dates')['Value'].apply(lambda grp: grp.nlargest(2)))
The output is:
Dates Column_name
2002-07-31 3 -5.208684
6 -7.308558
2002-08-31 3 3.665286
6 3.134669
2002-09-30 3 4.114474
6 -0.986986
2002-10-31 3 -0.486158
6 -0.787117
2002-11-30 1 19.766328
3 -5.298877
2002-12-31 1 1.996514
3 -7.570497
2003-01-31 1 -0.366083
4 -5.434475
2003-02-28 4 1.009837
6 -11.616974
Name: Value, dtype: float64
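The same selection can also be sketched without reshaping to long format, by taking the largest values of each row directly. The following is only an illustration: the helper name top_n_per_row is hypothetical, and the frame holds just two rows copied from the sample data, with Dates assumed to be the index.

```python
import numpy as np
import pandas as pd

# Two rows copied from the sample data; "Dates" is used as the index here.
wide = pd.DataFrame(
    {
        "1": [-31.710916, 19.766328],
        "2": [np.nan, np.nan],
        "3": [-5.208684, -5.298877],
        "4": [-29.773404, -10.672174],
        "5": [np.nan, np.nan],
        "6": [-7.308558, -21.057946],
    },
    index=pd.Index(["2002-07-31", "2002-11-30"], name="Dates"),
)


def top_n_per_row(frame, n):
    """Return the n largest non-NaN values of each row as a long Series.

    Hypothetical helper, not part of pandas.
    """
    per_row = {
        # Drop NaN first so missing entries can never be selected.
        date: row.dropna().nlargest(n)
        for date, row in frame.iterrows()
    }
    # The dict keys become the outer index level, the column labels the inner.
    return pd.concat(per_row).rename_axis(["Dates", "Column_name"])


top2 = top_n_per_row(wide, 2)
print(top2)
```

Asking for n=5 on this sample returns only 4 values per date, because each row has just 4 non-NaN columns. Iterating row by row is simple but may be slow on very large frames.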
Unless you state the output you want more explicitly, it is hard to give a more suitable answer.
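If the intended output really is the 5 largest values per row of the full 505-column frame, a vectorized NumPy version may scale better than a per-group apply. This is a minimal sketch under stated assumptions: the function name top_k_long and the rank column are hypothetical, and the frame below is random stand-in data, not the asker's data.

```python
import numpy as np
import pandas as pd


def top_k_long(wide, k):
    """Return the k largest non-NaN values per row in long format.

    Hypothetical helper; assumes a numeric frame indexed by date.
    """
    values = wide.to_numpy(dtype=float)
    # Treat NaN as -inf so that missing entries are ranked last.
    filled = np.where(np.isnan(values), -np.inf, values)
    # Column positions of the k largest entries per row, largest first.
    order = np.argsort(-filled, axis=1)[:, :k]
    n_kept = order.shape[1]  # smaller than k if the frame has fewer columns
    result = pd.DataFrame(
        {
            "Dates": np.repeat(wide.index.to_numpy(), n_kept),
            "rank": np.tile(np.arange(1, n_kept + 1), len(wide)),
            "Column_name": wide.columns.to_numpy()[order].ravel(),
            "Value": np.take_along_axis(values, order, axis=1).ravel(),
        }
    )
    # Rows with fewer than k non-NaN entries produce NaN slots; drop them.
    return result.dropna(subset=["Value"]).reset_index(drop=True)


# Stand-in for the 505-column frame: random values with about 20% missing.
rng = np.random.default_rng(0)
data = rng.normal(size=(8, 505))
data[rng.random(data.shape) < 0.2] = np.nan
dates = ["2002-07-31", "2002-08-31", "2002-09-30", "2002-10-31",
         "2002-11-30", "2002-12-31", "2003-01-31", "2003-02-28"]
wide = pd.DataFrame(
    data,
    index=pd.Index(dates, name="Dates"),
    columns=[str(i) for i in range(1, 506)],
)

top5 = top_k_long(wide, 5)
print(top5.head(10))
```

The result has one row per date and rank, so the rank 1 rows reproduce df.max(axis=1) and ranks 2 to 5 supply the other four values.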
Regarding "python - display the top 5 largest values of a DataFrame for each month", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62004150/