python - Reshape pandas describe() output into a single row with column names

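I am generating features for a machine learning algorithm, and I want to compute some statistics from a DataFrame, the way describe() does.

Here is the sample code: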

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [20, 30, 40]})
print(df)

df_t = df.describe()
print(type(df_t))
print(df_t)
print(df_t.columns)
print(df_t.index)

Output:

     A   B
0  1.0  20
1  NaN  30
2  3.0  40
<class 'pandas.core.frame.DataFrame'>
              A     B
count  2.000000   3.0
mean   2.000000  30.0
std    1.414214  10.0
min    1.000000  20.0
25%    1.500000  25.0
50%    2.000000  30.0
75%    2.500000  35.0
max    3.000000  40.0
Index(['A', 'B'], dtype='object')
Index(['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max'], dtype='object')

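Here are my questions:

How can I reshape the result of describe() into a single row, with names like A_count, A_mean, ..., B_75%, B_max?
What is the best way to do the same thing with custom functions instead of describe()? For example, I would like to add np.median and np.percentile at 20% and 80%.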

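Best answer

To get a single column (a Series), use stack, then join the two index levels into one label. Note that the labels come out as statistic_column (count_A) rather than column_statistic (A_count):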

In [11]: df_s = df_t.stack()

In [12]: df_s.index = df_s.index.map("_".join)

In [13]: df_s
Out[13]:
count_A     2.000000
count_B     3.000000
mean_A      2.000000
mean_B     30.000000
std_A       1.414214
std_B      10.000000
min_A       1.000000
min_B      20.000000
25%_A       1.500000
25%_B      25.000000
50%_A       2.000000
50%_B      30.000000
75%_A       2.500000
75%_B      35.000000
max_A       3.000000
max_B      40.000000
dtype: float64

Although... it is not clear why you would want to do this (you probably don't).
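If the A_count, A_mean, ... ordering from the question is what you need, one possible variation is to call unstack() instead of stack(), which puts the column name first in each index pair. The sketch below rebuilds the question's df and turns the result into a one-row DataFrame. The names flat and row are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [20, 30, 40]})

# unstack() on the describe() frame gives a Series whose MultiIndex is
# (column, statistic), e.g. ('A', 'count'), ('A', 'mean'), ...
flat = df.describe().unstack()

# Join each (column, statistic) pair into a single label such as 'A_count'.
flat.index = flat.index.map("_".join)

# Turn the Series into a DataFrame with exactly one row.
row = flat.to_frame().T
print(row)
```

Each label in row.columns is then a feature name, so rows built this way for several DataFrames could be stacked with pd.concat.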


You can pass the percentiles parameter to describe:

In [21]: df.describe(percentiles=[0.2, 0.8])
Out[21]:
              A     B
count  2.000000   3.0
mean   2.000000  30.0
std    1.414214  10.0
min    1.000000  20.0
20%    1.400000  24.0
50%    2.000000  30.0
80%    2.600000  36.0
max    3.000000  40.0
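For the second question (arbitrary custom functions such as np.median and np.percentile), a minimal sketch is to apply each function to each column explicitly and flatten the result in the same way. The dictionary custom_stats and the labels median, p20 and p80 are hypothetical names chosen for illustration. Dropping NaN before calling the NumPy functions is an assumption made here to mirror how describe() ignores missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [20, 30, 40]})

# Hypothetical label -> function mapping; each function takes a 1-D array.
custom_stats = {
    'median': np.median,
    'p20': lambda a: np.percentile(a, 20),
    'p80': lambda a: np.percentile(a, 80),
}

# Build a statistics-by-columns table. NaN values are dropped first
# (an assumption, to match describe()'s handling of missing data).
stats = pd.DataFrame({
    col: {name: func(df[col].dropna().to_numpy())
          for name, func in custom_stats.items()}
    for col in df.columns
})

# Flatten to labels like 'A_median', 'A_p20', ... and make a single row.
flat = stats.unstack()
flat.index = flat.index.map("_".join)
row = flat.to_frame().T
print(row)
```

Computing the table explicitly keeps the NaN handling visible and avoids depending on how DataFrame.agg dispatches custom callables.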

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/50204825/
