python - Pandas extended 'describe' including null-value counts

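I have a large dataframe with 450 columns and 550,000 rows. Among my columns:

73 float columns
30 date columns
the remaining columns are object

I want a description of my variables, but not only the usual one: I want additional statistics in the same matrix. The end result would be a description matrix covering all 450 variables, with the following details for each: type, count, null count, percentage of nulls, max, min, 50%, 75%, 25%, ...

For now, I only have the basic function to describe my data: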

Dataframe.describe(include = 'all')

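Do you have a function or method that produces a more extensive description?

Thanks.

Best answer

You need to compute the extra statistics yourself and add them as rows to the final describe DataFrame:

Note:

The first row of the final df is count, which uses the count function and therefore counts only non-NaN values.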
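
To make the note concrete, here is a minimal sketch (the Series `s` is a made-up example that mirrors column B of the data below) showing how `count`, `isnull().sum()` and `len` relate:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: six values, two of them missing
s = pd.Series([4, np.nan, np.nan, 5, 5, 4])

non_null = s.count()       # counts only non-NaN values
nulls = s.isnull().sum()   # counts NaN values
total = len(s)             # counts every row, NaN or not

print(non_null, nulls, total)
```

Here `non_null + nulls == total`, which is why a null count can be added alongside `count` without recomputing anything from `describe` itself.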

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, np.nan, np.nan, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})

print (df)
   A    B  C  D  E  F
0  a  4.0  7  1  5  a
1  b  NaN  8  3  3  a
2  c  NaN  9  5  6  a
3  d  5.0  4  7  9  b
4  e  5.0  2  1  2  b
5  f  4.0  3  0  4  b

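# Start from the standard summary of every column
df1 = df.describe(include='all')

# Append extra rows: column dtype and total row count (nulls included)
df1.loc['dtype'] = df.dtypes
df1.loc['size'] = len(df)
# Despite the '% count' label, this row holds the fraction (0 to 1) of nulls per column
df1.loc['% count'] = df.isnull().mean()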

print (df1)
              A         B        C        D        E       F
count         6         4        6        6        6       6
unique        6       NaN      NaN      NaN      NaN       2
top           e       NaN      NaN      NaN      NaN       b
freq          1       NaN      NaN      NaN      NaN       3
mean        NaN       4.5      5.5  2.83333  4.83333     NaN
std         NaN   0.57735  2.88097  2.71416  2.48328     NaN
min         NaN         4        2        0        2     NaN
25%         NaN         4     3.25        1     3.25     NaN
50%         NaN       4.5      5.5        2      4.5     NaN
75%         NaN         5     7.75      4.5     5.75     NaN
max         NaN         5        9        7        9     NaN
dtype    object   float64    int64    int64    int64  object
size          6         6        6        6        6       6
% count       0  0.333333        0        0        0       0
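
The question also asks for the absolute number of nulls and a true percentage, which the output above does not show directly. One possible way to add both is sketched below. The helper name `extended_describe` and the row labels `'null count'` and `'null %'` are illustrative choices, not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd

# Small sample frame, a subset of the one used above
df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, np.nan, np.nan, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
})

def extended_describe(frame):
    """Return describe(include='all') plus dtype, size and null statistics.

    Hypothetical helper: the name and row labels are assumptions.
    """
    summary = frame.describe(include='all')
    # Cast to object so rows of mixed types (strings, ints, floats) can be appended
    summary = summary.astype(object)
    summary.loc['dtype'] = frame.dtypes.astype(str)
    summary.loc['size'] = len(frame)                     # all rows, nulls included
    summary.loc['null count'] = frame.isnull().sum()     # number of NaN per column
    summary.loc['null %'] = frame.isnull().mean() * 100  # share of NaN, in percent
    return summary

result = extended_describe(df)
print(result)
```

For column B this gives a null count of 2 and a null percentage of about 33.3, alongside the usual count of 4. On a frame with 450 columns it may be easier to read the transposed result, `result.T`, which puts one variable per row.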

Regarding python - Pandas extended 'describe' including null-value counts, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53173927/
