python - 宽数据的摘要统计

是否有一种有效的方法来计算每个水果行为 True 的汇总统计数据？

df   comment  type      score    apple   banana   pear   
0     dfsd    new        0.4     True    False    True     
1     sdfs    low        0.3     False   True     False 
2     sdddfs   low       0.2     False   True     False    
3     sdsfs    low       0.8     True    True     False    
4     ddds    low        0.1     True    True     True

...

我试过:

fruits = ['apple','banana','pear']

for fruit in fruits:
    df1 = df.loc[df.f'{fruit}', :]
    df1.describe()

预期输出:

fruit
        count     mean_score   std_score  
apple               
banana              
pear

最佳答案

选择所需的 fruits 列，然后为每个 fruit 列获取相应的 score 并屏蔽 False 值，最后使用 describe 获取描述性统计数据:

s = ['count', 'mean', 'std']
stats = df[fruits].apply(lambda m: df['score'].mask(~m)).describe().T[s]

print(stats)

        count      mean       std
apple     3.0  0.433333  0.351188
banana    4.0  0.350000  0.310913
pear      2.0  0.250000  0.212132

关于python - 宽数据的摘要统计，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/63859501/

上一篇：python - 垂直 "broken"条形图，以数组作为条形高度和颜色编码

下一篇：html - RTL 中的星级评定系统

相关文章：

python - 性能 - 为什么具有范围的素数生成算法比使用素数列表快得多？

python-3.x - 安装 assimulo 和日晷 - 错误

python - 检查 Pandas 数据框的异常值

Python:函数中 int 减 1

python - 如何很好地打印带有编号的列和行的数组(在Python中)？

android - 无法找到 tools.jar Fedora

Python Beautiful Soup 4 使用 .select() 获取元素的子元素

python - QTreeWidget 关闭选择

python - 查找列名包含特定字符串的所有行

python - 如何根据Python中的时间戳索引将行转置为单列？