python - Slicing a pandas.crosstab and adding subtotals

I'm currently really struggling with dataframes. I run this code (res_sum is the name of the dataframe):

summary_table = pd.crosstab(index=[res_sum["Type"],res_sum["Size"]],
                        columns=res_sum["Found"],margins=True)
summary_table = summary_table.div(summary_table["All"] / 100, axis=0)

Result:

Found                 Exact   Near     No    All
Type        Size                            
X           10          0.0    0.0  100.0  100.0
            100         0.0  100.0    0.0  100.0
            500         0.0  100.0    0.0  100.0
            1000        0.0  100.0    0.0  100.0
            5000        0.0  100.0    0.0  100.0
Y           10          0.0  100.0    0.0  100.0
            100         0.0    0.0  100.0  100.0
            500         0.0  100.0    0.0  100.0
            1000        0.0  100.0    0.0  100.0
            5000        0.0  100.0    0.0  100.0
....... (more)
All                     5.0   65.0   30.0  100.0

I would like something like this:

Found                 Exact   Near     No    All
Type        Size                            
X           10          0.0    0.0  100.0  100.0
            100         0.0  100.0    0.0  100.0
            500         0.0  100.0    0.0  100.0
            1000        0.0  100.0    0.0  100.0
            5000        0.0  100.0    0.0  100.0
Total X                 0.0   80.0   20.0
Y           10          0.0  100.0    0.0  100.0
            100         0.0    0.0  100.0  100.0
            500         0.0  100.0    0.0  100.0
            1000        0.0  100.0    0.0  100.0
            5000        0.0  100.0    0.0  100.0
Total Y                 0.0   80.0   20.0
.......(more)
All                     5.0   65.0   30.0  100.0

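This doesn't seem to be possible with pd.crosstab, so I tried making a subset for each type and then pasting the dataframes back together. It sort of works, but it drops all the totals. Example code: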

x5 = summary_table.loc(axis=0)[['X'], slice(None)]
x6 = summary_table.loc(axis=0)[['Y'], slice(None)]

frames = [x5, x6]
result = pd.concat(frames)

As a result, the margins=True from pd.crosstab is completely ignored. Adding margins=True to dataframe.loc doesn't work.

Found                 Exact   Near     No    All
Type        Size                            
X           10          0.0    0.0  100.0  100.0
            100         0.0    0.0  100.0  100.0
            500       100.0    0.0    0.0  100.0
            1000        0.0  100.0    0.0  100.0
            5000        0.0  100.0    0.0  100.0
Y           10          0.0    0.0  100.0  100.0
            100         0.0  100.0    0.0  100.0
            500         0.0  100.0    0.0  100.0
            1000        0.0  100.0    0.0  100.0
            5000        0.0  100.0    0.0  100.0

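To make it a bit clearer: the reason I need this is that I need a summary per type, and in the future there will be more values for each size within a type (so not everything will be 100.0% anymore). Can anyone help me organize these dataframes? (Also, I would be very happy if the "All" at the end of each row could be removed. It seems I can only add both "All" margins, even though I only need the one for the columns.)

Edit, as requested:

A snippet of the data I'm using (I changed the types to X, Y, Z in the question, but these work just as well):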

Found   Size    Type
Exact   500     INV
Near    100     DEL
Near    500     DEL
Near    1000    DEL
Near    5000    DEL
Near    100     INS
Near    500     INS
Near    1000    INS
Near    1000    INV
Near    5000    INV
Near    10      DUP
Near    500     DUP
Near    1000    DUP
Near    5000    DUP
No      10      DEL
No      10      INS
No      5000    INS
No      10      INV
No      100     INV
No      100     DUP

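Best answer

You can compute the subtotals with subtotals = df.groupby(level=['Type']).mean(). This is a plain average of the row percentages, so it equals the true per-type percentage only while every row is based on the same number of observations. Then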

label_order = ['{}{}'.format(pre,label) for label in subtotals.index 
                                        for pre in ['', 'Total_']] + ['All']

generates the desired label order. Finally, df = df.loc[label_order] reorders the rows:

import pandas as pd
import numpy as np
nan = np.nan
df = pd.DataFrame({'All': [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, nan], 'Exact': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 65.0], 'Near': [0.0, 100.0, 100.0, 100.0, 100.0, 100.0, 0.0, 100.0, 100.0, 100.0, 30.0], 'No': [100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 100.0], 'Size': [10.0, 100.0, 500.0, 1000.0, 5000.0, 10.0, 100.0, 500.0, 1000.0, 5000.0, 5.0], 'Type': ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y', 'Y', 'All']})

df = df.set_index(['Type','Size'])
df.columns.name = 'Found'

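# Subtotal per Type: the mean of the percentage rows in each group
subtotals = df.groupby(level=['Type']).mean()
# Drop the grand-total group; the existing 'All' row is reused instead
subtotals = subtotals.loc[subtotals.index != 'All']
# Interleave each Type with its subtotal label: X, Total_X, Y, Total_Y, All
label_order = ['{}{}'.format(pre,label) for label in subtotals.index for pre in ['', 'Total_']] + ['All']
subtotals.index = ['Total_{}'.format(label) for label in subtotals.index]
# Subtotal rows have no Size, so use an empty string as a placeholder
subtotals['Size'] = ''

# Move Size out of the index so both frames are indexed by Type alone
df = pd.concat([df.reset_index('Size'), subtotals], axis=0, sort=False)
# Reorder the rows, then restore Size as the second index level
df = df.loc[label_order]
df = df.set_index('Size', append=True)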

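yields (the All row of the reconstructed input above is shifted by one column, so its values land under the wrong headers here):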

                  All  Exact   Near     No
        Size                              
X       10.0    100.0    0.0    0.0  100.0
        100.0   100.0    0.0  100.0    0.0
        500.0   100.0    0.0  100.0    0.0
        1000.0  100.0    0.0  100.0    0.0
        5000.0  100.0    0.0  100.0    0.0
Total_X         100.0    0.0   80.0   20.0
Y       10.0    100.0    0.0  100.0    0.0
        100.0   100.0    0.0    0.0  100.0
        500.0   100.0    0.0  100.0    0.0
        1000.0  100.0    0.0  100.0    0.0
        5000.0  100.0    0.0  100.0    0.0
Total_Y         100.0    0.0   80.0   20.0
All     5.0       NaN   65.0   30.0  100.0
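
Since the question mentions that rows will later be based on different numbers of observations, a count-based variant may be safer than averaging percentages. The following is only a sketch under stated assumptions, not the answer's method: it loads the data snippet from the question, builds the crosstab without margins (so no per-row "All" column is created), sums the raw counts per Type, and converts to percentages last. The names counts, pieces and to_percent are made up for this illustration.

```python
import io
import pandas as pd

# Rows copied from the data snippet in the question.
raw = """Found Size Type
Exact 500 INV
Near 100 DEL
Near 500 DEL
Near 1000 DEL
Near 5000 DEL
Near 100 INS
Near 500 INS
Near 1000 INS
Near 1000 INV
Near 5000 INV
Near 10 DUP
Near 500 DUP
Near 1000 DUP
Near 5000 DUP
No 10 DEL
No 10 INS
No 5000 INS
No 10 INV
No 100 INV
No 100 DUP
"""
res_sum = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Raw counts per (Type, Size) and Found; margins are added by hand below.
counts = pd.crosstab(index=[res_sum["Type"], res_sum["Size"]],
                     columns=res_sum["Found"])


def to_percent(frame):
    """Hypothetical helper: turn each row of counts into row percentages."""
    return frame.mul(100).div(frame.sum(axis=1), axis=0)


def total_row(frame, label):
    """Hypothetical helper: one percentage row built from summed counts."""
    row = to_percent(frame.sum().to_frame().T)
    row.index = pd.MultiIndex.from_tuples([(label, "")],
                                          names=counts.index.names)
    return row


pieces = []
for type_name, block in counts.groupby(level="Type"):
    pieces.append(to_percent(block))                          # detail rows
    pieces.append(total_row(block, f"Total_{type_name}"))     # subtotal row
pieces.append(total_row(counts, "All"))                       # grand total

summary_table = pd.concat(pieces)
print(summary_table)
```

Because the subtotals are derived from summed counts, a row with many observations weighs more than a row with few, which is what a per-type percentage normally means. With the sample data every row has exactly one observation, so the result coincides with the mean-based subtotals.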

This Q&A about slicing a pandas.crosstab and adding totals is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/52704580/
