I'm really struggling with DataFrames at the moment. Running this code (res_sum is the name of the DataFrame):
summary_table = pd.crosstab(index=[res_sum["Type"],res_sum["Size"]],
columns=res_sum["Found"],margins=True)
summary_table = summary_table.div(summary_table["All"] / 100, axis=0)
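For anyone reproducing this: the div call divides each row by its own "All" total over 100, so every row is turned into percentages of that row. Below is a minimal self-contained sketch. It assumes only a handful of rows taken from the data snippet in the edit further down, not the full dataset.

```python
import pandas as pd

# A few rows from the data snippet in the question's edit (Found / Size / Type).
res_sum = pd.DataFrame({
    "Found": ["Exact", "Near", "No", "Near", "No", "No"],
    "Size":  [500, 1000, 10, 100, 100, 10],
    "Type":  ["INV", "INV", "INV", "DEL", "INV", "DEL"],
})

# Raw counts per (Type, Size) row; margins=True adds an "All" row and an "All" column.
summary_table = pd.crosstab(index=[res_sum["Type"], res_sum["Size"]],
                            columns=res_sum["Found"], margins=True)

# Divide each row by (row total / 100): counts become row percentages.
summary_table = summary_table.div(summary_table["All"] / 100, axis=0)
print(summary_table.round(1))
```

Here every (Type, Size) pair occurs only once, so each data row is 0 or 100; only the ("All", "") margin row shows a split.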
Result:
Found Exact Near No All
Type Size
X 10 0.0 0.0 100.0 100.0
100 0.0 100.0 0.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
Y 10 0.0 100.0 0.0 100.0
100 0.0 0.0 100.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
....... (more)
All 5.0 65.0 30.0 100.0
What I want is something like this:
Found Exact Near No All
Type Size
X 10 0.0 0.0 100.0 100.0
100 0.0 100.0 0.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
Total X 0.0 80.0 20.0
Y 10 0.0 100.0 0.0 100.0
100 0.0 0.0 100.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
Total Y 0.0 80.0 20.0
.......(more)
All 5.0 65.0 30.0 100.0
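This doesn't seem to be possible with pd.crosstab, so I tried making a subset for each Type and then concatenating the DataFrames back together. That sort of works, but it drops all the totals. For example, this code: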
x5 = summary_table.loc(axis=0)[['X'], slice(None)]
x6 = summary_table.loc(axis=0)[['Y'], slice(None)]
frames = [x5, x6]
result = pd.concat(frames)
The result completely ignores margins=True from pd.crosstab, and adding margins=True to DataFrame.loc doesn't work:
Found Exact Near No All
Type Size
X 10 0.0 0.0 100.0 100.0
100 0.0 0.0 100.0 100.0
500 100.0 0.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
Y 10 0.0 0.0 100.0 100.0
100 0.0 100.0 0.0 100.0
500 0.0 100.0 0.0 100.0
1000 0.0 100.0 0.0 100.0
5000 0.0 100.0 0.0 100.0
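Why the totals disappear: margins=True stores the totals as an extra row labelled ("All", "") in summary_table, and neither the X slice nor the Y slice selects that row, so concatenating the slices cannot bring it back. A minimal sketch with a hypothetical stand-in table (made-up values, same Type/Size index layout) that re-attaches the margin row explicitly. It selects rows with a boolean mask on the Type level instead of .loc(axis=0), which is a swapped-in technique, not the code above.

```python
import pandas as pd

# Hypothetical stand-in for summary_table: two Types plus the margin row
# that margins=True labels ("All", "").
idx = pd.MultiIndex.from_tuples(
    [("X", 10), ("X", 100), ("Y", 10), ("Y", 100), ("All", "")],
    names=["Type", "Size"],
)
summary_table = pd.DataFrame(
    {"Near": [0.0, 100.0, 100.0, 0.0, 50.0],
     "No": [100.0, 0.0, 0.0, 100.0, 50.0]},
    index=idx,
)

types = summary_table.index.get_level_values("Type")
x5 = summary_table[types == "X"]
x6 = summary_table[types == "Y"]
margin = summary_table[types == "All"]  # never selected by the X / Y slices

# Concatenate the per-Type slices and put the margin row back at the bottom.
result = pd.concat([x5, x6, margin])
print(result)
```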
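To be a bit clearer about why I need this: I need a summary for each Type, and in the future there will be more values for each Size within a Type (so not everything will be 100.0% any more). Can anyone help me organize these DataFrames? (Also, I'd be very happy to get rid of the "All" at the end of each row. It seems I can only get both "All" margins, even though I only need the column totals.)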
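On the side question: margins=True always adds both the "All" row (column totals) and the "All" column (row totals), but the column can be dropped afterwards without touching the row. A sketch on a hypothetical miniature table with made-up values:

```python
import pandas as pd

# Hypothetical miniature of summary_table, including both margins.
idx = pd.MultiIndex.from_tuples(
    [("X", 10), ("X", 100), ("All", "")], names=["Type", "Size"])
summary_table = pd.DataFrame(
    {"Near": [0.0, 100.0, 50.0],
     "No": [100.0, 0.0, 50.0],
     "All": [100.0, 100.0, 100.0]},
    index=idx,
)

# Remove only the "All" column; the ("All", "") row stays in place.
trimmed = summary_table.drop(columns="All")
print(trimmed)
```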
Edit, as requested:
A snippet of the data I'm using (in the question I renamed the types to X, Y, Z, but these work just as well):
Found Size Type
Exact 500 INV
Near 100 DEL
Near 500 DEL
Near 1000 DEL
Near 5000 DEL
Near 100 INS
Near 500 INS
Near 1000 INS
Near 1000 INV
Near 5000 INV
Near 10 DUP
Near 500 DUP
Near 1000 DUP
Near 5000 DUP
No 10 DEL
No 10 INS
No 5000 INS
No 10 INV
No 100 INV
No 100 DUP
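Since more rows per Size are expected later, one option is to build the subtotals from raw counts rather than from the percentage table, so a Size with more rows weighs more in its Type's total. This is a sketch of that alternative, not what the answer below does (it averages percentages). It assumes the snippet above is the whole dataset, uses hypothetical "Total <Type>" labels, and keeps no "All" column.

```python
import pandas as pd

# The data snippet above (Found / Size / Type).
rows = [
    ("Exact", 500, "INV"), ("Near", 100, "DEL"), ("Near", 500, "DEL"),
    ("Near", 1000, "DEL"), ("Near", 5000, "DEL"), ("Near", 100, "INS"),
    ("Near", 500, "INS"), ("Near", 1000, "INS"), ("Near", 1000, "INV"),
    ("Near", 5000, "INV"), ("Near", 10, "DUP"), ("Near", 500, "DUP"),
    ("Near", 1000, "DUP"), ("Near", 5000, "DUP"), ("No", 10, "DEL"),
    ("No", 10, "INS"), ("No", 5000, "INS"), ("No", 10, "INV"),
    ("No", 100, "INV"), ("No", 100, "DUP"),
]
res_sum = pd.DataFrame(rows, columns=["Found", "Size", "Type"])

# Raw counts per (Type, Size), without margins.
counts = pd.crosstab(index=[res_sum["Type"], res_sum["Size"]],
                     columns=res_sum["Found"])

pieces = []
type_level = counts.index.get_level_values("Type")
for typ in type_level.unique():
    block = counts[type_level == typ]
    # Subtotal = summed counts of this Type, as a one-row frame.
    total = block.sum().to_frame().T
    total.index = pd.MultiIndex.from_tuples(
        [(f"Total {typ}", "")], names=counts.index.names)
    pieces.extend([block, total])

# Grand total row over all counts.
grand = counts.sum().to_frame().T
grand.index = pd.MultiIndex.from_tuples([("All", "")], names=counts.index.names)
table = pd.concat(pieces + [grand])

# Convert every row (data, subtotal, grand total) to row percentages.
pct = table.div(table.sum(axis=1), axis=0) * 100
print(pct.round(1))
```

With this data the grand total row comes out as 5.0 / 65.0 / 30.0, the same as the "All" row in the question.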
Best answer
You could use subtotals = df.groupby(level=['Type']).mean()
to compute the subtotals. Then
label_order = ['{}{}'.format(pre,label) for label in subtotals.index
               for pre in ['', 'Total_']] + ['All']
generates the desired label order. Finally, df = df.loc[label_order]
reorders the rows:
import pandas as pd
import numpy as np
nan = np.nan
df = pd.DataFrame({'All': [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, nan], 'Exact': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 65.0], 'Near': [0.0, 100.0, 100.0, 100.0, 100.0, 100.0, 0.0, 100.0, 100.0, 100.0, 30.0], 'No': [100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 100.0, 0.0, 0.0, 0.0, 100.0], 'Size': [10.0, 100.0, 500.0, 1000.0, 5000.0, 10.0, 100.0, 500.0, 1000.0, 5000.0, 5.0], 'Type': ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y', 'Y', 'All']})
df = df.set_index(['Type','Size'])
df.columns.name = 'Found'
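# Per-Type subtotals: the mean of each column within each Type
subtotals = df.groupby(level=['Type']).mean()
# The "All" margin row is not a real Type, so drop its group
subtotals = subtotals.loc[subtotals.index != 'All']
# Desired row order: X, Total_X, Y, Total_Y, ..., then All
label_order = ['{}{}'.format(pre,label) for label in subtotals.index for pre in ['', 'Total_']] + ['All']
subtotals.index = ['Total_{}'.format(label) for label in subtotals.index]
# Subtotal rows get a blank Size
subtotals['Size'] = ''
# Move Size out of the index so rows can be selected by the Type label alone
df = pd.concat([df.reset_index('Size'), subtotals], axis=0, sort=False)
df = df.loc[label_order]
# Put Size back as the second index level
df = df.set_index('Size', append=True)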
which yields
All Exact Near No
Size
X 10.0 100.0 0.0 0.0 100.0
100.0 100.0 0.0 100.0 0.0
500.0 100.0 0.0 100.0 0.0
1000.0 100.0 0.0 100.0 0.0
5000.0 100.0 0.0 100.0 0.0
Total_X 100.0 0.0 80.0 20.0
Y 10.0 100.0 0.0 100.0 0.0
100.0 100.0 0.0 0.0 100.0
500.0 100.0 0.0 100.0 0.0
1000.0 100.0 0.0 100.0 0.0
5000.0 100.0 0.0 100.0 0.0
Total_Y 100.0 0.0 80.0 20.0
All 5.0 NaN 65.0 30.0 100.0
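The reordering step relies on two things: the nested comprehension loops over labels on the outside and prefixes on the inside, and .loc with a list on a non-unique index returns every row carrying each label, in list order. That is why Size is moved out of the index first. A minimal sketch on a hypothetical frame with made-up values:

```python
import pandas as pd

# Hypothetical miniature of the concatenated frame before reordering:
# the index holds only the Type label, so "X" and "Y" repeat.
df = pd.DataFrame(
    {"Near": [0.0, 100.0, 100.0, 0.0, 50.0, 50.0, 50.0]},
    index=["X", "X", "Y", "Y", "All", "Total_X", "Total_Y"],
)

types = ["X", "Y"]
# Outer loop over labels, inner loop over prefixes -> X, Total_X, Y, Total_Y
label_order = ["{}{}".format(pre, label)
               for label in types for pre in ["", "Total_"]] + ["All"]

# .loc with a list pulls all rows for each label, in the order given
df = df.loc[label_order]
print(df)
```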
Regarding python - pandas.crosstab slicing and adding totals, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52704580/