I would like to find the deviation of simulation data from experimental data, for a frame set up like this:
import pandas as pd

my_frame = pd.DataFrame(data={'simulation1': [71, 4.8, 65, 4.7],
                              'simulation2': [71, 4.8, 69, 4.7],
                              'simulation3': [70, 3.8, 68, 4.9],
                              'experiment': [70.3, 3.5, 65, 4.4],
                              'Material': ['Copper', 'Copper',
                                           'Aluminum', 'Aluminum'],
                              'Property': ['Temperature', 'Weight',
                                           'Temperature', 'Weight']})
# Use (Material, Property) as a two-level row index
my_frame.set_index(keys=['Material', 'Property'], inplace=True)
                      simulation1  simulation2  simulation3  experiment
Material Property
Copper   Temperature         71.0         71.0         70.0        70.3
         Weight               4.8          4.8          3.8         3.5
Aluminum Temperature         65.0         69.0         68.0        65.0
         Weight               4.7          4.7          4.9         4.4
For each category I want the deviation from a reference column (in my case, experiment):
                            simulation1  simulation2  simulation3  experiment
Material Property
Copper   Temperature               71.0         71.0         70.0        70.3
         Weight                     4.8          4.8          3.8         3.5
         ERROR(Temp-exp)            0.7          0.7          0.3         0.0
         ERROR(Weight-exp)          1.3          1.3          0.3         0.0
Aluminum Temperature               65.0         69.0         68.0        65.0
         Weight                     4.7          4.7          4.9         4.4
         ERROR(Temp-exp)            0.0          4.0          3.0         0.0
         ERROR(Weight-exp)          0.3          0.3          0.5         0.0
I am sure this can be done easily in pandas, but I am not sure how.

Best answer
Subtract the experiment column with DataFrame.sub to create a new DataFrame, then change the MultiIndex:
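# axis=0 aligns the experiment Series with the rows, so it is subtracted from every column
df = my_frame.sub(my_frame['experiment'], axis=0)
# Tag the first index level so the error rows can be told apart from the values
a = df.index.get_level_values(0) + '_ERR'
b = df.index.get_level_values(1)
df.index = [a, b]
print(df)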
                          simulation1  simulation2  simulation3  experiment
Material     Property
Copper_ERR   Temperature          0.7          0.7         -0.3         0.0
             Weight               1.3          1.3          0.3         0.0
Aluminum_ERR Temperature          0.0          4.0          3.0         0.0
             Weight               0.3          0.3          0.5         0.0
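The axis=0 argument is what makes the subtraction work row by row. The sketch below illustrates the difference on a small hypothetical frame (the names frame, by_row and by_col are made up for this illustration and are not part of the answer): with axis=0 the Series is matched against the row labels, while the default matches it against the column labels, which here share nothing with the Series index.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the axis argument
frame = pd.DataFrame({'sim': [71.0, 4.8], 'experiment': [70.3, 3.5]},
                     index=['Temperature', 'Weight'])

# axis=0: the Series index (Temperature, Weight) is aligned with the rows
by_row = frame.sub(frame['experiment'], axis=0)

# Default axis='columns': the Series index is aligned with the column
# labels (sim, experiment); nothing matches, so every cell is NaN
by_col = frame.sub(frame['experiment'])

print(by_row)
print(by_col)
```

If the result of a subtraction like this comes back full of NaN, a missing axis=0 is a likely cause.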
Finally, use concat with DataFrame.sort_index:
# Stack the error rows under the original rows, then sort so each *_ERR group follows its material
my_frame = pd.concat([my_frame, df]).sort_index()
print(my_frame)
                          simulation1  simulation2  simulation3  experiment
Material     Property
Aluminum     Temperature         65.0         69.0         68.0        65.0
             Weight               4.7          4.7          4.9         4.4
Aluminum_ERR Temperature          0.0          4.0          3.0         0.0
             Weight               0.3          0.3          0.5         0.0
Copper       Temperature         71.0         71.0         70.0        70.3
             Weight               4.8          4.8          3.8         3.5
Copper_ERR   Temperature          0.7          0.7         -0.3         0.0
             Weight               1.3          1.3          0.3         0.0
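Once the values and deviations live in one sorted frame, either kind of row can be pulled back out by label. The following is a minimal, self-contained sketch of that; the names errors, combined, copper_errors and temperature_rows are introduced here for illustration, and the index is rebuilt with pd.MultiIndex.from_arrays only to make the level names explicit.

```python
import pandas as pd

my_frame = pd.DataFrame(
    data={
        'simulation1': [71, 4.8, 65, 4.7],
        'simulation2': [71, 4.8, 69, 4.7],
        'simulation3': [70, 3.8, 68, 4.9],
        'experiment': [70.3, 3.5, 65, 4.4],
        'Material': ['Copper', 'Copper', 'Aluminum', 'Aluminum'],
        'Property': ['Temperature', 'Weight', 'Temperature', 'Weight'],
    }
).set_index(['Material', 'Property'])

# Deviation of every column from the experiment column, row by row
errors = my_frame.sub(my_frame['experiment'], axis=0)
errors.index = pd.MultiIndex.from_arrays(
    [errors.index.get_level_values(0) + '_ERR',
     errors.index.get_level_values(1)],
    names=my_frame.index.names,
)

combined = pd.concat([my_frame, errors]).sort_index()

# All deviation rows for one material (indexed by Property)
copper_errors = combined.loc['Copper_ERR']

# Values and deviations of a single property across all materials
temperature_rows = combined.xs('Temperature', level='Property')

print(copper_errors)
print(temperature_rows)
```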
Another solution, changing the second index level instead:
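# Start again from the original my_frame (before the concat above)
df = my_frame.sub(my_frame['experiment'], axis=0)
a = df.index.get_level_values(0)
# Wrap each property name, e.g. 'Weight' -> 'ERROR(Weight-exp)'
b = 'ERROR(' + df.index.get_level_values(1) + '-exp)'
df.index = [a, b]
print(df)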
                                 simulation1  simulation2  simulation3  \
Material Property
Copper   ERROR(Temperature-exp)          0.7          0.7         -0.3
         ERROR(Weight-exp)               1.3          1.3          0.3
Aluminum ERROR(Temperature-exp)          0.0          4.0          3.0
         ERROR(Weight-exp)               0.3          0.3          0.5

                                 experiment
Material Property
Copper   ERROR(Temperature-exp)         0.0
         ERROR(Weight-exp)              0.0
Aluminum ERROR(Temperature-exp)         0.0
         ERROR(Weight-exp)              0.0
# Descending sort puts Copper first and the ERROR(...) rows after the value rows
my_frame = pd.concat([my_frame, df]).sort_index(ascending=False)
print(my_frame)
                                 simulation1  simulation2  simulation3  \
Material Property
Copper   Weight                          4.8          4.8          3.8
         Temperature                    71.0         71.0         70.0
         ERROR(Weight-exp)               1.3          1.3          0.3
         ERROR(Temperature-exp)          0.7          0.7         -0.3
Aluminum Weight                          4.7          4.7          4.9
         Temperature                    65.0         69.0         68.0
         ERROR(Weight-exp)               0.3          0.3          0.5
         ERROR(Temperature-exp)          0.0          4.0          3.0

                                 experiment
Material Property
Copper   Weight                         3.5
         Temperature                   70.3
         ERROR(Weight-exp)              0.0
         ERROR(Temperature-exp)         0.0
Aluminum Weight                         4.4
         Temperature                   65.0
         ERROR(Weight-exp)              0.0
         ERROR(Temperature-exp)         0.0
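The layout sketched in the question keeps the materials in their original order and shows only positive deviations, whereas sorting reorders the rows alphabetically and the subtraction keeps the sign. A possible variation is sketched below, under two assumptions that are not confirmed by the question: that unsigned deviations are wanted (hence .abs()), and that each material's error rows should simply follow its value rows. The names errors, combined, rank_of and rank are introduced here for illustration.

```python
import numpy as np
import pandas as pd

my_frame = pd.DataFrame(
    data={
        'simulation1': [71, 4.8, 65, 4.7],
        'simulation2': [71, 4.8, 69, 4.7],
        'simulation3': [70, 3.8, 68, 4.9],
        'experiment': [70.3, 3.5, 65, 4.4],
        'Material': ['Copper', 'Copper', 'Aluminum', 'Aluminum'],
        'Property': ['Temperature', 'Weight', 'Temperature', 'Weight'],
    }
).set_index(['Material', 'Property'])

# Assumption: unsigned deviations are wanted, so take the absolute value
errors = my_frame.sub(my_frame['experiment'], axis=0).abs()
errors.index = pd.MultiIndex.from_arrays(
    [errors.index.get_level_values('Material'),
     'ERROR(' + errors.index.get_level_values('Property') + '-exp)'],
    names=my_frame.index.names,
)

combined = pd.concat([my_frame, errors])

# Rank each material by first appearance in the original frame, then
# reorder with a stable sort so rows keep their relative order per material
materials = my_frame.index.get_level_values('Material').unique()
rank_of = {material: position for position, material in enumerate(materials)}
rank = combined.index.get_level_values('Material').map(rank_of)
combined = combined.iloc[np.argsort(rank.to_numpy(), kind='stable')]

print(combined)
```

Because no lexicographic sort is involved, this does not depend on how the row labels happen to compare as strings.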
Regarding python - how to get the deviation of a multi-index DataFrame from a reference value, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57574467/