python - How to get the deviation of a multi-index DataFrame from a reference value

Tags: python pandas

I'm interested in finding the deviation of simulation data from experimental data, for a frame similar to the following:

import pandas as pd

my_frame = pd.DataFrame(data={'simulation1':[71,4.8,65,4.7],
                              'simulation2':[71,4.8,69,4.7],
                              'simulation3':[70,3.8,68,4.9],
                              'experiment':[70.3,3.5,65,4.4],
                              'Material':['Copper','Copper',
                                        'Aluminum','Aluminum'],
                              'Property':['Temperature','Weight',
                                        'Temperature','Weight']})
my_frame.set_index(keys=['Material','Property'], inplace=True)


                      simulation1  simulation2  simulation3  experiment
Material Property
Copper   Temperature         71.0         71.0         70.0        70.3
         Weight               4.8          4.8          3.8         3.5
Aluminum Temperature         65.0         69.0         68.0        65.0
         Weight               4.7          4.7          4.9         4.4

For each category, I'd like the deviation from a reference column (in my case, experiment):

                           simulation1  simulation2  simulation3  experiment
Material Property
Copper   Temperature              71.0         71.0         70.0        70.3
         Weight                    4.8          4.8          3.8         3.5
         ERROR(Temp  -exp)         0.7          0.7          0.3         0.0
         ERROR(Weight-exp)         1.3          1.3          0.3         0.0

Aluminum Temperature              65.0         69.0         68.0        65.0
         Weight                    4.7          4.7          4.9         4.4
         ERROR(Temp  -exp)         0.0          4.0          3.0         0.0
         ERROR(Weight-exp)         0.3          0.3          0.5         0.0

I'm sure this can be done easily in pandas, but I'm not sure how.

Best answer

Create a new DataFrame by subtracting the experiment column with DataFrame.sub, then change its MultiIndex:

df = my_frame.sub(my_frame['experiment'], axis=0)
a = df.index.get_level_values(0) + '_ERR'
b = df.index.get_level_values(1)

df.index = [a, b]
print(df)
                          simulation1  simulation2  simulation3  experiment
Material     Property                                                      
Copper_ERR   Temperature          0.7          0.7         -0.3         0.0
             Weight               1.3          1.3          0.3         0.0
Aluminum_ERR Temperature          0.0          4.0          3.0         0.0
             Weight               0.3          0.3          0.5         0.0
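
As a quick sanity check on the subtraction step, here is a minimal sketch that rebuilds the frame from the question and compares DataFrame.sub(..., axis=0) with a plain NumPy broadcast of the experiment column. The names manual and same are illustrative only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

my_frame = pd.DataFrame(data={'simulation1': [71, 4.8, 65, 4.7],
                              'simulation2': [71, 4.8, 69, 4.7],
                              'simulation3': [70, 3.8, 68, 4.9],
                              'experiment': [70.3, 3.5, 65, 4.4],
                              'Material': ['Copper', 'Copper',
                                           'Aluminum', 'Aluminum'],
                              'Property': ['Temperature', 'Weight',
                                           'Temperature', 'Weight']})
my_frame = my_frame.set_index(['Material', 'Property'])

# axis=0 aligns the experiment Series with the rows, so every column
# has the experiment value of the same row subtracted from it.
df = my_frame.sub(my_frame['experiment'], axis=0)

# The same computation as a NumPy broadcast: (4, 4) minus (4, 1).
manual = my_frame.to_numpy() - my_frame[['experiment']].to_numpy()

same = np.allclose(df.to_numpy(), manual)
print(same)
```

The experiment column of the result is zero by construction, since it is subtracted from itself.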

Finally, use concat with DataFrame.sort_index:

my_frame = pd.concat([my_frame, df]).sort_index()
print(my_frame)
                          simulation1  simulation2  simulation3  experiment
Material     Property                                                      
Aluminum     Temperature         65.0         69.0         68.0        65.0
             Weight               4.7          4.7          4.9         4.4
Aluminum_ERR Temperature          0.0          4.0          3.0         0.0
             Weight               0.3          0.3          0.5         0.0
Copper       Temperature         71.0         71.0         70.0        70.3
             Weight               4.8          4.8          3.8         3.5
Copper_ERR   Temperature          0.7          0.7         -0.3         0.0
             Weight               1.3          1.3          0.3         0.0
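
Note that sort_index orders the outer level alphabetically, so Aluminum now comes before Copper. If the original material order matters, one possible sketch is to interleave the blocks by hand instead of sorting. This is not part of the original answer, and pieces and combined are hypothetical names used only for illustration.

```python
import pandas as pd

my_frame = pd.DataFrame(data={'simulation1': [71, 4.8, 65, 4.7],
                              'simulation2': [71, 4.8, 69, 4.7],
                              'simulation3': [70, 3.8, 68, 4.9],
                              'experiment': [70.3, 3.5, 65, 4.4],
                              'Material': ['Copper', 'Copper',
                                           'Aluminum', 'Aluminum'],
                              'Property': ['Temperature', 'Weight',
                                           'Temperature', 'Weight']})
my_frame = my_frame.set_index(['Material', 'Property'])

# Deviation frame with '_ERR' appended to the first index level,
# as in the answer above.
df = my_frame.sub(my_frame['experiment'], axis=0)
df.index = [df.index.get_level_values(0) + '_ERR',
            df.index.get_level_values(1)]

# Walk the materials in their order of first appearance and place
# each error block directly after the block it belongs to.
pieces = []
for material in my_frame.index.get_level_values(0).unique():
    pieces.append(my_frame[my_frame.index.get_level_values(0) == material])
    pieces.append(df[df.index.get_level_values(0) == material + '_ERR'])

combined = pd.concat(pieces)
print(combined)
```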

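Another solution, which changes the second level instead (starting again from the original my_frame, before the concat above):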

df = my_frame.sub(my_frame['experiment'], axis=0)
a = df.index.get_level_values(0)
b = 'ERROR(' + df.index.get_level_values(1) + '-exp)'

df.index = [a, b]
print(df)
                                 simulation1  simulation2  simulation3  \
Material Property                                                        
Copper   ERROR(Temperature-exp)          0.7          0.7         -0.3   
         ERROR(Weight-exp)               1.3          1.3          0.3   
Aluminum ERROR(Temperature-exp)          0.0          4.0          3.0   
         ERROR(Weight-exp)               0.3          0.3          0.5   

                                 experiment  
Material Property                            
Copper   ERROR(Temperature-exp)         0.0  
         ERROR(Weight-exp)              0.0  
Aluminum ERROR(Temperature-exp)         0.0  
         ERROR(Weight-exp)              0.0  
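
The expected output in the question shows 0.3 where the signed difference is -0.3, which suggests unsigned deviations may be what is wanted. If so, a minimal sketch is to chain DataFrame.abs, and optionally DataFrame.round to hide floating-point noise. The name abs_dev and the choice of two decimals are assumptions for illustration.

```python
import pandas as pd

my_frame = pd.DataFrame(data={'simulation1': [71, 4.8, 65, 4.7],
                              'simulation2': [71, 4.8, 69, 4.7],
                              'simulation3': [70, 3.8, 68, 4.9],
                              'experiment': [70.3, 3.5, 65, 4.4],
                              'Material': ['Copper', 'Copper',
                                           'Aluminum', 'Aluminum'],
                              'Property': ['Temperature', 'Weight',
                                           'Temperature', 'Weight']})
my_frame = my_frame.set_index(['Material', 'Property'])

# Signed deviation from the experiment column, row by row.
df = my_frame.sub(my_frame['experiment'], axis=0)

# Unsigned deviation, rounded to two decimals (an arbitrary choice).
abs_dev = df.abs().round(2)
print(abs_dev)
```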

my_frame = pd.concat([my_frame, df]).sort_index(ascending=False)
print(my_frame)
                                 simulation1  simulation2  simulation3  \
Material Property                                                        
Copper   Weight                          4.8          4.8          3.8   
         Temperature                    71.0         71.0         70.0   
         ERROR(Weight-exp)               1.3          1.3          0.3   
         ERROR(Temperature-exp)          0.7          0.7         -0.3   
Aluminum Weight                          4.7          4.7          4.9   
         Temperature                    65.0         69.0         68.0   
         ERROR(Weight-exp)               0.3          0.3          0.5   
         ERROR(Temperature-exp)          0.0          4.0          3.0   

                                 experiment  
Material Property                            
Copper   Weight                         3.5  
         Temperature                   70.3  
         ERROR(Weight-exp)              0.0  
         ERROR(Temperature-exp)         0.0  
Aluminum Weight                         4.4  
         Temperature                   65.0  
         ERROR(Weight-exp)              0.0  
         ERROR(Temperature-exp)         0.0  
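
If this is needed for more than one frame, the second approach could be wrapped in a small helper. The sketch below assumes a two-level row index and a reference column that exists in the frame. The function name add_deviation_rows and its reference parameter are hypothetical, not pandas API, and the "-exp" suffix in the labels is hard-coded as in the answer.

```python
import pandas as pd


def add_deviation_rows(frame, reference='experiment'):
    """Return frame plus rows holding each column's deviation from `reference`.

    Hypothetical helper: assumes `frame` has a two-level row index.
    """
    # Subtract the reference column from every column, row by row.
    dev = frame.sub(frame[reference], axis=0)
    # Keep the first level and relabel the second one.
    outer = dev.index.get_level_values(0)
    inner = 'ERROR(' + dev.index.get_level_values(1) + '-exp)'
    dev.index = [outer, inner]
    # Descending sort keeps each material's values above its error rows.
    return pd.concat([frame, dev]).sort_index(ascending=False)


my_frame = pd.DataFrame(data={'simulation1': [71, 4.8, 65, 4.7],
                              'simulation2': [71, 4.8, 69, 4.7],
                              'simulation3': [70, 3.8, 68, 4.9],
                              'experiment': [70.3, 3.5, 65, 4.4],
                              'Material': ['Copper', 'Copper',
                                           'Aluminum', 'Aluminum'],
                              'Property': ['Temperature', 'Weight',
                                           'Temperature', 'Weight']})
my_frame = my_frame.set_index(['Material', 'Property'])

result = add_deviation_rows(my_frame)
print(result)
```

Because the helper returns a new frame rather than reassigning my_frame, it can be called again on the original data without the error rows piling up.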

Regarding "python - How to get the deviation of a multi-index DataFrame from a reference value", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57574467/
