python - Pandas: Divide MultiIndex DataFrame by row

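I have a DataFrame with a MultiIndex (a panel), and I want to divide the values in each group (county), row by row, by that group's values for a specific year.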

>>> fields
Out[39]: ['emplvl', 'population', 'estab', 'estab_pop', 'emp_pop']
>>> df[fields]
Out[40]: 
                   emplvl  population    estab  estab_pop   emp_pop
county year                                                        
1001   2003  11134.500000       46800   801.75   0.017131  0.237917
       2004  11209.166667       48366   824.00   0.017037  0.231757
       2005  11452.166667       49676   870.75   0.017529  0.230537
       2006  11259.250000       51328   862.50   0.016804  0.219359
       2007  11403.333333       52405   879.25   0.016778  0.217600
       2008  11272.833333       53277   890.25   0.016710  0.211589
       2009  11003.833333       54135   877.00   0.016200  0.203267
       2010  10693.916667       54632   877.00   0.016053  0.195745
       2011  10627.000000         NaN   862.00        NaN       NaN
       2012  10136.916667         NaN   841.75        NaN       NaN
1003   2003  51372.250000      151509  4272.00   0.028196  0.339071
       2004  53450.583333      156266  4536.25   0.029029  0.342049
       2005  56110.250000      162183  4880.50   0.030093  0.345969
       2006  59291.000000      168121  5067.50   0.030142  0.352669
       2007  62600.083333      172404  5337.25   0.030958  0.363101
       2008  62611.500000      175827  5529.25   0.031447  0.356097
       2009  58947.666667      179406  5273.75   0.029396  0.328571
       2010  58139.583333      183195  5171.25   0.028228  0.317364
       2011  59581.000000         NaN  5157.75        NaN       NaN
       2012  60440.250000         NaN  5171.75        NaN       NaN

The rows to divide by:

>>> df[fields].loc[df.index.get_level_values('year') == 2007, fields]
Out[32]: 
                   emplvl  population    estab  estab_pop   emp_pop
county year                                                        
1001   2007  11403.333333       52405   879.25   0.016778  0.217600
1003   2007  62600.083333      172404  5337.25   0.030958  0.363101
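The same base-year rows can also be selected with `DataFrame.xs`, which drops the `year` level so the result is indexed by county alone. A county-only index is what later lets a division broadcast across all years of each county. A minimal sketch, assuming a small three-field, three-year slice of the question's data (chosen only to keep the example short):

```python
import numpy as np
import pandas as pd

# Small slice of the question's panel: two counties, three years, three fields.
index = pd.MultiIndex.from_product([[1001, 1003], [2003, 2007, 2011]],
                                   names=['county', 'year'])
df = pd.DataFrame(
    {'emplvl': [11134.5, 11403.333333, 10627.0, 51372.25, 62600.083333, 59581.0],
     'population': [46800, 52405, np.nan, 151509, 172404, np.nan],
     'estab': [801.75, 879.25, 862.0, 4272.0, 5337.25, 5157.75]},
    index=index)
fields = ['emplvl', 'population', 'estab']

# Cross-section at year == 2007; the 'year' level is dropped from the result,
# leaving one row per county.
base = df.xs(2007, level='year')[fields]
print(base)
```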

However, both

df[fields].div(df.loc[df.index.get_level_values('year') == 2007, fields], axis=0)
df[fields].div(df.loc[df.index.get_level_values('year') == 2007, fields], axis=1)

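give me a DataFrame full of NaN, presumably because pandas aligns on the full (county, year) index, including year, and so finds no matching rows to divide by.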
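The alignment explanation can be checked directly: the selected rows keep their (county, year) labels, so `div` matches labels on both levels. Only the 2007 rows find a partner (and come out as 1.0); every other row is divided by a missing value and becomes NaN. A small sketch, assuming a three-field, three-year slice of the question's data:

```python
import numpy as np
import pandas as pd

# Small slice of the question's panel: two counties, three years, three fields.
index = pd.MultiIndex.from_product([[1001, 1003], [2003, 2007, 2011]],
                                   names=['county', 'year'])
df = pd.DataFrame(
    {'emplvl': [11134.5, 11403.333333, 10627.0, 51372.25, 62600.083333, 59581.0],
     'population': [46800, 52405, np.nan, 151509, 172404, np.nan],
     'estab': [801.75, 879.25, 862.0, 4272.0, 5337.25, 5157.75]},
    index=index)
fields = ['emplvl', 'population', 'estab']

base_rows = df.loc[df.index.get_level_values('year') == 2007, fields]
# base_rows is still indexed by (county, year), so the division aligns on
# both levels; labels missing from base_rows are treated as NaN.
aligned = df[fields].div(base_rows, axis=0)

is_base = aligned.index.get_level_values('year') == 2007
print(aligned[is_base])                      # the only rows that found a match
print(aligned[~is_base].isna().all().all())  # every other cell is NaN
```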

To work around this, I also tried

df[fields].div(df.loc[df.index.get_level_values('year') == 2007, fields].values)

which gives me ValueError: Shape of passed values is (5, 2), indices imply (5, 20).
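The ValueError is a shape problem: `.values` discards the labels and yields one row per county (2 × 5 in the question), while `df[fields]` has one row per county-year (20 × 5), and a (2, 5) array cannot be broadcast to (20, 5). One way to keep the array approach is to repeat each county's base row once per year before dividing. A sketch, assuming the same small slice of the question's data:

```python
import numpy as np
import pandas as pd

# Small slice of the question's panel: two counties, three years, three fields.
index = pd.MultiIndex.from_product([[1001, 1003], [2003, 2007, 2011]],
                                   names=['county', 'year'])
df = pd.DataFrame(
    {'emplvl': [11134.5, 11403.333333, 10627.0, 51372.25, 62600.083333, 59581.0],
     'population': [46800, 52405, np.nan, 151509, 172404, np.nan],
     'estab': [801.75, 879.25, 862.0, 4272.0, 5337.25, 5157.75]},
    index=index)
fields = ['emplvl', 'population', 'estab']

base = df.xs(2007, level='year')[fields]  # one row per county
# Repeat each county's base row for every (county, year) row of df, so the
# array has exactly the same shape (and row order) as df[fields].
expanded = base.reindex(df.index.get_level_values('county'))
result = df[fields] / expanded.to_numpy()
print(result)
```

Because the division is positional, this relies on `expanded` being built from `df`'s own county sequence; any reordering of `df` afterwards would need the expansion redone.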

Best answer

I think you can build df1 with reset_index, so that year moves out of the index and df1 is indexed by county only, and then use div:

fields = ['emplvl', 'population', 'estab', 'estab_pop', 'emp_pop'] 

df1 = df.loc[df.index.get_level_values('year') == 2007, fields].reset_index(level=1)
print(df1)
        year        emplvl  population    estab  estab_pop   emp_pop
county                                                              
1001    2007  11403.333333     52405.0   879.25   0.016778  0.217600
1003    2007  62600.083333    172404.0  5337.25   0.030958  0.363101

print(df[fields].div(df1[fields], axis=0))
               emplvl  population     estab  estab_pop   emp_pop
county year                                                     
1001   2003  0.976425    0.893045  0.911857   1.021039  1.093369
       2004  0.982973    0.922927  0.937162   1.015437  1.065060
       2005  1.004282    0.947925  0.990333   1.044761  1.059453
       2006  0.987365    0.979449  0.980950   1.001550  1.008084
       2007  1.000000    1.000000  1.000000   1.000000  1.000000
       2008  0.988556    1.016640  1.012511   0.995947  0.972376
       2009  0.964966    1.033012  0.997441   0.965550  0.934131
       2010  0.937789    1.042496  0.997441   0.956789  0.899563
       2011  0.931920         NaN  0.980381        NaN       NaN
       2012  0.888943         NaN  0.957350        NaN       NaN
1003   2003  0.820642    0.878802  0.800412   0.910782  0.933820
       2004  0.853842    0.906394  0.849923   0.937690  0.942022
       2005  0.896329    0.940715  0.914422   0.972059  0.952818
       2006  0.947139    0.975157  0.949459   0.973642  0.971270
       2007  1.000000    1.000000  1.000000   1.000000  1.000000
       2008  1.000182    1.019855  1.035974   1.015796  0.980711
       2009  0.941655    1.040614  0.988102   0.949545  0.904902
       2010  0.928746    1.062591  0.968898   0.911816  0.874038
       2011  0.951772         NaN  0.966368        NaN       NaN
       2012  0.965498         NaN  0.968992        NaN       NaN
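The same result can be had without reset_index by passing `level=` to `div`, which matches a county-indexed frame against the `county` level of the MultiIndex and broadcasts it across the years. A minimal sketch, assuming the same small slice of the question's data and using `xs` to get the county-indexed base rows:

```python
import numpy as np
import pandas as pd

# Small slice of the question's panel: two counties, three years, three fields.
index = pd.MultiIndex.from_product([[1001, 1003], [2003, 2007, 2011]],
                                   names=['county', 'year'])
df = pd.DataFrame(
    {'emplvl': [11134.5, 11403.333333, 10627.0, 51372.25, 62600.083333, 59581.0],
     'population': [46800, 52405, np.nan, 151509, 172404, np.nan],
     'estab': [801.75, 879.25, 862.0, 4272.0, 5337.25, 5157.75]},
    index=index)
fields = ['emplvl', 'population', 'estab']

base = df.xs(2007, level='year')[fields]
# Match base's county labels against the 'county' level of df's index and
# broadcast each base row across all years of that county.
result = df[fields].div(base, level='county')
print(result)
```

Naming the level explicitly makes the intended broadcast visible in the code rather than relying on pandas to infer it from the shared index name.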

Regarding python - Pandas: Divide MultiIndex DataFrame by row, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36678611/