python - How to access previous rows in a multi-index pandas DataFrame

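How can I access rows of a datetime-indexed, multi-level DataFrame like the one below? It is downloaded financial data. The hard part is getting into the frame and accessing non-adjacent rows of a specific inner level without explicitly specifying the outer-level date, because I have thousands of these rows.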

                                       ABC        DEF        GHI  \  
Date                STATS                                            
2012-07-19 00:00:00                    NaN         NaN         NaN   
                    investment        4             9          13        
                    price             5             8          1  
                    quantity          12            9          8

So the two formulas I am looking for can be summarized as:

X(today row) = quantity(prior row)*price(prior row) 
or                           
X(today row) = quantity(prior row)*price(today)
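Written out explicitly, with t for a date, t-1 for the prior date, and the sum running over the columns ABC, DEF and GHI (this notation is an editorial assumption, read off the worked X1 examples further down), the two formulas are:

```latex
X_t = \sum_{c \in \{\mathrm{ABC},\,\mathrm{DEF},\,\mathrm{GHI}\}} \mathrm{quantity}_{t-1,c}\,\mathrm{price}_{t-1,c}
\qquad\text{or}\qquad
X_t = \sum_{c \in \{\mathrm{ABC},\,\mathrm{DEF},\,\mathrm{GHI}\}} \mathrm{quantity}_{t-1,c}\,\mathrm{price}_{t,c}
```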

The difficulty is how to express, with numpy or pandas, access to these rows through the multi-level index when the rows are not adjacent.

In the end I would like a result like this:

                                         ABC        DEF        GHI    XN
Date                STATS                                            
2012-07-19 00:00:00                    NaN         NaN         NaN   
                    investment          4            9          13    X1
                    price               5            8           1   
                    quantity            12           9           8    

2012-07-18 00:00:00                    NaN         NaN         NaN   
                    investment          1             2          3    X2
                    price               2             3          4   
                    quantity           18             6          7    

X1 = (18*2) + (6*3) + (7*4)   (quantity_day_2 * price_day_2)
or, for the other formula,
X1 = (18*5) + (6*8) + (7*1)   (quantity_day_2 * price_day_1)
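A minimal sketch of both formulas without naming any outer-level date, assuming the rows are sorted ascending by Date with exactly one price and one quantity row per date. The frame below is a hypothetical rebuild of the two dates shown above; `DataFrame.xs(..., level='STATS')` pulls one inner-level label for every date at once, and a positional `shift(1)` then gives each date the prior row's values.

```python
import pandas as pd

# Hypothetical rebuild of the two dates from the question, sorted ascending by Date
dates = pd.to_datetime(['2012-07-18', '2012-07-19'])
idx = pd.MultiIndex.from_product([dates, ['investment', 'price', 'quantity']],
                                 names=['Date', 'STATS'])
df = pd.DataFrame([[1, 2, 3], [2, 3, 4], [18, 6, 7],      # 2012-07-18
                   [4, 9, 13], [5, 8, 1], [12, 9, 8]],    # 2012-07-19
                  index=idx, columns=['ABC', 'DEF', 'GHI'])

# xs selects one inner-level label for all dates; the result is indexed by Date only
price = df.xs('price', level='STATS')
quantity = df.xs('quantity', level='STATS')

# shift(1) moves every row down one position, so each date sees the prior row
prior_quantity = quantity.shift(1)
# quantity(prior) * price(prior), summed over ABC/DEF/GHI
x_formula_1 = (prior_quantity * price.shift(1)).sum(axis=1, min_count=1)
# quantity(prior) * price(today), summed over ABC/DEF/GHI
x_formula_2 = (prior_quantity * price).sum(axis=1, min_count=1)
```

On 2012-07-19 this gives 82.0 for the first formula and 145.0 for the second, matching the hand calculations; the first date is NaN because it has no prior row (`min_count=1` keeps it NaN instead of 0).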

Can I use groupby?

Best answer

If you need to add the output to the original DataFrame, it is more complicated:

print (df)
                        ABC  DEF   GHI
Date       STATS                      
2012-07-19              NaN  NaN   NaN
           investment   4.0  9.0  13.0
           price        5.0  8.0   1.0
           quantity    12.0  9.0   8.0
2012-07-18              NaN  NaN   NaN
           investment   1.0  2.0   3.0
           price        2.0  3.0   4.0
           quantity    18.0  6.0   7.0
2012-07-17              NaN  NaN   NaN
           investment   1.0  2.0   3.0
           price        0.0  1.0   4.0
           quantity     5.0  1.0   0.0

import pandas as pd

df.sort_index(inplace=True)

# rename the level value to 'investment' so the data aligns in the final concat
idx = pd.IndexSlice
p = df.loc[idx[:,'price'],:].rename(index={'price':'investment'})
q = df.loc[idx[:,'quantity'],:].rename(index={'quantity':'investment'})
print (p)
                       ABC  DEF  GHI
Date       STATS                    
2012-07-17 investment  0.0  1.0  4.0
2012-07-18 investment  2.0  3.0  4.0
2012-07-19 investment  5.0  8.0  1.0

print (q)
                        ABC  DEF  GHI
Date       STATS                     
2012-07-17 investment   5.0  1.0  0.0
2012-07-18 investment  18.0  6.0  7.0
2012-07-19 investment  12.0  9.0  8.0

# multiply and concat to the original df
print (p * q)
                        ABC   DEF   GHI
Date       STATS                       
2012-07-17 investment   0.0   1.0   0.0
2012-07-18 investment  36.0  18.0  28.0
2012-07-19 investment  60.0  72.0   8.0

a = (p * q).sum(axis=1).rename('col1')
print (pd.concat([df, a], axis=1))
                        ABC  DEF   GHI   col1
Date       STATS                             
2012-07-17              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0    1.0
           price        0.0  1.0   4.0    NaN
           quantity     5.0  1.0   0.0    NaN
2012-07-18              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0   82.0
           price        2.0  3.0   4.0    NaN
           quantity    18.0  6.0   7.0    NaN
2012-07-19              NaN  NaN   NaN    NaN
           investment   4.0  9.0  13.0  140.0
           price        5.0  8.0   1.0    NaN
           quantity    12.0  9.0   8.0    NaN
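In `col1` each product sits on the date it was computed from (82 on 2012-07-18), while the question's expected output puts X1 on the later, "today" row (2012-07-19). A hedged sketch of one way to move it there, reusing the answer's data and assuming one row per date in each STATS group, sorted by Date: `groupby(level='STATS').shift(1)` moves each value down one row within its group. The column name `X` is a hypothetical choice.

```python
import numpy as np
import pandas as pd

# Rebuild the answer's example frame; header rows use an empty STATS label
dates = pd.to_datetime(['2012-07-17', '2012-07-18', '2012-07-19'])
idx = pd.MultiIndex.from_product([dates, ['', 'investment', 'price', 'quantity']],
                                 names=['Date', 'STATS'])
nan = np.nan
df = pd.DataFrame([[nan, nan, nan], [1, 2, 3], [0, 1, 4], [5, 1, 0],
                   [nan, nan, nan], [1, 2, 3], [2, 3, 4], [18, 6, 7],
                   [nan, nan, nan], [4, 9, 13], [5, 8, 1], [12, 9, 8]],
                  index=idx, columns=['ABC', 'DEF', 'GHI'], dtype=float).sort_index()

ix = pd.IndexSlice
p = df.loc[ix[:, 'price'], :].rename(index={'price': 'investment'})
q = df.loc[ix[:, 'quantity'], :].rename(index={'quantity': 'investment'})

# Same-day products, as in the answer's col1 (1, 82, 140 by date)
same_day = (p * q).sum(axis=1, min_count=1)
# Move each product down one row inside the STATS group, so every date
# receives the value computed from the prior date
x_today = same_day.groupby(level='STATS').shift(1).rename('X')
out = pd.concat([df, x_today], axis=1)
```

The investment row of 2012-07-19 then holds 82.0 (X1 in the question), 2012-07-18 holds 1.0, and the first date stays NaN.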

# shift with freq is not supported on a MultiIndex (at the time of writing), so first unstack
# to get a DatetimeIndex, then shift, and finally reshape back to the original with stack

# multiply and concat to the original df
print (p.unstack().shift(-1, freq='D').stack() * q)
                        ABC   DEF  GHI
Date       STATS                      
2012-07-16 investment   NaN   NaN  NaN
2012-07-17 investment  10.0   3.0  0.0
2012-07-18 investment  90.0  48.0  7.0
2012-07-19 investment   NaN   NaN  NaN

b = (p.unstack().shift(-1, freq='D').stack() * q).sum(axis=1).rename('col2')
print (pd.concat([df, b], axis=1))
                        ABC  DEF   GHI   col2
Date       STATS                             
2012-07-16 investment   NaN  NaN   NaN    0.0
2012-07-17              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0   13.0
           price        0.0  1.0   4.0    NaN
           quantity     5.0  1.0   0.0    NaN
2012-07-18              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0  145.0
           price        2.0  3.0   4.0    NaN
           quantity    18.0  6.0   7.0    NaN
2012-07-19              NaN  NaN   NaN    NaN
           investment   4.0  9.0  13.0    0.0
           price        5.0  8.0   1.0    NaN
           quantity    12.0  9.0   8.0    NaN
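groupby can also replace the unstack/shift/stack round trip. A sketch, assuming rows are sorted by Date with one row per date in each STATS group: `p.groupby(level='STATS').shift(-1)` shifts by row position rather than by calendar day and keeps the MultiIndex. Two possible benefits: a gap such as a weekend should not break the alignment (with `freq='D'` a Monday price would move to Sunday and match no quantity row), and the concat no longer adds the extra 2012-07-16 row.

```python
import numpy as np
import pandas as pd

# Rebuild the answer's example frame; header rows use an empty STATS label
dates = pd.to_datetime(['2012-07-17', '2012-07-18', '2012-07-19'])
idx = pd.MultiIndex.from_product([dates, ['', 'investment', 'price', 'quantity']],
                                 names=['Date', 'STATS'])
nan = np.nan
df = pd.DataFrame([[nan, nan, nan], [1, 2, 3], [0, 1, 4], [5, 1, 0],
                   [nan, nan, nan], [1, 2, 3], [2, 3, 4], [18, 6, 7],
                   [nan, nan, nan], [4, 9, 13], [5, 8, 1], [12, 9, 8]],
                  index=idx, columns=['ABC', 'DEF', 'GHI'], dtype=float).sort_index()

ix = pd.IndexSlice
p = df.loc[ix[:, 'price'], :].rename(index={'price': 'investment'})
q = df.loc[ix[:, 'quantity'], :].rename(index={'quantity': 'investment'})

# Shift by row position inside each STATS group, keeping the MultiIndex:
# every date receives the price from the next row (the following date)
next_price = p.groupby(level='STATS').shift(-1)
b = (next_price * q).sum(axis=1, min_count=1).rename('col2')
out = pd.concat([df, b], axis=1)
```

This reproduces 13.0 and 145.0 for 2012-07-17 and 2012-07-18; because of `min_count=1`, the last date shows NaN instead of the 0.0 in the output above.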

Regarding "python - How to access previous rows in a multi-index pandas DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39601110/
