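How can I access a multi-level DataFrame with a datetime index, like the one shown below? This is downloaded financial data. The hard part is getting into the frame and accessing non-adjacent rows of a specific inner level without explicitly specifying the outer-level date, because I have thousands of such rows.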
                                ABC  DEF  GHI
Date                STATS
2012-07-19 00:00:00             NaN  NaN  NaN
                    investment    4    9   13
                    price         5    8    1
                    quantity     12    9    8
So the two formulas I am looking for can be summarized as
X(today row) = quantity(prior row)*price(prior row)
or
X(today row) = quantity(prior row)*price(today)
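The difficulty is how to express, with numpy or pandas, access to these rows through the multi-level index when the rows are not adjacent.
In the end I would get a result like this: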
                                ABC  DEF  GHI  XN
Date                STATS
2012-07-19 00:00:00             NaN  NaN  NaN
                    investment    4    9   13  X1
                    price         5    8    1
                    quantity     12    9    8
2012-07-18 00:00:00             NaN  NaN  NaN
                    investment    1    2    3  X2
                    price         2    3    4
                    quantity     18    6    7
X1= (18*2)+(6*3)+(7*4) (quantity_day_2 *price_day_2 data)
or for the other formula
X1= (18*5)+(6*8)+(7*1) (quantity_day_2 *price_day_1 data)
Can I use groupby?
Best answer
If the output needs to be added to the original DataFrame, it gets more complicated:
print (df)
                        ABC  DEF   GHI
Date       STATS
2012-07-19              NaN  NaN   NaN
           investment   4.0  9.0  13.0
           price        5.0  8.0   1.0
           quantity    12.0  9.0   8.0
2012-07-18              NaN  NaN   NaN
           investment   1.0  2.0   3.0
           price        2.0  3.0   4.0
           quantity    18.0  6.0   7.0
2012-07-17              NaN  NaN   NaN
           investment   1.0  2.0   3.0
           price        0.0  1.0   4.0
           quantity     5.0  1.0   0.0
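To follow along, the printed frame can be rebuilt by hand. This is only a minimal sketch: the values are copied from the output above, but the empty-string `STATS` label used for the all-NaN date rows is an assumption, since the code that originally built the frame is not shown.

```python
import numpy as np
import pandas as pd

# Dates in the same (descending) order as the printed frame
dates = pd.to_datetime(['2012-07-19', '2012-07-18', '2012-07-17'])
# Assumption: the all-NaN row of each date carries an empty-string STATS label
stats = ['', 'investment', 'price', 'quantity']
index = pd.MultiIndex.from_product([dates, stats], names=['Date', 'STATS'])

nan_row = [np.nan, np.nan, np.nan]
data = [
    nan_row, [4, 9, 13], [5, 8, 1], [12, 9, 8],   # 2012-07-19
    nan_row, [1, 2, 3], [2, 3, 4], [18, 6, 7],    # 2012-07-18
    nan_row, [1, 2, 3], [0, 1, 4], [5, 1, 0],     # 2012-07-17
]
df = pd.DataFrame(data, index=index, columns=['ABC', 'DEF', 'GHI'], dtype=float)
print(df)
```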
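import pandas as pd

# sort the index so that the dates are in ascending order
df.sort_index(inplace=True)
# select one STATS label and rename it to 'investment'
# so that the data aligns with the 'investment' rows in the final concat
idx = pd.IndexSlice
p = df.loc[idx[:, 'price'], :].rename(index={'price': 'investment'})
q = df.loc[idx[:, 'quantity'], :].rename(index={'quantity': 'investment'})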
print (p)
                       ABC  DEF  GHI
Date       STATS
2012-07-17 investment  0.0  1.0  4.0
2012-07-18 investment  2.0  3.0  4.0
2012-07-19 investment  5.0  8.0  1.0
print (q)
                        ABC  DEF  GHI
Date       STATS
2012-07-17 investment   5.0  1.0  0.0
2012-07-18 investment  18.0  6.0  7.0
2012-07-19 investment  12.0  9.0  8.0
# multiply price by quantity element-wise, then concat the row sums to the original df
print(p * q)
                        ABC   DEF   GHI
Date       STATS
2012-07-17 investment   0.0   1.0   0.0
2012-07-18 investment  36.0  18.0  28.0
2012-07-19 investment  60.0  72.0   8.0
a = (p * q).sum(axis=1).rename('col1')
print (pd.concat([df, a], axis=1))
                        ABC  DEF   GHI   col1
Date       STATS
2012-07-17              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0    1.0
           price        0.0  1.0   4.0    NaN
           quantity     5.0  1.0   0.0    NaN
2012-07-18              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0   82.0
           price        2.0  3.0   4.0    NaN
           quantity    18.0  6.0   7.0    NaN
2012-07-19              NaN  NaN   NaN    NaN
           investment   4.0  9.0  13.0  140.0
           price        5.0  8.0   1.0    NaN
           quantity    12.0  9.0   8.0    NaN
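Note that col1 holds each date's own price times quantity, whereas the first formula in the question places that value on the following row (82 belongs to 2012-07-19 there). If that placement is what is wanted, one possible variation is to take cross-sections with `xs` and shift the row sums by one position. The sketch below rebuilds a reduced copy of the sample data (price and quantity rows only) so it runs on its own; the names `same_day` and `x_prior` are illustrative and not part of the original answer.

```python
import pandas as pd

dates = pd.to_datetime(['2012-07-17', '2012-07-18', '2012-07-19'])
index = pd.MultiIndex.from_product([dates, ['price', 'quantity']],
                                   names=['Date', 'STATS'])
df = pd.DataFrame(
    [[0, 1, 4], [5, 1, 0],      # 2012-07-17: price, quantity
     [2, 3, 4], [18, 6, 7],     # 2012-07-18
     [5, 8, 1], [12, 9, 8]],    # 2012-07-19
    index=index, columns=['ABC', 'DEF', 'GHI'], dtype=float)

# xs drops the STATS level, leaving one row per date
price = df.xs('price', level='STATS')
quantity = df.xs('quantity', level='STATS')

# price * quantity summed across the columns, per date
same_day = (price * quantity).sum(axis=1)

# positional shift: each date receives the value computed from the prior row
x_prior = same_day.shift(1).rename('X')
print(x_prior)
```

Because `shift(1)` moves values by row position rather than by calendar day, it keeps working when the dates have gaps such as weekends, as long as the index is sorted by date.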
# shifting by frequency is not supported on a MultiIndex yet, so first create a DatetimeIndex
# with unstack, then shift, and finally reshape back to the original layout with stack
# multiply by quantity, then concat the row sums to the original df
print (p.unstack().shift(-1, freq='D').stack() * q)
                        ABC   DEF  GHI
Date       STATS
2012-07-16 investment   NaN   NaN  NaN
2012-07-17 investment  10.0   3.0  0.0
2012-07-18 investment  90.0  48.0  7.0
2012-07-19 investment   NaN   NaN  NaN
b = (p.unstack().shift(-1, freq='D').stack() * q).sum(axis=1).rename('col2')
print (pd.concat([df, b], axis=1))
                        ABC  DEF   GHI   col2
Date       STATS
2012-07-16 investment   NaN  NaN   NaN    0.0
2012-07-17              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0   13.0
           price        0.0  1.0   4.0    NaN
           quantity     5.0  1.0   0.0    NaN
2012-07-18              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0  145.0
           price        2.0  3.0   4.0    NaN
           quantity    18.0  6.0   7.0    NaN
2012-07-19              NaN  NaN   NaN    0.0
           investment   4.0  9.0  13.0    0.0
           price        5.0  8.0   1.0    NaN
           quantity    12.0  9.0   8.0    NaN
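In col2 the value 145 sits on 2012-07-18 and an extra 2012-07-16 row appears, because the prices are moved back one calendar day. The question's second formula puts 145 on 2012-07-19 instead. A hedged alternative, assuming that placement is the goal, is to shift the quantity forward by one row and attach the result to the 'investment' rows. The sample frame is rebuilt here (without the all-NaN date rows) so the sketch is self-contained; `x` and `out` are illustrative names.

```python
import pandas as pd

dates = pd.to_datetime(['2012-07-17', '2012-07-18', '2012-07-19'])
stats = ['investment', 'price', 'quantity']
index = pd.MultiIndex.from_product([dates, stats], names=['Date', 'STATS'])
df = pd.DataFrame(
    [[1, 2, 3], [0, 1, 4], [5, 1, 0],       # 2012-07-17
     [1, 2, 3], [2, 3, 4], [18, 6, 7],      # 2012-07-18
     [4, 9, 13], [5, 8, 1], [12, 9, 8]],    # 2012-07-19
    index=index, columns=['ABC', 'DEF', 'GHI'], dtype=float)

price = df.xs('price', level='STATS')
quantity = df.xs('quantity', level='STATS')

# quantity of the prior row times today's price; min_count=1 keeps the first
# date as NaN instead of summing an all-NaN row to 0.0
x = (quantity.shift(1) * price).sum(axis=1, min_count=1)

# put the STATS level back so the values land on the 'investment' rows
x.index = pd.MultiIndex.from_product([x.index, ['investment']],
                                     names=['Date', 'STATS'])
out = df.copy()
out['X'] = x   # assignment aligns on the MultiIndex; other rows become NaN
print(out)
```

Unlike `shift(-1, freq='D')`, the positional `shift(1)` does not assume consecutive calendar days and does not create rows for dates that are absent from the data.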
Regarding python - how to access prior rows in a multi-index pandas DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39601110/