I am new to Python and am looking for help multiplying two DataFrames over time. Any help understanding the error would be greatly appreciated.
First DataFrame (cov):
Date        (row)     NoDur     Durbl     Manuf
2018-12-27  NoDur  0.000109  0.000112  0.000118
            Durbl  0.000112  0.000339  0.000238
            Manuf  0.000118  0.000238  0.000246
2018-12-28  NoDur  0.000109  0.000113  0.000117
            Durbl  0.000113  0.000339  0.000239
            Manuf  0.000117  0.000239  0.000242
2018-12-31  NoDur  0.000109  0.000113  0.000118
            Durbl  0.000113  0.000339  0.000239
            Manuf  0.000118  0.000239  0.000245
Second DataFrame (w):
Date NoDur Durbl Manuf
2018-12-27 -69.190732 -96.316224 -324.058486
2018-12-28 -113.831750 30.426696 -410.055587
2018-12-31 -101.365016 -16.613136 -362.232014
Code:
std = np.dot(np.transpose(w) , np.matmul(cov , w))
Error:
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 12361 is different from 10)
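I am showing only a small excerpt of the DataFrames. The original cov DataFrame has 123610 rows × 10 columns, and the w DataFrame has 12361 rows × 10 columns.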
Expected output:
Date
2018-12-27 44.45574103083
2018-12-28 46.593367859
2018-12-31 45.282932300
Thank you very much!
Best answer
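On the error itself: np.matmul sees cov as one 123610 × 10 matrix and w as one 12361 × 10 matrix, so the inner dimensions (10 and 12361) do not match, which is what the message reports. A minimal sketch that reproduces the same kind of mismatch with small placeholder arrays (the sizes 4 and 3 and the variable names are made up for illustration, they are not the original data):

```python
import numpy as np

n_dates, n_assets = 4, 3  # hypothetical stand-ins for 12361 dates and 10 columns

# One (n_assets x n_assets) covariance block per date, stacked vertically.
cov_stacked = np.zeros((n_dates * n_assets, n_assets))  # shape (12, 3)
w_arr = np.ones((n_dates, n_assets))                    # shape (4, 3)

try:
    # (12, 3) @ (4, 3): the inner dimensions 3 and 4 differ, so this raises.
    np.matmul(cov_stacked, w_arr)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Per date the shapes do line up: (3,) @ (3, 3) @ (3,) gives a scalar.
cov_3d = cov_stacked.reshape(n_dates, n_assets, n_assets)
first_date_value = w_arr[0] @ cov_3d[0] @ w_arr[0]
```

The product therefore has to be taken one date at a time, pairing each covariance block with the matching row of w.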
I think you can use groupby on the 'Date' level, then multiply each group by the weights in w that correspond to that group's date:
cov.groupby(level='Date').apply(lambda g: w.loc[g.name].dot(g.values@(w.loc[g.name])))
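For a version that runs on its own, here is a minimal sketch on the three-date excerpt. It assumes the first index level of cov is named 'Date' and that the rows within each date are in the same order as the columns. The level name 'Sector' and the helper quad_form are made up for illustration. It iterates over the groups explicitly instead of calling apply, but computes the same quantity w' C w for each date:

```python
import numpy as np
import pandas as pd

dates = ['2018-12-27', '2018-12-28', '2018-12-31']
sectors = ['NoDur', 'Durbl', 'Manuf']

# Stacked 3x3 covariance blocks, one per date (values from the excerpt).
cov = pd.DataFrame(
    [[0.000109, 0.000112, 0.000118],
     [0.000112, 0.000339, 0.000238],
     [0.000118, 0.000238, 0.000246],
     [0.000109, 0.000113, 0.000117],
     [0.000113, 0.000339, 0.000239],
     [0.000117, 0.000239, 0.000242],
     [0.000109, 0.000113, 0.000118],
     [0.000113, 0.000339, 0.000239],
     [0.000118, 0.000239, 0.000245]],
    # 'Sector' is a hypothetical name for the second level.
    index=pd.MultiIndex.from_product([dates, sectors], names=['Date', 'Sector']),
    columns=sectors,
)
w = pd.DataFrame(
    [[-69.190732, -96.316224, -324.058486],
     [-113.831750, 30.426696, -410.055587],
     [-101.365016, -16.613136, -362.232014]],
    index=pd.Index(dates, name='Date'),
    columns=sectors,
)


def quad_form(block, weights):
    """Return weights' @ block @ weights for a single date (hypothetical helper)."""
    c = block.to_numpy()
    v = weights.to_numpy()
    return float(v @ c @ v)


# One scalar per date: pair each covariance block with that date's row of w.
result = pd.Series(
    {date: quad_form(block, w.loc[date]) for date, block in cov.groupby(level='Date')}
)
result.index.name = 'Date'
print(result)
```

On this excerpt the values for 2018-12-28 and 2018-12-31 agree with the expected output in the question (about 46.5934 and 45.2829). The rounded excerpt does not reproduce the value listed for 2018-12-27.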
Since your data is really better represented as a three-dimensional array, you can also avoid the implicit loop over groups in apply and use np.einsum:
reshaped = cov.values.reshape(cov.index.levels[0].nunique(), cov.index.levels[1].nunique(), cov.shape[-1])
np.einsum('ik,ik->i', w.values, np.einsum('ijk,ik->ij', reshaped, w.values))
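To unpack the subscripts: after the reshape, element [i, j, k] is the covariance between columns j and k on date i. The inner call 'ijk,ik->ij' computes C_i w_i for every date, and the outer call 'ik,ik->i' takes the row-wise dot product with w_i. The sketch below works on plain NumPy arrays built from the excerpt and assumes every date contributes the same number of consecutive rows, in column order. The variable names and the single-call variant are illustrative additions, checked against an explicit loop:

```python
import numpy as np

# Excerpt values: 3 dates x 3 sectors, rows grouped by date.
cov_stacked = np.array([
    [0.000109, 0.000112, 0.000118],
    [0.000112, 0.000339, 0.000238],
    [0.000118, 0.000238, 0.000246],
    [0.000109, 0.000113, 0.000117],
    [0.000113, 0.000339, 0.000239],
    [0.000117, 0.000239, 0.000242],
    [0.000109, 0.000113, 0.000118],
    [0.000113, 0.000339, 0.000239],
    [0.000118, 0.000239, 0.000245],
])
w_arr = np.array([
    [-69.190732, -96.316224, -324.058486],
    [-113.831750, 30.426696, -410.055587],
    [-101.365016, -16.613136, -362.232014],
])

n_dates, n_assets = w_arr.shape
# Assumption: each date occupies exactly n_assets consecutive rows.
cov_3d = cov_stacked.reshape(n_dates, n_assets, n_assets)

# Two-step form from the answer: C_i @ w_i first, then the row-wise dot product.
cw = np.einsum('ijk,ik->ij', cov_3d, w_arr)
two_step = np.einsum('ik,ik->i', w_arr, cw)

# Equivalent single call: sum over j and k of w[i, j] * C[i, j, k] * w[i, k].
one_call = np.einsum('ij,ijk,ik->i', w_arr, cov_3d, w_arr)

# Reference: explicit Python loop over dates.
looped = np.array([w_arr[i] @ cov_3d[i] @ w_arr[i] for i in range(n_dates)])

print(two_step)
```

The reshape only gives the right blocks if the rows are sorted by date and no date is missing a row, so it is worth checking that on the full data before relying on it.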
In terms of performance, the second solution appears to be considerably faster:
%timeit cov.groupby(level='Date').apply(lambda g: w.loc[g.name].dot(g.values@(w.loc[g.name])))
4.74 ms ± 614 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.einsum('ik,ik->i', w.values, np.einsum('ijk,ik->ij', reshaped, w.values))
35.6 µs ± 5.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
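%timeit is an IPython magic, so outside IPython or Jupyter a similar comparison can be sketched with the standard timeit module. The example below is only an illustration on random placeholder data (the sizes, seed, and function names are made up), and it compares a plain Python loop against einsum rather than groupby/apply. The absolute timings depend on the machine and are not meant to match the figures above:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n_dates, n_assets = 50, 10  # hypothetical sizes, not the original data

# Random symmetric positive semi-definite blocks, one per date.
a = rng.normal(size=(n_dates, n_assets, n_assets))
cov_3d = a @ a.transpose(0, 2, 1)
w_arr = rng.normal(size=(n_dates, n_assets))


def with_loop():
    # Explicit loop over dates.
    return np.array([w_arr[i] @ cov_3d[i] @ w_arr[i] for i in range(n_dates)])


def with_einsum():
    # Same computation as the einsum solution in the answer.
    return np.einsum('ik,ik->i', w_arr, np.einsum('ijk,ik->ij', cov_3d, w_arr))


# Best of 3 repeats, 20 calls per repeat.
loop_time = min(timeit.repeat(with_loop, number=20, repeat=3))
einsum_time = min(timeit.repeat(with_einsum, number=20, repeat=3))
print(f"loop: {loop_time:.6f} s, einsum: {einsum_time:.6f} s (20 calls each)")
```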
Data:
cov = pd.DataFrame.from_dict({'NoDur': {('2018-12-27', 'NoDur'): 0.000109,
('2018-12-27', 'Durbl'): 0.000112,
('2018-12-27', 'Manuf'): 0.000118,
('2018-12-28', 'NoDur'): 0.000109,
('2018-12-28', 'Durbl'): 0.000113,
('2018-12-28', 'Manuf'): 0.000117,
('2018-12-31', 'NoDur'): 0.000109,
('2018-12-31', 'Durbl'): 0.000113,
('2018-12-31', 'Manuf'): 0.000118},
'Durbl': {('2018-12-27', 'NoDur'): 0.000112,
('2018-12-27', 'Durbl'): 0.000339,
('2018-12-27', 'Manuf'): 0.000238,
('2018-12-28', 'NoDur'): 0.000113,
('2018-12-28', 'Durbl'): 0.000339,
('2018-12-28', 'Manuf'): 0.000239,
('2018-12-31', 'NoDur'): 0.000113,
('2018-12-31', 'Durbl'): 0.000339,
('2018-12-31', 'Manuf'): 0.000239},
'Manuf': {('2018-12-27', 'NoDur'): 0.000118,
('2018-12-27', 'Durbl'): 0.000238,
('2018-12-27', 'Manuf'): 0.000246,
('2018-12-28', 'NoDur'): 0.000117,
('2018-12-28', 'Durbl'): 0.000239,
('2018-12-28', 'Manuf'): 0.000242,
('2018-12-31', 'NoDur'): 0.000118,
('2018-12-31', 'Durbl'): 0.000239,
('2018-12-31', 'Manuf'): 0.000245}})
w = pd.DataFrame.from_dict({'NoDur': {'2018-12-27': -69.190732,
'2018-12-28': -113.83175,
'2018-12-31': -101.365016},
'Durbl': {'2018-12-27': -96.316224,
'2018-12-28': 30.426696,
'2018-12-31': -16.613136},
'Manuf': {'2018-12-27': -324.058486,
'2018-12-28': -410.055587,
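'2018-12-31': -362.232014}})
# The dicts above carry no index names, so name the date level
# to let groupby(level='Date') resolve it.
cov.index.names = ['Date', None]
w.index.name = 'Date'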
Regarding "python - multiplying a multidimensional MultiIndex DataFrame with a single-index DataFrame over time", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67087980/