Python - Sequential loop with dataframes

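I am trying to compute a recursive equation over the rows of a dataframe, using parameters supplied by several other dataframes. The equation is given below and should be computed for each column of the matrix. It looks like an exponential moving average, except that the decay is not constant and comes from another dataframe.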
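Read off the loop code in this question, the recursion for each column is, for every row $n \ge 1$ (row 0 holds the initial value $M_0$):

$$M_n = \alpha_n P_n + (1 - \alpha_n)\, M_{n-1}$$

With a constant $\alpha_n = \alpha$ this reduces to the ordinary exponential moving average.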

Given:

a matrix Alpha of the same size as the output
a matrix P of the same size as the output
a vector M0 of the same width as the output

My first attempt used a double loop (with .iloc):

import pandas as pd
import numpy as np

"""
Assuming inputs:
    - Matrix P of size 1000x4
    - Matrix alpha of size 1000x4
    - Vector M0 of size 1X4
"""

# input variables
height = 1000
width = 4
np.random.seed(1)
P = pd.DataFrame(np.random.normal(loc=170, scale=12, size=(height, width)), index=range(height), columns=range(width))
np.random.seed(1)
alpha = pd.DataFrame(np.random.normal(loc=0.04, scale=0.04, size=(height, width)), index=range(height), columns=range(width))
np.random.seed(1)
# initial value: one per column (a 1 x width vector, as in the docstring)
M0 = pd.Series(np.random.normal(loc=170, scale=12, size=width), index=range(width))


# Output table
MA = P.copy()*0
MA.iloc[0] = M0 

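# Recursive equation, one cell at a time
# (iloc[y, x] writes into MA itself; chained iloc[y][x] may write to a copy)
for x in range(width):
    for y in range(1, height):
        MA.iloc[y, x] = alpha.iloc[y, x]*P.iloc[y, x] + (1 - alpha.iloc[y, x])*MA.iloc[y-1, x]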

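I then tried to vectorize it by expanding the problem into a cumulative product (see the equation below), but I could not get the expected values (I will update the code later):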
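Unrolling the recursion $M_n = \alpha_n P_n + (1-\alpha_n) M_{n-1}$ from row 0 gives one cumulative-product form of the expansion. This is reconstructed from the recursion itself, so the asker's own indexing may differ; an empty product (the $k = n$ term) equals 1:

$$M_n = M_0 \prod_{j=1}^{n} (1-\alpha_j) + \sum_{k=1}^{n} \alpha_k P_k \prod_{j=k+1}^{n} (1-\alpha_j)$$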

I could redo my math, but I would like to know whether there is a more efficient or simpler way to do this, because it takes a while to run.

Thanks for your help!

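Update 1: a few comments:

My original dataframe is a price matrix of different assets (columns), and the rows are days increasing downward (past at the top, present at the bottom).
From there, the first day of my moving average depends on a function of the asset that returns the initial window. The algorithm is therefore not symmetric across columns, so my strategy is to loop over the columns, extract the required vectors, run the numpy calculation and put the result back into the dataframe: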

Recursive approach: I rewrote the code as:

ema = P.copy()*0

for x in ema.columns:

    # define which row to start the algorithm
    # (windows: per-asset window lengths, computed elsewhere)
    start = max(100, 250, int(windows[x]))

    # store the index (dates) to re-inject after the numpy calculation
    i_d = P.iloc[start:].index

    # extract the corresponding vectors from the original matrices
    alpha_temp = alpha[x].values[start:]
    p_temp = P[x].values[start:]
    ema_temp = np.zeros(len(p_temp))

    # M0: initial value of the moving average for this asset
    ema_temp[0] = M0[x]

    # recursive equation
    for y in range(1, len(ema_temp)):
        ema_temp[y] = alpha_temp[y]*p_temp[y] + (1 - alpha_temp[y])*ema_temp[y-1]

    # re-inject the result into the ema dataframe at the original dates
    ema.loc[i_d, x] = ema_temp
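Because `windows` (and the function that produces it) is not shown, the sketch below runs the same per-column strategy on small synthetic data. The helper name `ema_from_start` and the `starts` mapping (column label to first row of the average) are hypothetical stand-ins for the asker's window logic; rows before each start stay at 0, as with `ema = P.copy()*0`.

```python
import numpy as np
import pandas as pd


def ema_from_start(alpha, P, M0, starts):
    """Per-column recursive average with a column-specific start row (sketch).

    `starts` is a hypothetical mapping: column label -> first row of the average.
    Rows before the start are left at 0.
    """
    ema = pd.DataFrame(0.0, index=P.index, columns=P.columns)
    for x in P.columns:
        s = starts[x]
        a = alpha[x].to_numpy()[s:]     # decay factors from the start row on
        p = P[x].to_numpy()[s:]         # prices from the start row on
        out = np.empty(len(p))
        out[0] = M0[x]                  # initial value of the average
        for y in range(1, len(p)):
            out[y] = a[y] * p[y] + (1 - a[y]) * out[y - 1]
        # write back at the original dates, by position
        ema.iloc[s:, ema.columns.get_loc(x)] = out
    return ema


dates = pd.date_range("2018-01-01", periods=10, freq="D")
P = pd.DataFrame({"A": np.arange(10.0), "B": 2 * np.arange(10.0)}, index=dates)
alpha = pd.DataFrame(0.5, index=dates, columns=P.columns)
M0 = pd.Series({"A": 100.0, "B": 50.0})

print(ema_from_start(alpha, P, M0, starts={"A": 2, "B": 5}))
```

Working on plain numpy vectors inside the column loop and writing each finished column back once keeps the pandas overhead out of the inner loop.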

Expansion of the equation

Thanks to help from a guest user:

# This is the product within the summation.
prod = np.flipud(np.cumprod(1 - np.flipud(alpha)))

# This is the sum over the scaled products.
sum_prod = np.cumsum((alpha * P)[:-1] * prod[1:])

# Combining all elements.
result = (alpha * P)[1:] + sum_prod + M0*prod[0]

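I tried your code, but I could not get the correct answer. I am not sure I fully understand it.

Assuming my data runs downward, the first row would give:

I don't see how to use it for the second row, since it already contains 1-a_n everywhere.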
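One way to turn the expansion into a vectorized computation that returns every row, not only the last one, is to factor out the running product $C_n = \prod_{j=1}^{n}(1-\alpha_j)$, which gives $M_n = C_n\,(M_0 + \sum_{k=1}^{n} \alpha_k P_k / C_k)$. As far as I can tell, the products in the guest's snippet run to the end of the array (because of `flipud`), so they weight each term for the last row rather than for every row. The sketch below assumes plain numpy arrays with the initial value in row 0, as in the loops above; `ema_loop` and `ema_closed_form` are hypothetical helper names. Because it divides by $C_k$, it breaks down if some $\alpha_k = 1$ and can lose precision when $C_k$ becomes extremely small over very long histories.

```python
import numpy as np


def ema_loop(alpha, P, M0):
    """Reference: the row-by-row recursion, row 0 = M0."""
    M = np.empty_like(P, dtype=float)
    M[0] = M0
    for n in range(1, len(P)):
        M[n] = alpha[n] * P[n] + (1 - alpha[n]) * M[n - 1]
    return M


def ema_closed_form(alpha, P, M0):
    """Vectorized version of the same recursion (sketch).

    Row 0 holds M0; rows 1..N-1 follow the recursion.
    axis=0 makes it work column by column on 2-D arrays.
    """
    C = np.cumprod(1 - alpha[1:], axis=0)   # C_n = prod_{j=1}^{n} (1 - alpha_j)
    weighted = alpha[1:] * P[1:] / C        # alpha_k * P_k / C_k
    M = np.empty_like(P, dtype=float)
    M[0] = M0
    M[1:] = C * (M0 + np.cumsum(weighted, axis=0))
    return M


rng = np.random.default_rng(1)
height, width = 1000, 4
P = rng.normal(170, 12, size=(height, width))
alpha = rng.normal(0.04, 0.04, size=(height, width))
M0 = rng.normal(170, 12, size=width)

print(np.allclose(ema_loop(alpha, P, M0), ema_closed_form(alpha, P, M0)))
```

If exact agreement with the loop matters more than speed, or if alpha can reach 1, the row-wise loop remains the safer choice.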

Thanks a lot!

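Best answer

I would recommend two modifications:

1. For simplicity: since the columns are independent in the moving-average calculation, a single for loop over the rows is enough. This also gives a small performance gain.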

for y in range(1,height):
    MA.iloc[y] = alpha.iloc[y]*P.iloc[y] + (1-alpha.iloc[y])*MA.iloc[y-1]

2. For computational efficiency/speed: using numpy ndarray/array indexing instead of pandas DataFrame/Series will give a considerable performance improvement.

MA = MA.to_numpy(copy=True)  # writable ndarray copy of the DataFrame
alpha = alpha.to_numpy()     # same, read-only use
P = P.to_numpy()             # same, read-only use
for y in range(1, height):
    MA[y] = alpha[y]*P[y] + (1 - alpha[y])*MA[y-1]
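To check that the row-wise numpy loop reproduces the original cell-by-cell `.iloc` loop, and to compare their speed on your own machine, a sketch along these lines could be used. It regenerates random data with `np.random.default_rng` (so the values differ from the question's), keeps `M0` as one value per column, and the function names `double_loop_pandas` and `row_loop_numpy` are hypothetical. No timing figures are claimed here, since they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

height, width = 1000, 4
rng = np.random.default_rng(1)
P = pd.DataFrame(rng.normal(170, 12, size=(height, width)))
alpha = pd.DataFrame(rng.normal(0.04, 0.04, size=(height, width)))
M0 = pd.Series(rng.normal(170, 12, size=width))


def double_loop_pandas():
    # cell by cell with .iloc, as in the question
    MA = P * 0.0
    MA.iloc[0] = M0
    for x in range(width):
        for y in range(1, height):
            MA.iloc[y, x] = (alpha.iloc[y, x] * P.iloc[y, x]
                             + (1 - alpha.iloc[y, x]) * MA.iloc[y - 1, x])
    return MA.to_numpy()


def row_loop_numpy():
    # one loop over rows; each row is updated for all columns at once
    a = alpha.to_numpy()
    p = P.to_numpy()
    MA = np.zeros((height, width))
    MA[0] = M0.to_numpy()
    for y in range(1, height):
        MA[y] = a[y] * p[y] + (1 - a[y]) * MA[y - 1]
    return MA


print("same result:", np.allclose(double_loop_pandas(), row_loop_numpy()))
print("double loop (s):", timeit.timeit(double_loop_pandas, number=1))
print("row loop (s):   ", timeit.timeit(row_loop_numpy, number=1))
```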

A similar question about Python - Sequential loop with dataframes can be found on Stack Overflow: https://stackoverflow.com/questions/51408552/
