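python-3.x - Trying to find an efficient way to use a while loop in Pandas that references the previous row

I want to run this custom function quickly, many times, over thousands of rows of data. I think the way I have approached the problem takes too long to run.

I tried using .apply, but I could not see how to apply it to only certain rows. I also thought about storing the previous row's result in a variable, but I could not work out the code and suspected it would run at about the same speed.

The code below is an example of what I am trying to make more efficient. It works like the Excel version shown here https://www.youtube.com/watch?v=Dt0KQg52c6c&t=274s at around 4 minutes 30 seconds.

I am new to coding and self-taught. If someone could point me in a direction that would help me compute this without a loop, it would help me a lot and carry over to my future understanding of coding. Thanks!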

import pandas as pd
import numpy as np
import time

start_program = time.time()

df = pd.DataFrame({'Date':['2019-09-01','2019-09-02','2019-09-03','2019-09-04','2019-09-05','2019-09-06'], 'price':[10,8,5,20,50,60]})

df['Date'] = pd.to_datetime(df["Date"])

df.set_index('Date',inplace=True)

df.insert(1,'AVG', "")

df['AVG'] = df['AVG'].apply(pd.to_numeric)

df.iloc[3, df.columns.get_loc('AVG')] = np.mean(df['price'].iloc[0:4])

def avgfunc(df,target_column,price_column,row,num_avg):
    df.iloc[row, df.columns.get_loc(target_column)] = ((df[target_column].iloc[row -1]*(num_avg - 1))+df[price_column].iloc[row])/num_avg
    return df.iloc[row, df.columns.get_loc(target_column)]

leng = len(df['price'])

i=4
while i < leng:
    avgfunc(df,'AVG','price',i,5)
    i += 1      

print(df)

end_program = time.time()
print("Total time to complete program is :", end_program - start_program)

$ python test_loop.py
        price  AVG
Date
2019-09-01     10    NaN
2019-09-02      8    NaN
2019-09-03      5    NaN
2019-09-04     20  10.75
2019-09-05     50  18.60
2019-09-06     60  26.88
Total time to complete program is : 0.03003978729248047

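Best answer

Here is one way using NumPy. Note that the outputs shown below correspond to a price column holding 1 through 6 rather than the sample data in the question, and the function averages each value with the previous result instead of applying the question's 5-period formula.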

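# Wrap a plain Python function as a binary ufunc (2 inputs, 1 output)
ave = np.frompyfunc(lambda a, b: (a + b) / 2, 2, 1)
# np.object was removed in NumPy 1.24; the builtin object works on all versions
v = ave.accumulate(df.price.values, dtype=object)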
v
Out[525]: array([1, 1.5, 2.25, 3.125, 4.0625, 5.03125], dtype=object)
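The same accumulate pattern can be adapted to the recurrence in the question. The following is a minimal sketch, not part of the original answer: it assumes the question's sample prices, a 5-period average, and a seed at row 3 equal to the mean of the first four prices. The names `step`, `values`, `seed_row` and `avg` are made up for illustration.

```python
import numpy as np

# Sample prices from the question
prices = np.array([10, 8, 5, 20, 50, 60], dtype=float)
num_avg = 5    # averaging period used in the question
seed_row = 3   # row that holds the initial simple average

# Binary ufunc: combine the previous average with the current price
step = np.frompyfunc(
    lambda prev, price: (prev * (num_avg - 1) + price) / num_avg, 2, 1
)

# Object array whose first element is the seed instead of the raw price
values = prices[seed_row:].astype(object)
values[0] = prices[: seed_row + 1].mean()

# Rows before the seed stay NaN, as in the question's output
avg = np.full(len(prices), np.nan)
avg[seed_row:] = step.accumulate(values, dtype=object).astype(float)
print(avg)
```

This should reproduce the AVG column from the question (NaN for the first three rows, then 10.75, 18.60 and 26.88). Because `np.frompyfunc` still calls a Python function for every element, it mainly removes the per-row `df.iloc` overhead rather than the loop itself.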

Or we can speed it up with numba

from numba import njit
@njit
def ave(x):
    total = 1
    result = []
    for y in x:
        total = (y+total)/2
        result.append(total)
    return result
ave(df.price.values)
Out[528]: [1.0, 1.5, 2.25, 3.125, 4.0625, 5.03125]
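The recurrence in the question, AVG[i] = (AVG[i-1] * (n - 1) + price[i]) / n, is an exponentially weighted average with alpha = 1/n, so pandas' built-in `ewm` may be able to replace the loop entirely. This is a sketch under the same assumptions as the question (5-period average, seed at row 3 equal to the mean of the first four prices); it is not from the original answer, and `seed_row` and `smoothed` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"price": [10, 8, 5, 20, 50, 60]},
    index=pd.to_datetime(
        ["2019-09-01", "2019-09-02", "2019-09-03",
         "2019-09-04", "2019-09-05", "2019-09-06"]
    ),
)
df.index.name = "Date"

num_avg = 5    # averaging period used in the question
seed_row = 3   # row that holds the initial simple average

# Take the prices from the seed row onward and overwrite the first one
# with the simple mean of the first four prices (the question's seed).
smoothed = df["price"].astype(float).iloc[seed_row:].copy()
smoothed.iloc[0] = df["price"].iloc[: seed_row + 1].mean()

# With adjust=False, ewm computes y[t] = (1 - alpha) * y[t-1] + alpha * x[t],
# which equals the question's formula when alpha = 1 / num_avg.
# Assignment aligns on the index, so the earlier rows become NaN.
df["AVG"] = smoothed.ewm(alpha=1 / num_avg, adjust=False).mean()
print(df)
```

The `adjust=False` setting matters here: the default `adjust=True` uses a differently weighted formula and would not match the loop's results.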

Regarding "python-3.x - Trying to find an efficient way to use a while loop in Pandas that references the previous row", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58177156/
