python - How to best perform recursion on a pandas dataframe column

Tags: python pandas recursion

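I'm trying to calculate index values for a time series in a pandas dataframe. After the first row, each row's index depends on the result computed for the previous row. I tried to do this recursively while iterating over the dataframe rows, but only the first two calculated rows are correct; the third and all subsequent rows are inaccurate.

I think this is because, after the initial value, one of the subsequent index calculations goes wrong, and that error then carries into every calculation after it.

What is causing this inaccuracy? Is there a better approach than the one I've taken?

The example output looks like this: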


ticket_cat  Sector  Year      factor       Incorrect_index_value  correct_index_value  prev_row
Revenue     LSE     Jan 2004               100.00                 100.00
Revenue     LSE     Jan 2005  4.323542894  104.3235               104.3235             100.00
Revenue     LSE     Jan 2006  3.096308080  98.823                 107.5537                        <-- incorrect row
Revenue     LSE     Jan 2007  6.211666     107.476                114.2345                        <-- incorrect row
Revenue     LD      Jan 2004               100.00                 100.0000
Revenue     LD      Jan 2005  3.5218       103.5218               103.5218
Revenue     LD      Jan 2006  2.7417       99.2464                106.3602                        <-- incorrect row
Revenue     LD      Jan 2007  3.3506       104.1353               109.9239                        <-- incorrect row
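Reading the sample numbers back, the two index columns can be written as recurrences within one Sector. Here I_t is the index in year t, f_t is that row's factor, and the base year is t = 0. This is inferred from the table, not stated by the asker: for example 104.3235 × 1.03096308 ≈ 107.5537 for the correct column, and (100 + 3.096308) / 104.3235 × 100 ≈ 98.8236 for the incorrect one. Because I_0 = 100, both formulas agree at t = 1, which is consistent with only the third row onward going wrong.

```latex
% correct_index_value: compound growth from a base of 100
I_0 = 100, \qquad I_t = I_{t-1}\left(1 + \frac{f_t}{100}\right)
\quad\Longrightarrow\quad I_t = 100\prod_{k=1}^{t}\left(1 + \frac{f_k}{100}\right)

% Incorrect_index_value: divides by the previous value instead of multiplying
\tilde{I}_0 = 100, \qquad \tilde{I}_t = \frac{100 + f_t}{\tilde{I}_{t-1}} \cdot 100
```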

My code snippet is below (stpassrev is the dataframe):

import numpy as np
import pandas as pd

# insert the initial value (100) for the index in the base year
stpassrev['index_value'] = np.where(
    (stpassrev['Year'] == 'Jan 2004') & (stpassrev['Ticket_cat'] == 'Revenue'),
    100.00, np.nan)

# set up initial values for the prev_row column
stpassrev['prev_row'] = np.where(
    # only the relevant row is affected
    (stpassrev['Year'] == 'Jan 2005') & (stpassrev['Ticket_cat'] == 'Revenue'),
    100.00,
    np.nan)

# calculate index_value row by row (assumes a default 0..n-1 RangeIndex)
for i in range(1, len(stpassrev)):
    if stpassrev.loc[i, 'Ticket_cat'] == 'Revenue' and pd.notna(stpassrev.loc[i, 'factor']):
        # (100 + factor) divided by the previous row's index_value, times 100
        stpassrev.loc[i, 'index_value'] = (
            (100 + stpassrev.loc[i, 'factor']) / stpassrev.loc[i - 1, 'index_value']) * 100
    # rows without a factor keep their existing index_value

    stpassrev.loc[i, 'prev_row'] = stpassrev.loc[i - 1, 'index_value']

Best answer

Based on your updated question, you only need to do the following:

# assign a new temp_factor with initial values and prep for cumprod
stpassrev['temp_factor'] = np.where(stpassrev['factor'].isna(), 1, stpassrev['factor'].add(100).div(100))

# calculate the cumprod based on the temp_factor (grouped by Sector) and multiply by 100 for index_value
stpassrev['index_value'] = stpassrev.groupby('Sector')['temp_factor'].cumprod().mul(100)

Result:

  ticket_cat Sector      Year    factor  temp_factor  index_value
0    Revenue    LSE  Jan 2004       NaN     1.000000   100.000000
1    Revenue    LSE  Jan 2005  4.323543     1.043235   104.323543
2    Revenue    LSE  Jan 2006  3.096308     1.030963   107.553721
3    Revenue    LSE  Jan 2007  6.211666     1.062117   114.234599
4    Revenue     LD  Jan 2004       NaN     1.000000   100.000000
5    Revenue     LD  Jan 2005  3.521800     1.035218   103.521800
6    Revenue     LD  Jan 2006  2.741700     1.027417   106.360057
7    Revenue     LD  Jan 2007  3.350600     1.033506   109.923757

If you need the result rounded to 4 decimal places, add .round(4) after .mul(100):

stpassrev['index_value'] = stpassrev.groupby('Sector')['temp_factor'].cumprod().mul(100).round(4)

  ticket_cat Sector      Year    factor  temp_factor  index_value
0    Revenue    LSE  Jan 2004       NaN     1.000000     100.0000
1    Revenue    LSE  Jan 2005  4.323543     1.043235     104.3235
2    Revenue    LSE  Jan 2006  3.096308     1.030963     107.5537
3    Revenue    LSE  Jan 2007  6.211666     1.062117     114.2346
4    Revenue     LD  Jan 2004       NaN     1.000000     100.0000
5    Revenue     LD  Jan 2005  3.521800     1.035218     103.5218
6    Revenue     LD  Jan 2006  2.741700     1.027417     106.3601
7    Revenue     LD  Jan 2007  3.350600     1.033506     109.9238
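The answer does not spell out why the original loop drifted. Comparing the numbers suggests the loop divides (100 + factor) by the previous index instead of multiplying the previous index by (1 + factor/100). Since the base value is 100, both give the same second row, and the error only appears from the third row on. The sketch below is an illustrative check, not code from the question or answer; the helper name index_series and the dictionary of factors are hypothetical. It reproduces both columns row by row and compares the corrected loop with a cumulative product.

```python
import numpy as np

# Hypothetical sample: factors per Sector from the question's table,
# with NaN for the base year (Jan 2004)
factors = {
    "LSE": [np.nan, 4.323542894, 3.096308080, 6.211666],
    "LD": [np.nan, 3.5218, 2.7417, 3.3506],
}


def index_series(fs, buggy=False):
    """Build one sector's index row by row, starting from 100."""
    values = [100.0]
    for f in fs[1:]:
        prev = values[-1]
        if buggy:
            # formula used in the question's loop: (100 + factor) / previous * 100
            values.append((100 + f) / prev * 100)
        else:
            # intended recurrence: previous * (1 + factor / 100)
            values.append(prev * (100 + f) / 100)
    return values


for sector, fs in factors.items():
    fixed = index_series(fs)
    # the corrected loop should equal 100 * cumulative product of the multipliers
    via_cumprod = 100 * np.cumprod([1.0] + [1 + f / 100 for f in fs[1:]])
    assert np.allclose(fixed, via_cumprod)
    print(sector,
          [round(v, 4) for v in index_series(fs, buggy=True)],
          [round(v, 4) for v in fixed])
```

A Python-level loop like this is fine for a handful of rows, but the vectorised groupby/cumprod version in the answer avoids per-row .loc access and scales better on larger frames.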

Source: the question "python - How to best perform recursion on a pandas dataframe column" on Stack Overflow: https://stackoverflow.com/questions/58916783/
