python - Index stock price returns to 100 at the start date

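I have a dataset containing daily percentage returns for different stock industries. The full dataset is too large to show here, but here is a dummy DataFrame with more or less the same structure: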

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['01/01/2020', 'energy', 0.25], ['01/02/2020', 'energy', -2], ['01/01/2020', 'technology', 1.5], ['01/02/2020', 'technology', 1], ['01/01/2020', 'healthcare', -1], ['01/02/2020', 'healthcare', 0.5]]),
                       columns=['date', 'industry', 'return'])

         date    industry return
0  01/01/2020      energy   0.25
1  01/02/2020      energy     -2
2  01/01/2020  technology    1.5
3  01/02/2020  technology      1
4  01/01/2020  healthcare     -1
5  01/02/2020  healthcare    0.5

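I want to create an index for each industry that starts at 100 on the first date in the DataFrame and then rises or falls by each day's percentage return until the final date. I can fill in the starting value of 100 for the earliest date: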

df['index'] = np.where(df['date'] == df['date'].min(), 100, 0)

         date    industry return  index
0  01/01/2020      energy   0.25    100
1  01/02/2020      energy     -2      0
2  01/01/2020  technology    1.5    100
3  01/02/2020  technology      1      0
4  01/01/2020  healthcare     -1    100
5  01/02/2020  healthcare    0.5      0

But I really don't know how to fill in the remaining index values from here. The output should look like this:

         date    industry return  index
0  01/01/2020      energy   0.25    100
1  01/02/2020      energy     -2     98
2  01/01/2020  technology    1.5    100
3  01/02/2020  technology      1    101
4  01/01/2020  healthcare     -1    100
5  01/02/2020  healthcare    0.5  100.5

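Does anyone here know how to do this, or can anyone point me in the right direction? * Clarification: I need the index values to be compounded, not cumulative. For example, if an industry has 3 dates with percentage returns of 0.5, 0.1 and 1.2, the index output should be 100 (start date), 100.1 (100*1.001) and 101.3012 (100.1*1.012).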
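
The arithmetic in the clarification can be checked with a few lines of plain Python. This is only a sketch of the compounding rule described above; the variable names are illustrative and do not come from the original post.

```python
# Percentage returns for one hypothetical industry, in date order.
returns_pct = [0.5, 0.1, 1.2]

# The first date is pinned to 100; its own return is not applied.
index_values = [100.0]
for r in returns_pct[1:]:
    # Compound: multiply the previous index level by (1 + r / 100).
    index_values.append(index_values[-1] * (1 + r / 100))

# Round only for display, to hide floating-point noise.
rounded = [round(v, 4) for v in index_values]
print(rounded)  # [100.0, 100.1, 101.3012]
```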

Best answer

Use GroupBy.cumsum, with Series.mask setting the first return of each industry to 0:

# df['return'] = df['return'].astype(float)  # needed when 'return' holds strings, as in the dummy frame built from np.array
df['index'] = (df['return'].mask(df['industry'].ne(df['industry'].shift()),0)
                           .groupby(df['industry'])
                           .cumsum().add(100))
print(df)
         date    industry  return  index
0  01/01/2020      energy    0.25  100.0
1  01/02/2020      energy   -2.00   98.0
2  01/01/2020  technology    1.50  100.0
3  01/02/2020  technology    1.00  101.0
4  01/01/2020  healthcare   -1.00  100.0
5  01/02/2020  healthcare    0.50  100.5
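
Note that the running sum is additive: it yields 100 plus the sum of the returns after the first date. With only two dates per industry this coincides with compounding, which is why the output above matches the expected table. A minimal sketch with three dates shows where the two diverge; the `demo` frame and the third date are hypothetical, and the returns are taken from the clarification in the question.

```python
import pandas as pd

# Hypothetical single-industry frame using the returns from the clarification.
demo = pd.DataFrame({
    'date': ['01/01/2020', '01/02/2020', '01/03/2020'],
    'industry': ['energy'] * 3,
    'return': [0.5, 0.1, 1.2],
})

# True on the first row of each run of identical industry labels.
first_row = demo['industry'].ne(demo['industry'].shift())

# Additive index: 100 plus the running sum of returns after the first date.
additive = (demo['return'].mask(first_row, 0)
                          .groupby(demo['industry'])
                          .cumsum()
                          .add(100))

print(additive.round(4).tolist())
```

The last value comes out at about 101.3, whereas the compounded value asked for in the clarification is 101.3012.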

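Update: for a compounded index, use GroupBy.cumprod on the growth factors (1 + return/100). The output below was produced with different sample returns than the dummy DataFrame in the question: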

df['index'] =  (df['return'].astype(float)
                            .div(100)
                            .add(1)
                            .mask(df['industry'].ne(df['industry'].shift()),100)
                            .groupby(df['industry'])
                            .cumprod())
print(df)
         date    industry  return  index
0  01/01/2020      energy     0.5  100.0
1  01/02/2020      energy     0.1  100.1
2  01/01/2020  technology     1.2  100.0
3  01/02/2020  technology     0.5  100.5
4  01/01/2020  healthcare     0.1  100.0
5  01/02/2020  healthcare     1.2  101.2
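
For reference, here is a self-contained sketch of the compounded approach applied to the dummy data from the question. It makes a few assumptions that are not in the original answer: the dates are parsed as MM/DD/YYYY, the rows are sorted by industry and date first, and the first row of each industry is found with `cumcount` rather than by comparing with the shifted column, so the result does not depend on each industry's rows being adjacent. The names `is_first` and `growth` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['01/01/2020', '01/02/2020', '01/01/2020',
             '01/02/2020', '01/01/2020', '01/02/2020'],
    'industry': ['energy', 'energy', 'technology',
                 'technology', 'healthcare', 'healthcare'],
    'return': [0.25, -2, 1.5, 1, -1, 0.5],
})

# Assumption: dates are MM/DD/YYYY. Sort so each industry runs in date order.
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
df = df.sort_values(['industry', 'date']).reset_index(drop=True)

# True on the earliest row of each industry.
is_first = df.groupby('industry').cumcount().eq(0)

# Growth factor per day; the first day is neutral (factor 1), so the index starts at 100.
growth = df['return'].div(100).add(1).mask(is_first, 1.0)

# Compound within each industry and scale to a base of 100.
df['index'] = growth.groupby(df['industry']).cumprod().mul(100)
print(df.round(4))
```

Masking the first factor to 1 and multiplying by 100 at the end is equivalent to masking it to 100 as in the answer; it just keeps the base level in one place.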

Regarding "python - Index stock price returns to 100 at the start date", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59937113/
