python - pandas MultiIndex rolling mean

Tags: python pandas multi-index

Preface: I'm a newbie, but I've spent hours searching here and in the pandas documentation without success. I've also read Wes's book.

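I am modeling stock market data for a hedge fund and have a simple MultiIndexed DataFrame with tickers, dates (daily), and fields. The sample here is from Bloomberg: 3 months (December 2016 through February 2017) and 3 tickers (AAPL, IBM, MSFT).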

import numpy as np
import pandas as pd
import os

# get data from Excel
curr_directory = os.getcwd()
filename = 'Sample Data File.xlsx'
filepath = os.path.join(curr_directory, filename)
# sheet_name and usecols replace the older sheetname and parse_cols keywords
df = pd.read_excel(filepath, sheet_name='Sheet1', index_col=[0, 1], usecols='A:D')

# sort
df.sort_index(inplace=True)

# sample of the data
df.head(15)
Out[4]: 
                           PX_LAST  PX_VOLUME
Security Name  date                          
AAPL US Equity 2016-12-01   109.49   37086862
               2016-12-02   109.90   26527997
               2016-12-05   109.11   34324540
               2016-12-06   109.95   26195462
               2016-12-07   111.03   29998719
               2016-12-08   112.12   27068316
               2016-12-09   113.95   34402627
               2016-12-12   113.30   26374377
               2016-12-13   115.19   43733811
               2016-12-14   115.19   34031834
               2016-12-15   115.82   46524544
               2016-12-16   115.97   44351134
               2016-12-19   116.64   27779423
               2016-12-20   116.95   21424965
               2016-12-21   117.06   23783165

df.tail(15)
Out[5]: 
                           PX_LAST  PX_VOLUME
Security Name  date                          
MSFT US Equity 2017-02-07    63.43   20277226
               2017-02-08    63.34   18096358
               2017-02-09    64.06   22644443
               2017-02-10    64.00   18170729
               2017-02-13    64.72   22920101
               2017-02-14    64.57   23108426
               2017-02-15    64.53   17005157
               2017-02-16    64.52   20546345
               2017-02-17    64.62   21248818
               2017-02-21    64.49   20655869
               2017-02-22    64.36   19292651
               2017-02-23    64.62   20273128
               2017-02-24    64.62   21796800
               2017-02-27    64.23   15871507
               2017-02-28    63.98   23239825

When I calculate the daily price change, it seems to work; only the first day is NaN, as it should be:

df.head(5)
Out[7]: 
                           PX_LAST  PX_VOLUME  px_change_%
Security Name  date                                       
AAPL US Equity 2016-12-01   109.49   37086862          NaN
               2016-12-02   109.90   26527997     0.003745
               2016-12-05   109.11   34324540    -0.007188
               2016-12-06   109.95   26195462     0.007699
               2016-12-07   111.03   29998719     0.009823
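
The code that produced `px_change_%` does not appear in the post. As a hedged sketch of one way such a column could be computed per ticker, the example below uses `groupby(level=0)` with `pct_change()`. The AAPL prices are copied from the sample above; the MSFT prices and the name `prices` are made up for illustration.

```python
import pandas as pd

dates = pd.to_datetime(["2016-12-01", "2016-12-02", "2016-12-05"])
index = pd.MultiIndex.from_product(
    [["AAPL US Equity", "MSFT US Equity"], dates],
    names=["Security Name", "date"],
)
# AAPL prices come from the sample above; the MSFT prices are invented.
prices = pd.DataFrame(
    {"PX_LAST": [109.49, 109.90, 109.11, 60.00, 60.60, 60.30]}, index=index
)

# Percentage change computed within each ticker (first index level),
# so the first row of every ticker is NaN rather than only the first row overall.
pct = prices.groupby(level=0)["PX_LAST"].pct_change()

# The result keeps the original two-level index, so it aligns on assignment.
prices["px_change_%"] = pct
print(prices)
```

Grouping matters here because an ungrouped `pct_change()` would compare the first row of one ticker with the last row of the previous one.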

But the 30-day average volume does not. It should be NaN only for the first 29 days of each ticker, but it is NaN in every row:

# daily change from 30 day volume - doesn't work
df['30_day_volume'] = df.groupby(level=0,group_keys=True)['PX_VOLUME'].rolling(window=30).mean()
df['volume_change_%'] = (df['PX_VOLUME'] - df['30_day_volume']) / df['30_day_volume']

df.iloc[:,3:].tail(40)
Out[12]: 
                           30_day_volume  volume_change_%
Security Name  date                                      
MSFT US Equity 2016-12-30            NaN              NaN
               2017-01-03            NaN              NaN
               2017-01-04            NaN              NaN
               2017-01-05            NaN              NaN
               2017-01-06            NaN              NaN
               2017-01-09            NaN              NaN
               2017-01-10            NaN              NaN
               2017-01-11            NaN              NaN
               2017-01-12            NaN              NaN
               2017-01-13            NaN              NaN
               2017-01-17            NaN              NaN
               2017-01-18            NaN              NaN
               2017-01-19            NaN              NaN
               2017-01-20            NaN              NaN
               2017-01-23            NaN              NaN
               2017-01-24            NaN              NaN
               2017-01-25            NaN              NaN
               2017-01-26            NaN              NaN
               2017-01-27            NaN              NaN
               2017-01-30            NaN              NaN
               2017-01-31            NaN              NaN
               2017-02-01            NaN              NaN
               2017-02-02            NaN              NaN
               2017-02-03            NaN              NaN
               2017-02-06            NaN              NaN
               2017-02-07            NaN              NaN
               2017-02-08            NaN              NaN
               2017-02-09            NaN              NaN
               2017-02-10            NaN              NaN
               2017-02-13            NaN              NaN
               2017-02-14            NaN              NaN
               2017-02-15            NaN              NaN
               2017-02-16            NaN              NaN
               2017-02-17            NaN              NaN
               2017-02-21            NaN              NaN
               2017-02-22            NaN              NaN
               2017-02-23            NaN              NaN
               2017-02-24            NaN              NaN
               2017-02-27            NaN              NaN
               2017-02-28            NaN              NaN

Since pandas seems to have been designed specifically for finance, I'm surprised this isn't straightforward.

Edit: I have also tried a few other things.

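  • Tried converting it to a Panel (3D), but found no built-in rolling-window functions other than converting to a DataFrame and back, so there was no advantage.
  • Tried creating a pivot table, but couldn't find a way to reference only the first level of the MultiIndex. df.index.levels[0]...levels[1] didn't work.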
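
On the pivot attempt: `df.index.levels[0]` holds only the unique labels of a level, while `df.index.get_level_values(0)` returns one label per row. The sketch below, on made-up data (the volumes and the window of 2 are illustrative assumptions), shows that call along with `xs` and `unstack` as possible ways to work with the first level. It is not a reconstruction of what was actually tried.

```python
import numpy as np
import pandas as pd

dates = pd.bdate_range("2016-12-01", periods=5)
index = pd.MultiIndex.from_product(
    [["AAPL US Equity", "IBM US Equity"], dates],
    names=["Security Name", "date"],
)
# Made-up volumes 0..9, purely for illustration.
df = pd.DataFrame({"PX_VOLUME": np.arange(10.0)}, index=index)

# One first-level label per row (index.levels[0] would list unique labels only).
tickers = df.index.get_level_values(0)

# All rows of a single ticker, selected through the first level.
aapl = df.xs("AAPL US Equity", level=0)

# Pivot the first level into columns: one column per ticker, indexed by date.
wide = df["PX_VOLUME"].unstack(level=0)

# In wide form a plain rolling mean is already per ticker, column by column.
wide_rolling = wide.rolling(window=2).mean()
print(wide_rolling)
```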

Thanks!

Best answer

Can you try the following and see whether it works?

df['30_day_volume'] = df.groupby(level=0)['PX_VOLUME'].rolling(window=30).mean().values

df['volume_change_%'] = (df['PX_VOLUME'] - df['30_day_volume']) / df['30_day_volume']
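
A likely reason the original assignment fails: `groupby(...).rolling(...)` returns a Series with the group key prepended to the index, so it has three index levels while `df` has two, and label alignment on assignment finds no matches. Depending on the pandas version, the mismatch may show up as all-NaN values or as an error. `.values` bypasses alignment and assigns by position, which relies on `df` being sorted so that the rows line up with the grouped result. The sketch below uses made-up data (the volumes, the window of 3, and the column names `rolling_volume` and `rolling_volume_alt` are illustrative assumptions) to show two alignment-based alternatives.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Bloomberg sample: 2 tickers x 6 business days.
dates = pd.bdate_range("2016-12-01", periods=6)
index = pd.MultiIndex.from_product(
    [["AAPL US Equity", "MSFT US Equity"], dates],
    names=["Security Name", "date"],
)
df = pd.DataFrame({"PX_VOLUME": np.arange(1.0, 13.0)}, index=index).sort_index()

window = 3  # small window so the toy data produces non-NaN values

rolled = df.groupby(level=0)["PX_VOLUME"].rolling(window=window).mean()

# The group key is prepended, so the result has one more index level than df.
print(df.index.nlevels, rolled.index.nlevels)

# Dropping the prepended level restores df's index, so assignment aligns by label.
df["rolling_volume"] = rolled.reset_index(level=0, drop=True)

# transform keeps the original index, so no level handling is needed.
df["rolling_volume_alt"] = df.groupby(level=0)["PX_VOLUME"].transform(
    lambda s: s.rolling(window=window).mean()
)
print(df)
```

Both alternatives align by label, so they do not depend on the row order of `df` the way `.values` does.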

Regarding python - pandas MultiIndex rolling mean, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43859105/
