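How can I do the equivalent of R (xts) rollapply(...., by.column=FALSE) using Numpy or Pandas? When given a DataFrame, pandas rolling_apply seems to work only column by column, with no option to pass the full (window size) x (DataFrame width) matrix to the target function.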
import pandas as pd
import numpy as np
xx = pd.DataFrame(np.zeros([10, 10]))
pd.rolling_apply(xx, 5, lambda x: np.shape(x)[0])
0 1 2 3 4 5 6 7 8 9
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5
6 5 5 5 5 5 5 5 5 5 5
7 5 5 5 5 5 5 5 5 5 5
8 5 5 5 5 5 5 5 5 5 5
9 5 5 5 5 5 5 5 5 5 5
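So what happens is that rolling_apply goes down each column in turn and applies a length-5 sliding window to each one, whereas what I want is for the sliding window to be a 5x10 array each time; in that case I would get a single column vector (not a 2D array) as the result.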
Best answer
I could not find a way to compute a "wide" rolling apply in the pandas docs, so I would use numpy to get a "windowed" view on the array and apply a ufunc to it. Here is an example:
In [40]: arr = np.arange(50).reshape(10, 5); arr
Out[40]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49]])
In [41]: win_size = 5
In [42]: isize = arr.itemsize; isize
Out[42]: 8
arr.itemsize is 8 because the default integer dtype on this platform is np.int64; you need it for the following "windowed" view idiom:
In [43]: windowed = np.lib.stride_tricks.as_strided(arr,
shape=(arr.shape[0] - win_size + 1, win_size, arr.shape[1]),
strides=(arr.shape[1] * isize, arr.shape[1] * isize, isize)); windowed
Out[43]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]],
[[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]],
[[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44]],
[[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49]]])
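Strides are the number of bytes between two neighbouring elements along a given axis, so strides=(arr.shape[1] * isize, arr.shape[1] * isize, isize) means: skip 5 elements when going from windowed[0] to windowed[1], skip 5 elements when going from windowed[0, 0] to windowed[0, 1], and skip 1 element when going from windowed[0, 0, 0] to windowed[0, 0, 1]. Now you can call any ufunc or reduction on the resulting array, e.g.: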
In [44]: windowed.sum(axis=(1,2))
Out[44]: array([300, 425, 550, 675, 800, 925])
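The same idea can be wrapped so that it takes a DataFrame and returns a single column, as the question asks. The following is only a sketch: rolling_apply_wide is a hypothetical helper name, not a pandas or NumPy function. It assumes NumPy 1.20 or later, where np.lib.stride_tricks.sliding_window_view builds the windowed view without hand-computed byte strides. It also assumes the DataFrame has at least win_size rows and that func returns a scalar. Labelling each result with the last row of its window and padding the first win_size - 1 rows with NaN mirrors the rolling_apply output in the question, and is a choice rather than a requirement.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def rolling_apply_wide(df, win_size, func):
    """Hypothetical helper: apply func to each (win_size x n_columns) block of df."""
    values = df.to_numpy()
    # Window along the rows only; the window axis is appended last,
    # so the view has shape (n_windows, n_columns, win_size).
    windows = sliding_window_view(values, win_size, axis=0)
    # Move the window axis forward so each block is (win_size, n_columns).
    windows = np.moveaxis(windows, -1, 1)
    out = np.full(len(df), np.nan)
    # Label each result with the last row of its window; the first
    # win_size - 1 rows stay NaN.
    out[win_size - 1:] = [func(block) for block in windows]
    return pd.Series(out, index=df.index)


df = pd.DataFrame(np.arange(50).reshape(10, 5))
result = rolling_apply_wide(df, 5, np.sum)
print(result)
```

For a reduction that NumPy can vectorise, such as the sum above, calling windows.sum(axis=(1, 2)) on the view directly avoids the Python-level loop. The loop is kept here so that an arbitrary function of the whole block can be passed in.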
A similar question about "python - n-dimensional sliding window with Pandas or Numpy" can be found on Stack Overflow: https://stackoverflow.com/questions/26371509/