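I have a few questions about computing a rolling mean/standard deviation with conditions. Honestly, this is more of a syntax question, but since I think it slows my code down considerably, I figured I should ask here to understand what is going on. I have some financial data with columns such as Stock Name and Midquotes, and I want to compute a rolling mean and a rolling standard deviation per stock.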
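Right now I want to compute each stock's volatility, taken as the rolling standard deviation of the previous 20 midquotes. After searching Stack Overflow, I found a line that does this with the data.table package: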
DT[, volatility:=( roll_sd(DT$Midquotes, 20, fill=0, align = "right") ), by = Stock]
where DT is the data.table containing all my data.
Now, this is computationally quite slow, especially compared with a plain rolling standard deviation that has no conditions, such as:
DT$volatility <- roll_sd(DT$Midquotes, 20, fill=0, align = "right")
But when I try to do something similar for the rolling standard deviation with conditions, R won't let me:
DT$volatility <- DT[, ( roll_sd(DT$Midquotes, 20, fill=0, align = "right") ), by = Stock]
This line fails with:
Error: cannot allocate vector of size 10.9 Gb
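So I'm just wondering: why is this line, DT[, volatility:=( roll_sd(DT$Midquotes, 20, fill=0, align = "right") ), by = Stock], so slow? Could it be copying the whole data.table every time it computes the rolling standard deviation for a different stock?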
Best answer
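I think your problem is that you use DT inside the square brackets together with the := function. Within DT[...], DT$Midquotes refers to the entire column rather than the current group's values, so the rolling standard deviation is recomputed over the whole table once per stock. I assume your setup looks like this: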
> library(data.table)
> set.seed(83385668)
> DT <- data.table(
+ x = rnorm(5 * 3),
+ stock = c(sapply(letters[1:3], rep, times = 5)),
+ time = c(replicate(3, 1:5)))
> DT
x stock time
1: 0.25073356 a 1
2: -0.24408170 a 2
3: -0.87475856 a 3
4: 0.50843761 a 4
5: -1.91331773 a 5
6: 0.07850094 b 1
7: -0.15922989 b 2
8: 1.09806870 b 3
9: 0.27995610 b 4
10: 0.45090842 b 5
11: 0.03400554 c 1
12: -0.34918734 c 2
13: 2.16602740 c 3
14: -0.04758261 c 4
15: 1.24869663 c 5
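To make the difference between a bare column name and DT$... inside the brackets concrete, here is a minimal sketch. It assumes a toy table with the same layout as the one above (the seed and values are illustrative, not the asker's data), and the names per_group, whole_col and blown_up are made up for this example.

```r
library(data.table)

# Toy data with the same layout as above: 3 stocks, 5 observations each.
# The seed and values are illustrative only.
set.seed(1)
DT <- data.table(
  x     = rnorm(15),
  stock = rep(letters[1:3], each = 5),
  time  = rep(1:5, times = 3)
)

# Bare column name: j sees only the current group's values (5 per stock).
per_group <- DT[, .(n = length(x)), by = stock]

# DT$x: j sees the whole 15-row column again for every group.
whole_col <- DT[, .(n = length(DT$x)), by = stock]

print(per_group)
print(whole_col)

# Without `:=`, returning the full-length vector for each group yields
# (number of rows) x (number of groups) rows: 15 x 3 = 45 here.
blown_up <- DT[, .(v = DT$x), by = stock]
print(nrow(blown_up))
```

On a large table with many stocks, that multiplication of rows by groups would plausibly account for the "cannot allocate vector" error in the question.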
I'm not sure where the roll_sd function comes from. However, you can compute, for example, a rolling mean with the zoo package as follows:
> library(zoo)
> setkey(DT, stock, time) # make sure data is sorted by time
> DT[, rollmean := rollmean(x, k = 3, fill = 0, align = "right"),
+ by = .(stock)]
> DT
x stock time rollmean
1: 0.25073356 a 1 0.0000000
2: -0.24408170 a 2 0.0000000
3: -0.87475856 a 3 -0.2893689
4: 0.50843761 a 4 -0.2034676
5: -1.91331773 a 5 -0.7598796
6: 0.07850094 b 1 0.0000000
7: -0.15922989 b 2 0.0000000
8: 1.09806870 b 3 0.3391132
9: 0.27995610 b 4 0.4062650
10: 0.45090842 b 5 0.6096444
11: 0.03400554 c 1 0.0000000
12: -0.34918734 c 2 0.0000000
13: 2.16602740 c 3 0.6169485
14: -0.04758261 c 4 0.5897525
15: 1.24869663 c 5 1.1223805
Or, equivalently:
> DT[, `:=`(rollmean = rollmean(x, k = 3, fill = 0, align = "right")),
+ by = .(stock)]
> DT
x stock time rollmean
1: 0.25073356 a 1 0.0000000
2: -0.24408170 a 2 0.0000000
3: -0.87475856 a 3 -0.2893689
4: 0.50843761 a 4 -0.2034676
5: -1.91331773 a 5 -0.7598796
6: 0.07850094 b 1 0.0000000
7: -0.15922989 b 2 0.0000000
8: 1.09806870 b 3 0.3391132
9: 0.27995610 b 4 0.4062650
10: 0.45090842 b 5 0.6096444
11: 0.03400554 c 1 0.0000000
12: -0.34918734 c 2 0.0000000
13: 2.16602740 c 3 0.6169485
14: -0.04758261 c 4 0.5897525
15: 1.24869663 c 5 1.1223805
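The same pattern should carry over to a rolling standard deviation. The sketch below is one possible way to do it, not necessarily what the asker's roll_sd does: it swaps in zoo's rollapply with sd, uses a window of 3 on illustrative toy data, and the column name rollsd is made up for this example.

```r
library(data.table)
library(zoo)

# Toy data with the same layout as above; seed and values are illustrative only.
set.seed(1)
DT <- data.table(
  x     = rnorm(15),
  stock = rep(letters[1:3], each = 5),
  time  = rep(1:5, times = 3)
)
setkey(DT, stock, time)  # make sure data is sorted by time within each stock

# Refer to the column as `x`, not `DT$x`, so each stock is handled on its own.
# fill = 0 pads the first (width - 1) rows of every group with zeros.
DT[, rollsd := rollapply(x, width = 3, FUN = sd, fill = 0, align = "right"),
   by = .(stock)]

print(DT)
```

If roll_sd comes from a package such as RcppRoll (an assumption, since the question does not say), the corresponding fix to the original line would presumably be to write roll_sd(Midquotes, 20, fill = 0, align = "right") with the bare column name, keeping by = Stock.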
A similar question about rolling mean/standard deviation with conditions can be found on Stack Overflow: https://stackoverflow.com/questions/46438975/