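I have a data.table object containing timestamps (measured in seconds after midnight). My goal is to run a function that returns, for each row, the number of observations that occurred at most $k$ seconds before that observation.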
# require() loads only one package per call, so attach each one separately
library(data.table)
library(dplyr)
library(dtplyr)
set.seed(123)
DF <- data.frame(Secs=cumsum(rexp(10000,1)))
setDT(DF)
> DF
Secs
1: 8.434573e-01
2: 1.420068e+00
3: 2.749122e+00
4: 2.780700e+00
5: 2.836911e+00
---
9996: 1.003014e+04
9997: 1.003382e+04
9998: 1.003384e+04
9999: 1.003414e+04
10000: 1.003781e+04
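The function I want to apply to each row is
# Count the rows of DF whose Secs lies in [Second - k, Second), with a floor of 1
nS <- function(Second, k = 5)
  max(1, nrow(DF %>% filter(Secs < Second & Secs >= Second - k)))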
One way to get what I want is to use apply, but it takes quite a long time.
system.time(val <- apply(DF,1,nS))
user system elapsed
20.56 0.03 20.66
#Not working
DF%>%mutate(nS=nS(Secs,100))%>%head()
# Also not working
library(lazyeval)
f = function(col1, new_col_name) {
mutate_call = lazyeval::interp(~ nS(a), a = as.name(col1))
DF%>%mutate_(.dots=setNames(list(mutate_call),new_col_name))
}
head(f('Secs', 'nS'))
DF%>%mutate(minTime=Secs-k)%>%head()
Isn't it possible to do this with mutate? Thank you very much for your help!
Best answer
Does using rowwise() work for you?
DF %>% rowwise() %>% mutate(ns = nS(Secs), # default k = 5, equal to your apply
ns2 = nS(Secs, 100)) # second test case k = 100
Source: local data frame [10,000 x 3]
Groups: <by row>
# A tibble: 10,000 × 3
Secs ns ns2
<dbl> <dbl> <dbl>
1 0.1757671 1 1
2 1.1956531 1 1
3 1.6594676 2 2
4 2.6988685 3 3
5 2.8845783 4 4
6 3.1012975 5 5
7 4.1258548 6 6
8 4.1584318 7 7
9 4.2346702 8 8
10 6.0375495 8 9
# ... with 9,990 more rows
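A note on why the plain mutate() attempts in the question probably misbehave: mutate() passes the entire Secs column to nS in a single call, so the comparison inside the function runs column against column and yields one count that is recycled to every row. rowwise() makes the call happen once per row instead. A minimal base-R sketch of that difference, where x and count_one are illustrative stand-ins (not from the original post) for DF$Secs and nS:

```r
# Illustrative data; x plays the role of DF$Secs (assumption, not the poster's data)
x <- c(1, 2, 3, 10)

# Same logic as nS, written against the vector x instead of DF
count_one <- function(s, k = 5) max(1, sum(x < s & x >= s - k))

# Passing the whole vector at once: x < x is FALSE everywhere, so a single 1 comes back
whole_column <- count_one(x)

# Calling once per element gives one count per observation
per_element <- sapply(x, count_one)
```

Inside dplyr, the per-element call would presumably read mutate(ns = sapply(Secs, nS)), which is roughly what rowwise() arranges for you.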
The rowwise() version is only slightly faster than apply on my machine...
system.time(DF %>% rowwise() %>% mutate(ns = nS(Secs)))
user system elapsed
13.934 1.060 15.280
system.time(apply(DF, 1, nS))
user system elapsed
14.938 1.101 16.438
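If speed is the real concern, neither version scales well, because nS scans the whole table once per row. One possible vectorised alternative, sketched here under the assumption that only the counts are needed, sorts the timestamps once and uses base R's findInterval() to count how many values lie strictly below each bound. count_prev is a hypothetical name, not part of the original answer, and the left.open argument needs a reasonably recent version of R.

```r
# Hypothetical helper: for each timestamp s, count values in [s - k, s), floored at 1
count_prev <- function(secs, k = 5) {
  sorted <- sort(secs)  # findInterval() requires a non-decreasing vector
  # With left.open = TRUE, findInterval() returns the number of sorted values < query
  n_below_s      <- findInterval(secs,     sorted, left.open = TRUE)
  n_below_window <- findInterval(secs - k, sorted, left.open = TRUE)
  # Values below s minus values below s - k leaves those in [s - k, s)
  pmax(1, n_below_s - n_below_window)
}

# Small check against the brute-force definition used by nS
set.seed(123)
secs <- cumsum(rexp(200, 1))
brute <- sapply(secs, function(s) max(1, sum(secs < s & secs >= s - 5)))
fast <- count_prev(secs, 5)
```

With the data from the question this could be used as DF[, ns := count_prev(Secs)] or DF %>% mutate(ns = count_prev(Secs)), since the helper takes the whole column and returns one value per row.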
Regarding "r - applying a function in mutate", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41879458/