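I have a tibble with three columns:
- runner - a string giving the competitor's name
- race - a number giving the race number
- date - the date of the race
I want to add a fourth column, last45d, giving the number of races the runner had in the 45 days before the date of the current row. My reprex includes sample data and my attempt to generate the new column (I get all NAs).
Reprex: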
library(tidyverse)
library(lubridate)
library(reprex)
df<-tibble(runner=c("D.Wottle","D.Wottle","D.Wottle","D.Wottle","D.Wottle","D.Wottle","C.Hottle","C.Hottle","C.Hottle","C.Hottle","C.Hottle","C.Hottle","JJ.Watt","JJ.Watt","JJ.Watt","JJ.Watt","JJ.Watt","JJ.Watt"),
  race=c(6,5,4,3,2,1,6,5,4,3,2,1,6,5,4,3,2,1),
  date=c(ymd('20170625'),ymd('20170524'),ymd('20170420'),ymd('20170329'),ymd('20170308'),ymd('20170215'),ymd('20170625'),ymd('20170524'),ymd('20170410'),ymd('20170329'),ymd('20170304'),ymd('20170215'),ymd('20170615'),ymd('20170524'),ymd('20170428'),ymd('20170329'),ymd('20170301'),ymd('20170225')),
  surface=c('T','T','D','T','D','T','T','T','D','T','D','T','T','T','D','T','D','T'),
  distance=c(1400,1400,1600,1400,1500,1400,1400,1400,1600,1400,1500,1400,1400,1400,1600,1400,1500,1400),
  finish=c(1,2,2,1,2,3,2,3,3,2,1,1,3,1,1,3,3,2)
)
df <- df %>%
  group_by(runner) %>%
  mutate(last45 = map_int(date, ~ sum(between(as.numeric(difftime(.x, date, units = "days")), 1e-9, 90)))) %>%
  ungroup()
df
#> # A tibble: 18 x 7
#>    runner    race date       surface distance finish last45
#>    <chr>    <dbl> <date>     <chr>      <dbl>  <dbl>  <int>
#>  1 D.Wottle     6 2017-06-25 T           1400      1      3
#>  2 D.Wottle     5 2017-05-24 T           1400      2      3
#>  3 D.Wottle     4 2017-04-20 D           1600      2      3
#>  4 D.Wottle     3 2017-03-29 T           1400      1      2
#>  5 D.Wottle     2 2017-03-08 D           1500      2      1
#>  6 D.Wottle     1 2017-02-15 T           1400      3      0
#>  7 C.Hottle     6 2017-06-25 T           1400      2      3
#>  8 C.Hottle     5 2017-05-24 T           1400      3      3
#>  9 C.Hottle     4 2017-04-10 D           1600      3      3
#> 10 C.Hottle     3 2017-03-29 T           1400      2      2
#> 11 C.Hottle     2 2017-03-04 D           1500      1      1
#> 12 C.Hottle     1 2017-02-15 T           1400      1      0
#> 13 JJ.Watt      6 2017-06-15 T           1400      3      3
#> 14 JJ.Watt      5 2017-05-24 T           1400      1      4
#> 15 JJ.Watt      4 2017-04-28 D           1600      1      3
#> 16 JJ.Watt      3 2017-03-29 T           1400      3      2
#> 17 JJ.Watt      2 2017-03-01 D           1500      3      1
#> 18 JJ.Watt      1 2017-02-25 T           1400      2      0
Created on 2020-05-13 by the reprex package (v0.3.0)
This is the end result I want:
Best answer
df %>%
  group_by(runner) %>%
  mutate(
    last45 = map_int(date, ~ sum(between(as.numeric(difftime(.x, date, units = "days")), 1e-9, 45)))
    #                ^^^^1                                       ^^^^2
  ) %>%
  ungroup()
# # A tibble: 6 x 4
#   runner    race date       last45
#   <chr>    <dbl> <date>      <int>
# 1 D.Wottle     6 2017-06-25      1
# 2 D.Wottle     5 2017-05-24      1
# 3 D.Wottle     4 2017-04-20      2
# 4 D.Wottle     3 2017-03-29      2
# 5 D.Wottle     2 2017-03-08      1
# 6 D.Wottle     1 2017-02-15      0
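As a cross-check on the output above, here is a minimal base-R sketch of the same per-row count for a single runner. The vector d and the helper count_prior() are illustrative names, not part of the original answer, and the dates are copied from the D.Wottle rows of the sample data.

```r
# Dates for one runner (D.Wottle), copied from the sample data.
d <- as.Date(c("2017-06-25", "2017-05-24", "2017-04-20",
               "2017-03-29", "2017-03-08", "2017-02-15"))

# Hypothetical helper: count races that fall strictly before `x`
# and no more than `window` days earlier.
count_prior <- function(x, dates, window = 45) {
  gap <- as.numeric(x - dates)  # days elapsed from each race to x
  sum(gap > 0 & gap <= window)
}

# Evaluate each date in turn against the runner's full date vector.
last45 <- vapply(seq_along(d), function(i) count_prior(d[i], d), integer(1))
last45
```

Each date is compared against the whole vector, which is the same pattern the map_int() call follows within each runner's group.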
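Notes:
- The two references to date are distinct. "1" (outside the tilde function) is passed into .x one element at a time, so .x is always a single date. "2" (inside the tilde function) is the original date column, so it has as many values as the current runner has rows.
- I use 1e-9 because with 0 the current day would always be counted. With 1e-9 (or some similarly small number) we get an effective (lower, upper] interval, whereas dplyr::between defaults to [lower, upper] (closed on both sides).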
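The effect of the lower bound can be seen in isolation with a small sketch. The three dates below are taken from the sample data; the variable names dates, diffs, closed and half_open are illustrative only.

```r
library(dplyr)

# One race date compared against itself and two earlier races.
dates <- as.Date(c("2017-06-25", "2017-05-24", "2017-04-20"))
diffs <- as.numeric(difftime(dates[1], dates, units = "days"))
diffs  # 0, 32 and 66 days

# Closed lower bound: the race itself (difference 0) is counted.
closed <- between(diffs, 0, 45)

# Tiny positive lower bound: effectively (0, 45], so the race itself is excluded.
half_open <- between(diffs, 1e-9, 45)

sum(closed)     # 2
sum(half_open)  # 1
```

Any lower bound smaller than one day would behave the same way here, because differences between Date values are whole numbers of days.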
Source: "r - How to use dplyr to determine the number of events that occurred within a specified number of days?", a question on Stack Overflow: https://stackoverflow.com/questions/61785002/