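r - Create a count of consecutive years by group in R

Tags: r dplyr tidyr lubridate

Newbie here. I am looking for a solution (preferably dplyr) that creates a vector showing the number of consecutive years within a group. If the sequence is interrupted by any gap, the counter should restart, even within the same group.

My data looks similar to this: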

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(magrittr)
library(tidyverse)

df <- tribble(
    ~id, ~ref, ~branch, ~year, ~unit, ~client, ~group,
    1, 561, "LA", 2000, "x", "y", "z",  
    2, 561, "LA", 2001, "x", "y", "z",
    3, 561, "LA", 2002, "x", "y", "z",
    4, 561, "LA", 2003, "x", "y", "z",
    5, 561, "LA", 2004, "x", "y", "z",
    6, 561, "LA", 2005, "x", "y", "z",
    7, 561, "LA", 2007, "x", "y", "z",
    8, 561, "LA", 2008, "x", "y", "z",
    9, 561, "LA", 2009, "x", "y", "z",
    )

My expected output would look like this, with "seq_count" added:

df_exp <- tribble(
    ~id, ~ref, ~branch, ~year, ~unit, ~client, ~group, ~seq_count,
    1, 561, "LA", 2000, "x", "y", "z", 6,
    2, 561, "LA", 2001, "x", "y", "z", 6,
    3, 561, "LA", 2002, "x", "y", "z", 6,
    4, 561, "LA", 2003, "x", "y", "z", 6,
    5, 561, "LA", 2004, "x", "y", "z", 6,
    6, 561, "LA", 2005, "x", "y", "z", 6,
    7, 561, "LA", 2007, "x", "y", "z", 3,
    8, 561, "LA", 2008, "x", "y", "z", 3,
    9, 561, "LA", 2009, "x", "y", "z", 3,
    )

I have tried using dplyr::add_count as follows:

df1 <- df %>% 
    group_by(ref, branch, unit, client, group) %>% 
    add_count()

However, this only adds the count for the groups specified in group_by and does not take the gap between 2005 and 2007 into account. Is there a neat way to do this in R?

Best answer

You can create another grouping variable that changes whenever there is a gap between years.

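library(dplyr)

# grp increases by 1 each time the year jumps by more than 1,
# so every run of consecutive years gets its own value
df %>% 
    add_count(group, grp = cumsum(year - lag(year, default = first(year)) > 1), 
               name = 'seq_count')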

# A tibble: 9 x 9
#     id   ref branch  year unit  client group   grp seq_count
#  <dbl> <dbl> <chr>  <dbl> <chr> <chr>  <chr> <int>     <int>
#1     1   561 LA      2000 x     y      z         0         6
#2     2   561 LA      2001 x     y      z         0         6
#3     3   561 LA      2002 x     y      z         0         6
#4     4   561 LA      2003 x     y      z         0         6
#5     5   561 LA      2004 x     y      z         0         6
#6     6   561 LA      2005 x     y      z         0         6
#7     7   561 LA      2007 x     y      z         1         3
#8     8   561 LA      2008 x     y      z         1         3
#9     9   561 LA      2009 x     y      z         1         3
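
To see why the grouping variable works, the expression can be broken into steps on a plain vector. This is a minimal base R sketch, not part of the original answer: the names gap, grp and seq_count_base are illustrative, and ave() stands in here for the dplyr counting step.

```r
# Years from the example data: a gap between 2005 and 2007
year <- c(2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009)

# Step 1: difference to the previous year (0 for the first row),
# flagged TRUE wherever the jump is larger than one year
gap <- c(0, diff(year)) > 1

# Step 2: the running sum of the flags labels each consecutive run
grp <- cumsum(gap)

# Step 3: the length of each run, repeated for every row in that run
seq_count_base <- ave(year, grp, FUN = length)

print(data.frame(year, gap, grp, seq_count_base))
```

The first six rows share grp 0 and the last three share grp 1, which matches the grp column in the output above.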

Or use n():

df %>%
  group_by(group, grp = cumsum(year - lag(year, default = first(year)) > 1)) %>%
  mutate(seq_count = n()) %>%
  ungroup()  # drop the grouping so later steps are not grouped by accident
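
Both snippets compute the gap indicator over the whole data frame, which is fine for the single-ref example data. If the real data holds several series that share the same group value, runs from different series could end up with the same grp. One possible variant, sketched here under that assumption, detects gaps within each series: df2 is hypothetical data with only the ref and year columns, and the rows are sorted by year first.

```r
library(dplyr)

# Hypothetical data: two refs, each with its own gap
df2 <- tibble(
  ref  = c(561, 561, 561, 561, 562, 562, 562),
  year = c(2000, 2001, 2003, 2004, 2001, 2002, 2004)
)

res <- df2 %>%
  arrange(ref, year) %>%                 # lag() relies on row order
  group_by(ref) %>%                      # detect gaps per series
  mutate(grp = cumsum(year - lag(year, default = first(year)) > 1)) %>%
  group_by(ref, grp) %>%                 # one group per consecutive run
  mutate(seq_count = n()) %>%
  ungroup()

print(res)
```

With the question's data, the group_by(ref) step would presumably list all the original keys (ref, branch, unit, client, group) instead.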

Regarding "r - Create a count of consecutive years by group in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62630702/
