r - 为唯一事件创建新列,然后按组对 R 中的事件进行计数

标签 r time-series uniqueidentifier data-management

我的数据子集如下所示,但具有更多分组 (ID):

 ID                          time                class    
   <chr>                       <dttm>              <fct>    
 1 BBR-b172021-M_fall_winter_4 2022-11-01 19:03:31 migrating
 2 BBR-b172021-M_fall_winter_4 2022-11-04 22:03:33 migrating 
 3 BBR-b172021-M_fall_winter_4 2022-11-07 18:03:34 migrating 
 4 BBR-b172021-M_fall_winter_4 2022-11-08 21:03:34 stopover 
 5 BBR-b172021-M_fall_winter_4 2022-11-10 21:03:39 stopover 
 6 BBR-b172021-M_fall_winter_4 2022-11-14 18:03:37 migrating 
 7 BBR-b172021-M_fall_winter_4 2022-11-17 06:04:08 migrating 
 8 BBR-b172021-M_fall_winter_4 2022-11-18 06:04:08 stopover 
 9 BBR-b172021-M_fall_winter_4 2022-11-19 00:03:41 winter 
10 BBR-b172021-M_fall_winter_4 2022-11-27 00:03:51 winter 
11 LINWR-b1282020-M_fall_winter_3 2022-01-14 11:00:08 migrating
12 LINWR-b1282020-M_fall_winter_3 2022-01-15 13:59:45 stopover
13 LINWR-b1282020-M_fall_winter_3 2022-01-20 02:59:54 stopover
14 LINWR-b1282020-M_fall_winter_3 2022-01-21 03:00:14 migrating
15 LINWR-b1282020-M_fall_winter_3 2022-01-21 16:59:47 stopover
16 LINWR-b1282020-M_fall_winter_3 2022-01-22 16:59:45 winter

我正在尝试通过 mapping 创建唯一的列或group_bymutate但我不知道从哪里开始。我想要几个新的专栏来描述独特的顺序事件、它们的总和以及它们的持续时间。添加到数据框中的新列我怀疑会如下所示:


newcols <- data.frame(unique_class = c("migrating1", "migrating1", "migrating1", "stopover1", 
                                       "stopover1", "migrating2", "migrating2", "stopover2", 
                                       "winter1", "winter1", "migrating1", "stopover1", 
                                       "stopover1", "migrating2", "stopover2", "winter1"),
                      migrate_sum = c(2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
                      stopover_sum = c(2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
                      winter_sum = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
                      event_duration = c(6,6,6,2,2,3,3,0,8,8,0,5,5,0,0,0))

...其中 event_duration 列等于以天或小时为单位的时间。我知道我需要group_by(ID)但是和mutate()但不确定如何获取每个类(class)的独特类(class)或滞后时间。任何帮助表示赞赏。

不知道该把它放在哪里,所以编辑我的问题:我尝试了@AKRUN 解决方案,但它不太有效。它很好地生成了 UNIQUE_CLASS,但摘要并非不正确。以下是使用以下解决方案生成的数据帧示例,并按唯一 ID 进行子集:fall_mig2 %>% filter(BirdsID_season == "BBR-b432021-M_fall_winter_4") %>% select(BirdsID_season, x, y, time, unique_class, class, stopover_sum) slice_head <- fall_mig2 %>% filter(BirdsID_season == "BBR-b432021-M_fall_winter_4") %>% slice_head(n = 10) slice_tail <- fall_mig2 %>% filter(BirdsID_season == "BBR-b432021-M_fall_winter_4") %>% slice_tail(n = 10) bind_rows(slice_head, slice_tail) %>% select(BirdsID_season, x, y, time, stopover_sum)结果:

 BirdsID_season                  x     y time                unique_class class     stopover_sum
   <chr>                       <dbl> <dbl> <dttm>              <chr>        <chr>            <int>
 1 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-09 19:09:01 migrating1   migrating            3
 2 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-09 21:08:36 migrating1   migrating            3
 3 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-09 23:08:55 migrating1   migrating            3
 4 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-10 01:09:11 migrating1   migrating            3
 5 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-10 03:08:50 migrating1   migrating            3
 6 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-10 05:09:06 migrating1   migrating            3
 7 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-10 07:08:43 migrating1   migrating            3
 8 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-10 09:08:54 migrating1   migrating            3
 9 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-10 11:09:07 migrating1   migrating            3
10 BBR-b432021-M_fall_winter_4 -99.2  48.1 2022-11-10 13:08:39 migrating1   migrating            3
11 BBR-b432021-M_fall_winter_4 -89.3  36.7 2022-12-13 23:08:30 winter1      winter               1
12 BBR-b432021-M_fall_winter_4 -89.3  36.7 2022-12-14 01:08:45 winter1      winter               1
13 BBR-b432021-M_fall_winter_4 -89.3  36.7 2022-12-14 03:08:45 winter1      winter               1
14 BBR-b432021-M_fall_winter_4 -89.3  36.7 2022-12-14 05:08:26 winter1      winter               1
15 BBR-b432021-M_fall_winter_4 -89.3  36.7 2022-12-14 07:08:22 winter1      winter               1
16 BBR-b432021-M_fall_winter_4 -89.3  36.7 2022-12-14 09:08:45 winter1      winter               1
17 BBR-b432021-M_fall_winter_4 -89.3  36.7 2022-12-14 11:08:54 winter1      winter               1
18 BBR-b432021-M_fall_winter_4 -89.3  36.7 2022-12-14 13:08:19 winter1      winter               1
19 BBR-b432021-M_fall_winter_4 -89.3  36.7 2022-12-14 15:08:47 winter1      winter               1
20 BBR-b432021-M_fall_winter_4 -89.4  36.7 2022-12-14 17:08:19 winter1      winter               1

stopover_sum应该是 1 (位于子集 df 的中间) 。我不知道3是从哪里来的。现在尝试剖析解决方案。

最佳答案

我们可以创建一个按“类别”分组的运行长度 ID 列,将“时间”转换为 Date class,然后按 'ID'、'class' 分组,获取 'grp' 中不同 ( n_distinct ) 元素的数量,以及 unique_classpaste 创建使用 unique 来查找“类” “grp”索引。按“ID”、“unique_class”进行第二次分组以计算“event_duration”,即。 max/min 之间的天数“日期”值,选择感兴趣的列,使用 pivot_wider reshape 为“宽”和fill _sum 中的值到之前的非 NA 值

library(dplyr)
library(lubridate)
library(tidyr)
library(stringr)
library(data.table)
df1 %>% 
   mutate(grp = rleid(class), date = as.Date(ymd_hms(time))) %>% 
   group_by(ID, class) %>% 
   mutate(Count = n_distinct(grp), 
     unique_class = str_c(class, match(grp, unique(grp)))) %>% 
   group_by(ID, unique_class) %>% 
   mutate(event_duration = as.integer(max(date) - min(date))) %>% 
   ungroup %>% 
   transmute(rn = row_number(), class = str_c(class, '_sum'),
    Count, unique_class, event_duration) %>% 
   pivot_wider(names_from = class, values_from = Count) %>% 
   fill(ends_with("_sum"), .direction = "downup") %>% 
   select(-rn) %>%
   relocate(event_duration, .after = last_col())

-输出

# A tibble: 16 × 5
   unique_class migrating_sum stopover_sum winter_sum event_duration
   <chr>                <int>        <int>      <int>          <int>
 1 migrating1               2            2          1              6
 2 migrating1               2            2          1              6
 3 migrating1               2            2          1              6
 4 stopover1                2            2          1              2
 5 stopover1                2            2          1              2
 6 migrating2               2            2          1              3
 7 migrating2               2            2          1              3
 8 stopover2                2            2          1              0
 9 winter1                  2            2          1              8
10 winter1                  2            2          1              8
11 migrating1               2            2          1              0
12 stopover1                2            2          1              5
13 stopover1                2            2          1              5
14 migrating2               2            2          1              0
15 stopover2                2            2          1              0
16 winter1                  2            2          1              0

数据

df1 <- structure(list(ID = c("BBR-b172021-M_fall_winter_4",
 "BBR-b172021-M_fall_winter_4", 
"BBR-b172021-M_fall_winter_4", "BBR-b172021-M_fall_winter_4", 
"BBR-b172021-M_fall_winter_4", "BBR-b172021-M_fall_winter_4", 
"BBR-b172021-M_fall_winter_4", "BBR-b172021-M_fall_winter_4", 
"BBR-b172021-M_fall_winter_4", "BBR-b172021-M_fall_winter_4", 
"LINWR-b1282020-M_fall_winter_3", "LINWR-b1282020-M_fall_winter_3", 
"LINWR-b1282020-M_fall_winter_3", "LINWR-b1282020-M_fall_winter_3", 
"LINWR-b1282020-M_fall_winter_3", "LINWR-b1282020-M_fall_winter_3"
), time = c("2022-11-01 19:03:31", "2022-11-04 22:03:33", "2022-11-07 18:03:34", 
"2022-11-08 21:03:34", "2022-11-10 21:03:39", "2022-11-14 18:03:37", 
"2022-11-17 06:04:08", "2022-11-18 06:04:08", "2022-11-19 00:03:41", 
"2022-11-27 00:03:51", "2022-01-14 11:00:08", "2022-01-15 13:59:45", 
"2022-01-20 02:59:54", "2022-01-21 03:00:14", "2022-01-21 16:59:47", 
"2022-01-22 16:59:45"), class = c("migrating", "migrating", "migrating", 
"stopover", "stopover", "migrating", "migrating", "stopover", 
"winter", "winter", "migrating", "stopover", "stopover", "migrating", 
"stopover", "winter")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16"))

关于r - 为唯一事件创建新列,然后按组对 R 中的事件进行计数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/75403669/

相关文章:

r - geom_smooth和指数拟合

css - 如何删除与 Shiny 控件小部件标签对应的像素?

machine-learning - 时间序列预测: weekly vs daily predictions

android - 主机卡仿真 - 是否可以将字符串从一部手机传输到另一部手机?

r - 转置和计算 PIL 逊相关

r - 使用 doParallel 时如何获取节点或进程 ID?

python - 如何对齐两个大小不等的时间序列 numpy 数组?

python - Python脚本中数组数据转换的设计模式

java - 将 Unix hostid 获取到 Java

windows - 生成唯一硬件 ID 的可靠方法