A subset of my data looks like this, but the full data has many more groups (ID):
ID time class
<chr> <dttm> <fct>
1 BBR-b172021-M_fall_winter_4 2022-11-01 19:03:31 migrating
2 BBR-b172021-M_fall_winter_4 2022-11-04 22:03:33 migrating
3 BBR-b172021-M_fall_winter_4 2022-11-07 18:03:34 migrating
4 BBR-b172021-M_fall_winter_4 2022-11-08 21:03:34 stopover
5 BBR-b172021-M_fall_winter_4 2022-11-10 21:03:39 stopover
6 BBR-b172021-M_fall_winter_4 2022-11-14 18:03:37 migrating
7 BBR-b172021-M_fall_winter_4 2022-11-17 06:04:08 migrating
8 BBR-b172021-M_fall_winter_4 2022-11-18 06:04:08 stopover
9 BBR-b172021-M_fall_winter_4 2022-11-19 00:03:41 winter
10 BBR-b172021-M_fall_winter_4 2022-11-27 00:03:51 winter
11 LINWR-b1282020-M_fall_winter_3 2022-01-14 11:00:08 migrating
12 LINWR-b1282020-M_fall_winter_3 2022-01-15 13:59:45 stopover
13 LINWR-b1282020-M_fall_winter_3 2022-01-20 02:59:54 stopover
14 LINWR-b1282020-M_fall_winter_3 2022-01-21 03:00:14 migrating
15 LINWR-b1282020-M_fall_winter_3 2022-01-21 16:59:47 stopover
16 LINWR-b1282020-M_fall_winter_3 2022-01-22 16:59:45 winter
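I am trying to create the unique columns either by mapping or with group_by and mutate, but I don't know where to start. I would like several new columns that describe the unique sequential events, their totals, and their durations. I suspect the new columns added to the data frame would look like this: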
newcols <- data.frame(unique_class = c("migrating1", "migrating1", "migrating1", "stopover1",
"stopover1", "migrating2", "migrating2", "stopover2",
"winter1", "winter1", "migrating1", "stopover1",
"stopover1", "migrating2", "stopover2", "winter1"),
migrate_sum = c(2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
stopover_sum = c(2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
winter_sum = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
event_duration = c(6,6,6,2,2,3,3,0,8,8,0,5,5,0,0,0))
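...where the event_duration column is the elapsed time in days or hours. I know I need group_by(ID) and mutate(), but I am not sure how to get the unique classes or the time lag for each class. Any help is appreciated.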
Not sure where to put this, so I am editing my question: I tried @AKRUN's solution, but it doesn't quite work. It generates unique_class nicely, but the summaries are not correct. Here is an example of the data frame produced with the solution below, subset to a single ID:
fall_mig2 %>%
  filter(BirdsID_season == "BBR-b432021-M_fall_winter_4") %>%
  select(BirdsID_season, x, y, time, unique_class, class, stopover_sum)

slice_head <- fall_mig2 %>%
  filter(BirdsID_season == "BBR-b432021-M_fall_winter_4") %>%
  slice_head(n = 10)

slice_tail <- fall_mig2 %>%
  filter(BirdsID_season == "BBR-b432021-M_fall_winter_4") %>%
  slice_tail(n = 10)

bind_rows(slice_head, slice_tail) %>%
  select(BirdsID_season, x, y, time, stopover_sum)
Result:
BirdsID_season x y time unique_class class stopover_sum
<chr> <dbl> <dbl> <dttm> <chr> <chr> <int>
1 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-09 19:09:01 migrating1 migrating 3
2 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-09 21:08:36 migrating1 migrating 3
3 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-09 23:08:55 migrating1 migrating 3
4 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-10 01:09:11 migrating1 migrating 3
5 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-10 03:08:50 migrating1 migrating 3
6 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-10 05:09:06 migrating1 migrating 3
7 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-10 07:08:43 migrating1 migrating 3
8 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-10 09:08:54 migrating1 migrating 3
9 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-10 11:09:07 migrating1 migrating 3
10 BBR-b432021-M_fall_winter_4 -99.2 48.1 2022-11-10 13:08:39 migrating1 migrating 3
11 BBR-b432021-M_fall_winter_4 -89.3 36.7 2022-12-13 23:08:30 winter1 winter 1
12 BBR-b432021-M_fall_winter_4 -89.3 36.7 2022-12-14 01:08:45 winter1 winter 1
13 BBR-b432021-M_fall_winter_4 -89.3 36.7 2022-12-14 03:08:45 winter1 winter 1
14 BBR-b432021-M_fall_winter_4 -89.3 36.7 2022-12-14 05:08:26 winter1 winter 1
15 BBR-b432021-M_fall_winter_4 -89.3 36.7 2022-12-14 07:08:22 winter1 winter 1
16 BBR-b432021-M_fall_winter_4 -89.3 36.7 2022-12-14 09:08:45 winter1 winter 1
17 BBR-b432021-M_fall_winter_4 -89.3 36.7 2022-12-14 11:08:54 winter1 winter 1
18 BBR-b432021-M_fall_winter_4 -89.3 36.7 2022-12-14 13:08:19 winter1 winter 1
19 BBR-b432021-M_fall_winter_4 -89.3 36.7 2022-12-14 15:08:47 winter1 winter 1
20 BBR-b432021-M_fall_winter_4 -89.4 36.7 2022-12-14 17:08:19 winter1 winter 1
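stopover_sum should be 1 (the stopover is located in the middle of the subsetted df). I don't know where the 3 comes from. I am trying to dissect the solution now.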
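Best answer
We can create a run-length ID column ('grp') from 'class' and convert 'time' to Date class. Then, grouped by 'ID' and 'class', get the number of distinct (n_distinct) elements of 'grp', and create unique_class by pasting 'class' together with the position of each 'grp' value among the unique 'grp' values (match against unique). Do a second grouping by 'ID' and 'unique_class' to calculate 'event_duration', i.e. the difference in days between the max and min 'date' values. Finally, select the columns of interest, use pivot_wider to reshape to 'wide' format, and fill the NA values in the _sum columns from the previous non-NA value (or, for leading NAs, the next one).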
library(dplyr)
library(lubridate)
library(tidyr)
library(stringr)
library(data.table)
df1 %>%
  # grp: run-length id of class; date: time parsed and truncated to Date
  mutate(grp = rleid(class), date = as.Date(ymd_hms(time))) %>%
  group_by(ID, class) %>%
  # Count: number of separate runs of this class within the ID
  mutate(Count = n_distinct(grp),
         unique_class = str_c(class, match(grp, unique(grp)))) %>%
  group_by(ID, unique_class) %>%
  mutate(event_duration = as.integer(max(date) - min(date))) %>%
  ungroup() %>%
  transmute(ID, rn = row_number(), class = str_c(class, '_sum'),
            Count, unique_class, event_duration) %>%
  pivot_wider(names_from = class, values_from = Count) %>%
  # fill within each ID so counts are not carried over from another ID
  group_by(ID) %>%
  fill(ends_with("_sum"), .direction = "downup") %>%
  ungroup() %>%
  select(-ID, -rn) %>%
  relocate(event_duration, .after = last_col())
-output
# A tibble: 16 × 5
unique_class migrating_sum stopover_sum winter_sum event_duration
<chr> <int> <int> <int> <int>
1 migrating1 2 2 1 6
2 migrating1 2 2 1 6
3 migrating1 2 2 1 6
4 stopover1 2 2 1 2
5 stopover1 2 2 1 2
6 migrating2 2 2 1 3
7 migrating2 2 2 1 3
8 stopover2 2 2 1 0
9 winter1 2 2 1 8
10 winter1 2 2 1 8
11 migrating1 2 2 1 0
12 stopover1 2 2 1 5
13 stopover1 2 2 1 5
14 migrating2 2 2 1 0
15 stopover2 2 2 1 0
16 winter1 2 2 1 0
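The question's edit reports a stopover_sum of 3 where 1 was expected. One possible explanation, offered as a guess rather than a confirmed diagnosis: `fill()` respects dplyr grouping, so when it runs on ungrouped data a count from one ID can be carried into the neighbouring ID's rows. The toy data below (`toy`, with made-up IDs `A` and `B`) is hypothetical and only illustrates the difference between an ungrouped and a per-ID fill.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: bird A has 3 stopovers, bird B has 1.
# NA marks rows whose count has not been filled in yet.
toy <- tibble(
  ID = c("A", "A", "B", "B", "B"),
  stopover_sum = c(NA, 3L, NA, 1L, NA)
)

# Ungrouped fill: B's first row inherits the 3 from A.
leaky <- toy %>%
  fill(stopover_sum, .direction = "downup")

# Grouped fill: values stay inside each ID.
per_id <- toy %>%
  group_by(ID) %>%
  fill(stopover_sum, .direction = "downup") %>%
  ungroup()

leaky$stopover_sum   # 3 3 3 1 1
per_id$stopover_sum  # 3 3 1 1 1
```

If this is what happens in the full data, grouping by the ID column before `fill()` should keep each bird's counts separate.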
data
df1 <- structure(list(ID = c("BBR-b172021-M_fall_winter_4",
"BBR-b172021-M_fall_winter_4",
"BBR-b172021-M_fall_winter_4", "BBR-b172021-M_fall_winter_4",
"BBR-b172021-M_fall_winter_4", "BBR-b172021-M_fall_winter_4",
"BBR-b172021-M_fall_winter_4", "BBR-b172021-M_fall_winter_4",
"BBR-b172021-M_fall_winter_4", "BBR-b172021-M_fall_winter_4",
"LINWR-b1282020-M_fall_winter_3", "LINWR-b1282020-M_fall_winter_3",
"LINWR-b1282020-M_fall_winter_3", "LINWR-b1282020-M_fall_winter_3",
"LINWR-b1282020-M_fall_winter_3", "LINWR-b1282020-M_fall_winter_3"
), time = c("2022-11-01 19:03:31", "2022-11-04 22:03:33", "2022-11-07 18:03:34",
"2022-11-08 21:03:34", "2022-11-10 21:03:39", "2022-11-14 18:03:37",
"2022-11-17 06:04:08", "2022-11-18 06:04:08", "2022-11-19 00:03:41",
"2022-11-27 00:03:51", "2022-01-14 11:00:08", "2022-01-15 13:59:45",
"2022-01-20 02:59:54", "2022-01-21 03:00:14", "2022-01-21 16:59:47",
"2022-01-22 16:59:45"), class = c("migrating", "migrating", "migrating",
"stopover", "stopover", "migrating", "migrating", "stopover",
"winter", "winter", "migrating", "stopover", "stopover", "migrating",
"stopover", "winter")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16"))
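For comparison, here is a minimal sketch of a variant that skips the reshape-and-fill step and counts the runs of each class directly within each ID. It is not the answer's method. It assumes the rows are already sorted by time within each ID and that the three class labels are known in advance. The names `toy`, `run`, `res` and `event_hours` are illustrative, and `event_hours` is a hypothetical hour-based counterpart to the day-based `event_duration`.

```r
library(dplyr)

# Hypothetical single-bird example with made-up timestamps.
toy <- tibble(
  ID = rep("A", 6),
  time = as.POSIXct(c("2022-11-01 19:00:00", "2022-11-04 22:00:00",
                      "2022-11-08 21:00:00", "2022-11-10 21:00:00",
                      "2022-11-14 18:00:00", "2022-11-19 00:00:00"),
                    tz = "UTC"),
  class = c("migrating", "migrating", "stopover",
            "stopover", "migrating", "winter")
)

res <- toy %>%
  group_by(ID) %>%
  mutate(
    # run id: increases by one each time class changes within an ID
    run = cumsum(class != lag(class, default = first(class))) + 1L,
    # number of separate runs of each class for this ID (0 if absent)
    migrating_sum = n_distinct(run[class == "migrating"]),
    stopover_sum  = n_distinct(run[class == "stopover"]),
    winter_sum    = n_distinct(run[class == "winter"])
  ) %>%
  group_by(ID, class) %>%
  # number the runs of each class in order of appearance
  mutate(unique_class = paste0(class, match(run, unique(run)))) %>%
  group_by(ID, run) %>%
  # elapsed hours between the first and last fix of each run
  mutate(event_hours = as.numeric(difftime(max(time), min(time),
                                           units = "hours"))) %>%
  ungroup()

res %>% select(unique_class, ends_with("_sum"), event_hours)
```

Because every count is computed inside `group_by(ID)`, nothing needs to be filled afterwards, and an ID that never has a given class gets 0 rather than NA.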
Regarding "r - Create new columns for unique events, then count events by group in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/75403669/