Using R, my current table looks like this:
C1 C2 C3
1 2011-02-01 04:30:00 4
2 2011-02-01 04:45:00 3
3 2011-02-01 05:00:00 5
4 2011-02-01 05:15:00 6
I would like it to look like this:
C1 C2 C3 C4
1 2011-02-01 04:30:00 4 2011-02-01 04:30:00
2 2011-02-01 04:30:00 4 2011-02-01 04:35:00
3 2011-02-01 04:30:00 4 2011-02-01 04:40:00
4 2011-02-01 04:45:00 3 2011-02-01 04:45:00
5 2011-02-01 04:45:00 3 2011-02-01 04:50:00
6 2011-02-01 04:45:00 3 2011-02-01 04:55:00
7 2011-02-01 05:00:00 5 2011-02-01 05:00:00
8 2011-02-01 05:00:00 5 2011-02-01 05:05:00
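And so on. Basically, I just want to create another column that goes up in five-minute steps but fits within the intervals in C2. I was thinking of something like the rep() function, but that would assume the intervals in C2 are always consistent, and they may not be. I'm really looking for something that generates five-minute steps based on the intervals in C2.
Any help or feedback on this would be greatly appreciated. Thanks.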
Best answer
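We can use map2 to create a list column: take the seq of 'C2' converted to date-time, with length.out given by the corresponding element of 'C3' and by set to a 5-minute interval, and then unnest the list column.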
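library(tidyverse)
df1 %>%
  # one date-time sequence per row: starts at C2, has C3 elements, 5 minutes apart
  mutate(C4 = map2(lubridate::ymd_hms(C2), C3, ~ seq(.x, length.out = .y, by = '5 min'))) %>%
  # expand the list column into one row per element (tidyr >= 1.0 expects `cols`)
  unnest(cols = C4)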
# C1 C2 C3 C4
#1 1 2011-02-01 04:30:00 4 2011-02-01 04:30:00
#2 1 2011-02-01 04:30:00 4 2011-02-01 04:35:00
#3 1 2011-02-01 04:30:00 4 2011-02-01 04:40:00
#4 1 2011-02-01 04:30:00 4 2011-02-01 04:45:00
#5 2 2011-02-01 04:45:00 3 2011-02-01 04:45:00
#6 2 2011-02-01 04:45:00 3 2011-02-01 04:50:00
#7 2 2011-02-01 04:45:00 3 2011-02-01 04:55:00
#8 3 2011-02-01 05:00:00 5 2011-02-01 05:00:00
#9 3 2011-02-01 05:00:00 5 2011-02-01 05:05:00
#10 3 2011-02-01 05:00:00 5 2011-02-01 05:10:00
#11 3 2011-02-01 05:00:00 5 2011-02-01 05:15:00
#12 3 2011-02-01 05:00:00 5 2011-02-01 05:20:00
#13 4 2011-02-01 05:15:00 6 2011-02-01 05:15:00
#14 4 2011-02-01 05:15:00 6 2011-02-01 05:20:00
#15 4 2011-02-01 05:15:00 6 2011-02-01 05:25:00
#16 4 2011-02-01 05:15:00 6 2011-02-01 05:30:00
#17 4 2011-02-01 05:15:00 6 2011-02-01 05:35:00
#18 4 2011-02-01 05:15:00 6 2011-02-01 05:40:00
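As a minimal sketch of the building block that map2() applies to each row, the following base R snippet generates the sequence for the first row alone. The `tz = "UTC"` argument is an assumption added here so the result does not depend on the local time zone, and the names `start` and `steps` are illustrative only.

```r
# First row of the example data: C2 = "2011-02-01 04:30:00", C3 = 4
start <- as.POSIXct("2011-02-01 04:30:00", tz = "UTC")  # tz is an assumption

# Four date-times, 5 minutes apart, beginning at `start`
steps <- seq(start, by = "5 min", length.out = 4)

format(steps, "%H:%M")
# "04:30" "04:35" "04:40" "04:45"
```

The sequence includes the starting value itself, which is why the first C4 entry of every group in the output above equals C2.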
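Or use Map from base R to get a list of date-time sequences, with the same logic as above. Expand the original dataset by replicating the sequence of rows according to the lengths of 'lst1', and create the new column 'C4'.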
lst1 <- Map(function(x, y) seq(x, length.out = y, by = '5 min'),
as.POSIXct(df1$C2), df1$C3)
df2 <- df1[rep(seq_len(nrow(df1)), lengths(lst1)),]
df2$C4 <- do.call(c, lst1)
row.names(df2) <- NULL
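Because the step is a fixed 300 seconds, the same expansion can also be sketched without building a list, which is close to the rep() idea raised in the question. This is a different technique from the Map() call above, not the answer's own method; the names `idx`, `offset_sec`, and `df3`, and the `tz = "UTC"` choice, are assumptions made for illustration.

```r
df1 <- data.frame(
  C1 = 1:4,
  C2 = c("2011-02-01 04:30:00", "2011-02-01 04:45:00",
         "2011-02-01 05:00:00", "2011-02-01 05:15:00"),
  C3 = c(4L, 3L, 5L, 6L),
  stringsAsFactors = FALSE
)

# Repeat each row index C3 times: 1 1 1 1 2 2 2 3 3 3 3 3 ...
idx <- rep(seq_len(nrow(df1)), df1$C3)

# Within each group count 0, 1, 2, ... and convert to seconds (5 min = 300 s)
offset_sec <- 300 * (sequence(df1$C3) - 1)

df3 <- df1[idx, ]
df3$C4 <- as.POSIXct(df3$C2, tz = "UTC") + offset_sec  # tz is an assumption
row.names(df3) <- NULL
```

Here the number of repeats still comes from 'C3', so uneven counts per row are handled; only the 5-minute step is fixed.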
If the condition is instead based on the next value of 'C2':
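df1 %>%
  # sequence from each C2 to the next C2 (the last row runs to itself)
  mutate(C4 = map2(lubridate::ymd_hms(C2), lubridate::ymd_hms(lead(C2, default = last(C2))),
                   seq, by = '5 min')) %>%
  unnest(cols = C4) %>%
  group_by(C1) %>%
  # drop the first element of each group, i.e. the interval start
  slice(-1)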
# A tibble: 9 x 4
# Groups: C1 [3]
# C1 C2 C3 C4
# <int> <chr> <int> <dttm>
#1 1 2011-02-01 04:30:00 4 2011-02-01 04:35:00
#2 1 2011-02-01 04:30:00 4 2011-02-01 04:40:00
#3 1 2011-02-01 04:30:00 4 2011-02-01 04:45:00
#4 2 2011-02-01 04:45:00 3 2011-02-01 04:50:00
#5 2 2011-02-01 04:45:00 3 2011-02-01 04:55:00
#6 2 2011-02-01 04:45:00 3 2011-02-01 05:00:00
#7 3 2011-02-01 05:00:00 5 2011-02-01 05:05:00
#8 3 2011-02-01 05:00:00 5 2011-02-01 05:10:00
#9 3 2011-02-01 05:00:00 5 2011-02-01 05:15:00
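The output above drops the start of each interval and keeps its end. If the intent is instead the layout shown in the question, where each group starts at its own 'C2' value and stops before the next one, one possible base R sketch is the following. It assumes the gaps are whole multiples of 5 minutes, fixes `tz = "UTC"`, and keeps the last row as a single unexpanded row, since the question does not say how the final interval should end; the names `start`, `nxt`, `n_steps`, `lst`, and `out` are illustrative.

```r
df1 <- data.frame(
  C1 = 1:4,
  C2 = c("2011-02-01 04:30:00", "2011-02-01 04:45:00",
         "2011-02-01 05:00:00", "2011-02-01 05:15:00"),
  C3 = c(4L, 3L, 5L, 6L),
  stringsAsFactors = FALSE
)

start <- as.POSIXct(df1$C2, tz = "UTC")        # tz is an assumption
nxt   <- c(start[-1], start[length(start)])    # next start; last row points to itself

# Number of 5-minute steps that fit before the next start (at least 1,
# so the last row is kept as a single row; this is an assumption)
n_steps <- pmax(as.numeric(difftime(nxt, start, units = "mins")) %/% 5, 1)

# One sequence per row, starting at C2 and excluding the next C2
lst <- Map(function(s, n) seq(s, by = "5 min", length.out = n), start, n_steps)

out <- df1[rep(seq_len(nrow(df1)), lengths(lst)), ]
out$C4 <- do.call(c, lst)
row.names(out) <- NULL
```

With the example data, the first group then carries 04:30, 04:35, and 04:40 in C4, and the second group starts at 04:45.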
Or a similar option using the methods from data.table:
library(data.table)
setDT(df1)[, C2 := as.POSIXct(C2)][, C4 := list(Map(seq,
MoreArgs = list(by = '5 min'), C2, shift(C2, type = 'lead',
fill = last(C2))))][, unnest(.SD)][, .SD[-1], by = C1]
Data
df1 <- structure(list(C1 = 1:4, C2 = c("2011-02-01 04:30:00", "2011-02-01 04:45:00",
"2011-02-01 05:00:00", "2011-02-01 05:15:00"), C3 = c(4L, 3L,
5L, 6L)), class = "data.frame", row.names = c(NA, -4L))
Regarding "r - Repeat rows so that all columns stay the same but one column increases sequentially", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54069721/