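I want to replace the values in a specific column that fall between two times. I know the min and max times for these values, and I want to replace every data point between those two times with a specific label.
I have a large dataset with many groups of data, so I'll try to make a simple example here. Say I want to replace "almost peak" with "peak", and I know the min/max times at which those labels occur.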
library(dplyr)

points <- c(1,2,3,3,4,3,2,1,11,12,13,14,13,13,12,11)
Status <- c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base", "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base")
DateTime <- seq(from = as.POSIXct("2021-10-16 11:37:23"), to = as.POSIXct("2021-10-16 11:37:38"), by = "sec")
Group <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
df <- data.frame(points, Status, DateTime, Group)

# Get the min and max times of the "almost peak" occurrences in each group
df.test <- df %>%
  group_by(Group) %>%
  filter(Status == "almost peak") %>%
  summarise(
    MinTime = min(DateTime),
    MaxTime = max(DateTime)
  )
> print(df)
   points      Status            DateTime Group
1       1        base 2021-10-16 11:37:23     1
2       2        base 2021-10-16 11:37:24     1
3       3 almost peak 2021-10-16 11:37:25     1
4       3 almost peak 2021-10-16 11:37:26     1
5       4        peak 2021-10-16 11:37:27     1
6       3 almost peak 2021-10-16 11:37:28     1
7       2        base 2021-10-16 11:37:29     1
8       1        base 2021-10-16 11:37:30     1
9      11        base 2021-10-16 11:37:31     2
10     12        base 2021-10-16 11:37:32     2
11     13 almost peak 2021-10-16 11:37:33     2
12     14        peak 2021-10-16 11:37:34     2
13     13 almost peak 2021-10-16 11:37:35     2
14     13 almost peak 2021-10-16 11:37:36     2
15     12        base 2021-10-16 11:37:37     2
16     11        base 2021-10-16 11:37:38     2
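Again, I would like to replace all data points between MinTime and MaxTime for each group with "peak".
I have tried using mutate() with replace() as shown below, but it doesn't seem to work. I think this is close, but not quite right.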
df.test.replace <- df %>%
  group_by(Group) %>%
  mutate(Status = replace(Status, DateTime >= df.test$MinTime & DateTime <= df.test$MaxTime, "peak"))
To clarify, this is the output I want. All Status labels between the min/max times have been replaced with "peak":
   points Status            DateTime Group
1       1   base 2021-10-16 11:37:23     1
2       2   base 2021-10-16 11:37:24     1
3       3   peak 2021-10-16 11:37:25     1
4       3   peak 2021-10-16 11:37:26     1
5       4   peak 2021-10-16 11:37:27     1
6       3   peak 2021-10-16 11:37:28     1
7       2   base 2021-10-16 11:37:29     1
8       1   base 2021-10-16 11:37:30     1
9      11   base 2021-10-16 11:37:31     2
10     12   base 2021-10-16 11:37:32     2
11     13   peak 2021-10-16 11:37:33     2
12     14   peak 2021-10-16 11:37:34     2
13     13   peak 2021-10-16 11:37:35     2
14     13   peak 2021-10-16 11:37:36     2
15     12   base 2021-10-16 11:37:37     2
16     11   base 2021-10-16 11:37:38     2
Any pointers would be greatly appreciated. Thanks.
Best answer
You need to index the correct values for the replacement. Try using case_when from dplyr:
library(dplyr)
df %>%
  group_by(Group) %>%
  mutate(Status = case_when(
    DateTime >= df.test$MinTime[1] &
      DateTime <= df.test$MaxTime[1] ~ "peak",
    DateTime >= df.test$MinTime[2] &
      DateTime <= df.test$MaxTime[2] ~ "peak",
    TRUE ~ as.character(Status)))
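To see why the indexing matters, here is a minimal sketch of what the comparison in the question's attempt appears to do. df.test$MinTime and df.test$MaxTime each hold two values (one per group), while each group has eight rows, so R recycles the two bounds across the rows rather than matching them by group. The names g1 and in_range are made up for this illustration.

```r
library(dplyr)

# Rebuild the example data from the question
points <- c(1,2,3,3,4,3,2,1,11,12,13,14,13,13,12,11)
Status <- c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base",
            "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base")
DateTime <- seq(from = as.POSIXct("2021-10-16 11:37:23"),
                to = as.POSIXct("2021-10-16 11:37:38"), by = "sec")
Group <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
df <- data.frame(points, Status, DateTime, Group)

df.test <- df %>%
  group_by(Group) %>%
  filter(Status == "almost peak") %>%
  summarise(MinTime = min(DateTime), MaxTime = max(DateTime))

# Timestamps of group 1 only (hypothetical helper for this illustration)
g1 <- df$DateTime[df$Group == 1]

# The length-2 bounds are recycled: odd rows are compared against group 1's
# bounds, even rows against group 2's bounds
in_range <- g1 >= df.test$MinTime & g1 <= df.test$MaxTime
print(in_range)
# FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
```

Only rows 3 and 5 of group 1 are flagged, although rows 3 to 6 lie inside that group's range. Because 8 is a multiple of 2, R gives no recycling warning, which is probably why the attempt fails silently.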
If you want to avoid manual indexing, put all the data in the same data frame:
df_all <- dplyr::left_join(df, df.test, by = "Group")
Then run the code using the MinTime and MaxTime variables from that same table, rather than calling them from another data frame:
df_all %>%
  mutate(Status = case_when(
    DateTime >= MinTime &
      DateTime <= MaxTime ~ "peak",
    TRUE ~ as.character(Status)))
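As a possible variation (a sketch, not part of the answer above), the per-group bounds can also be computed inside a grouped mutate(), which avoids both the separate summary table and the join. It assumes every group has at least one "almost peak" row, since otherwise min() and max() return infinite values with a warning, and that Status is a character column, which is the data.frame() default in R 4.0 and later. The name result is made up for this illustration.

```r
library(dplyr)

# Rebuild the example data from the question
points <- c(1,2,3,3,4,3,2,1,11,12,13,14,13,13,12,11)
Status <- c("base", "base", "almost peak", "almost peak", "peak", "almost peak", "base", "base",
            "base", "base", "almost peak", "peak", "almost peak", "almost peak", "base", "base")
DateTime <- seq(from = as.POSIXct("2021-10-16 11:37:23"),
                to = as.POSIXct("2021-10-16 11:37:38"), by = "sec")
Group <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
df <- data.frame(points, Status, DateTime, Group)

result <- df %>%
  group_by(Group) %>%
  mutate(
    # Bounds of the "almost peak" rows within the current group
    MinTime = min(DateTime[Status == "almost peak"]),
    MaxTime = max(DateTime[Status == "almost peak"]),
    # Relabel every row inside the bounds; keep the existing label otherwise
    Status = if_else(DateTime >= MinTime & DateTime <= MaxTime, "peak", Status)
  ) %>%
  ungroup() %>%
  select(-MinTime, -MaxTime)  # drop the helper columns

print(result)
```

Dropping the helper columns at the end leaves the same four columns as the desired output in the question; the same select() step could be appended to the joined version if the extra MinTime and MaxTime columns are unwanted.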
Source: a similar question on Stack Overflow about replacing values in a column between two given times: https://stackoverflow.com/questions/70026171/