Replace column values in a data frame based on other columns

I have the following data frame, sorted by name and time.

set.seed(100)
df <- data.frame('name' = c(rep('x', 6), rep('y', 4)), 
                 'time' = c(rep(1, 2), rep(2, 3), 3, 1, 2, 3, 4),
                 'score' = c(0, sample(1:10, 3), 0, sample(1:10, 2), 0, sample(1:10, 2))
                 )
> df
   name time score
1     x    1     0
2     x    1     4
3     x    2     3
4     x    2     5
5     x    2     0
6     x    3     1
7     y    1     5
8     y    2     0
9     y    3     5
10    y    4     8

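In df$score, each zero is followed by an unknown number of non-zero values (for example, df[1:4,]). Sometimes the rows between two df$score == 0 rows span more than one df$name (for example, df[6:7,]).

I want to change df$time in the rows where df$score != 0. Specifically, each such row should get the time value of the nearest preceding row with df$score == 0, provided that row has the same df$name.

The following code produces the correct output, but my data has millions of rows, so this solution is very inefficient.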

score_0 <- append(which(df$score == 0), dim(df)[1] + 1)

for(i in 1:(length(score_0) - 1)) {
  df$time[score_0[i]:(score_0[i + 1] - 1)] <-
    ifelse(df$name[score_0[i]:(score_0[i + 1] - 1)] == df$name[score_0[i]], 
           df$time[score_0[i]], 
           df$time[score_0[i]:(score_0[i + 1] - 1)])
 }

> df
   name time score
1     x    1     0
2     x    1     4
3     x    1     3
4     x    1     5
5     x    2     0
6     x    2     1
7     y    1     5
8     y    2     0
9     y    2     5
10    y    2     8

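score_0 holds the indices of the rows where df$score == 0. df$time[2:4] are now all equal to 1. In df$time[6:7], only the first value changed, because the second row has df$name == 'y' while the nearest preceding row with df$score == 0 has df$name == 'x'. The last two rows were also changed correctly.

Best answer

You can do it like this (the score values in the output differ from the question's, presumably because sample() returned different draws in this session; the time column is the part to compare):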

library(dplyr)
df %>% group_by(name) %>% mutate(ID=cumsum(score==0)) %>% 
       group_by(name,ID) %>% mutate(time = head(time,1)) %>% 
       ungroup() %>%  select(name,time,score) %>% as.data.frame()

#    name time score
# 1     x    1     0
# 2     x    1     8
# 3     x    1    10
# 4     x    1     6
# 5     x    2     0
# 6     x    2     5
# 7     y    1     4
# 8     y    2     0
# 9     y    2     5
# 10    y    2     9
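
To make the grouping step easier to follow, here is a minimal base R sketch of the same idea; it is not the answerer's code. It hard-codes the scores from the question's printed data frame so the result does not depend on sample(). The names id, key and first_time are introduced here for illustration only. One assumption differs from the pipeline above: rows that come before the first zero of their name (id == 0) keep their own time, which is what the question's loop does, whereas the dplyr version would give every row in that group the time of the group's first row.

```r
# Data copied from the question's printed output (no random draws).
df <- data.frame(
  name  = c(rep("x", 6), rep("y", 4)),
  time  = c(1, 1, 2, 2, 2, 3, 1, 2, 3, 4),
  score = c(0, 4, 3, 5, 0, 1, 5, 0, 5, 8)
)

# Running count of zero-score rows within each name.
# id == 0 means "no zero-score row seen yet for this name".
id <- ave(as.integer(df$score == 0), df$name, FUN = cumsum)

# One key per (name, id) block; when id > 0 the block starts at a zero-score row.
key <- paste(df$name, id)

# Time of the first row of each block, repeated for every row of the block.
first_time <- ave(df$time, key, FUN = function(t) t[1])

# Assumption: only overwrite rows that have a preceding zero-score row
# with the same name; leave the others untouched.
df$time <- ifelse(id > 0, first_time, df$time)

print(df)
```

On the example data this gives time = 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, which matches the expected output in the question. Both ave() calls are vectorized over groups, so there is no explicit loop over the zero-score rows; whether that is fast enough for millions of rows would need to be measured on the real data.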

For more on replacing column values in a data frame based on other columns, see the similar question on Stack Overflow: https://stackoverflow.com/questions/53125098/
