r - How to fill missing values by group based on the first non-missing value?

I have the following data structure:

  library(dplyr)

  test_data <- data.frame(some_dimension = c(rep("first",6),rep("second",6)),
                          first_col = c(rep(NA,3),rep(1,3),rep(NA,3),rep(0,3)),
                          second_col = c(rep(NA,3),rep(0,3),rep(NA,3),rep(1,3)),
                          third_col = c(rep(NA,3),rep(1,3),rep(NA,3),rep(1,3)))

      some_dimension first_col second_col third_col
1           first        NA         NA        NA
2           first        NA         NA        NA
3           first        NA         NA        NA
4           first         1          0         1
5           first         1          0         1
6           first         1          0         1
7          second        NA         NA        NA
8          second        NA         NA        NA
9          second        NA         NA        NA
10         second         0          1         1
11         second         0          1         1
12         second         0          1         1

I would like to obtain the following data structure:

  expected_data <- data.frame(some_dimension = c(rep("first",6),rep("second",6)),
                          first_col = c(rep(0,3),rep(1,3),rep(1,3),rep(0,3)),
                          second_col = c(rep(1,3),rep(0,3),rep(0,3),rep(1,3)),
                          third_col = c(rep(0,3),rep(1,3),rep(0,3),rep(1,3)))


     some_dimension first_col second_col third_col
1           first         0          1         0
2           first         0          1         0
3           first         0          1         0
4           first         1          0         1
5           first         1          0         1
6           first         1          0         1
7          second         1          0         0
8          second         1          0         0
9          second         1          0         0
10         second         0          1         1
11         second         0          1         1
12         second         0          1         1

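In other words, within each group of some_dimension I want to fill the missing values with the opposite of the first non-missing value, where every value is either 0 or 1.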

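My latest attempt is shown below. It finds all non-missing values and takes the smallest index among them, but I am having trouble applying the function correctly: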

my_fun <- function(x){
   all_non_missings <- which(!is.na(x))
   first_non_missing <- min(all_non_missings)
   if(.data[first_non_missing] == 1){
    is.na(x) <- rep(0, length.out = length(x))
  } else {
    is.na(x) <- rep(1, length.out = length(x))
  }
}

test_data %>% group_by(some_dimension) %>% mutate_if(is.numeric, funs(new = my_fun(.)))

I keep getting errors, for example:

Error in mutate_impl(.data, dots) : Evaluation error: (list) object cannot be coerced to type 'double'.

Best answer

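Try the na.locf function from the zoo package. With fromLast=TRUE it fills each missing value from the next observed value, and 1 minus that value gives the opposite: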

library(zoo)
test_data %>%
   group_by(some_dimension) %>% 
   mutate_if(is.numeric,funs(ifelse(is.na(.),1-na.locf(.,fromLast=TRUE),.)))
#   some_dimension first_col second_col third_col
#1           first         0          1         0
#2           first         0          1         0
#3           first         0          1         0
#4           first         1          0         1
#5           first         1          0         1
#6           first         1          0         1
#7          second         1          0         0
#8          second         1          0         0
#9          second         1          0         0
#10         second         0          1         1
#11         second         0          1         1
#12         second         0          1         1
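
For comparison, the rule from the question can also be written out by hand. The sketch below is not part of the original answer: `fill_with_opposite` is a hypothetical helper name, and it assumes every column holds only 0, 1, or NA. Compared with the attempt in the question, it indexes the argument `x` rather than `.data`, assigns the fill value with `x[is.na(x)] <- ...` (the form `is.na(x) <- idx` marks positions as missing instead of filling them), and returns `x`. Grouping is done with base R's `ave()` so that the example runs without extra packages.

```r
# Hypothetical helper (not from the original post): fill the NAs of a 0/1
# vector with the opposite of its first non-missing value.
fill_with_opposite <- function(x) {
  non_missing <- which(!is.na(x))
  # A vector that is entirely NA gives nothing to base the fill on.
  if (length(non_missing) == 0) return(x)
  first_value <- x[non_missing[1]]
  x[is.na(x)] <- 1 - first_value
  x
}

# Same data as in the question.
test_data <- data.frame(some_dimension = c(rep("first", 6), rep("second", 6)),
                        first_col  = c(rep(NA, 3), rep(1, 3), rep(NA, 3), rep(0, 3)),
                        second_col = c(rep(NA, 3), rep(0, 3), rep(NA, 3), rep(1, 3)),
                        third_col  = c(rep(NA, 3), rep(1, 3), rep(NA, 3), rep(1, 3)))

value_cols <- c("first_col", "second_col", "third_col")
filled <- test_data
# ave() applies the helper separately within each level of some_dimension.
filled[value_cols] <- lapply(test_data[value_cols], function(col) {
  ave(col, test_data$some_dimension, FUN = fill_with_opposite)
})
```

With dplyr, the same helper could presumably be passed directly, as in `mutate_if(is.numeric, fill_with_opposite)` after `group_by(some_dimension)`. Note one difference from `na.locf(fromLast = TRUE)`: that call fills each gap from the next observed value, while this helper always uses the group's first non-missing value. The two agree on this data because all missing values sit at the start of each group.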

Or, more briefly, with coalesce:

test_data %>% 
  group_by(some_dimension) %>%
  mutate_if(is.numeric,funs(coalesce(.,1-na.locf(.,fromLast=TRUE))))
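
`funs()` has been deprecated since dplyr 0.8.0, and `mutate_if()` was superseded by `across()` in dplyr 1.0.0, so the snippets in this answer may emit warnings or fail on a current installation. The following is a sketch of the same idea in the newer syntax, assuming dplyr 1.0.0 or later and zoo are installed. The `na.rm = FALSE` argument is an addition not present in the original answer; it keeps the vector length unchanged if a group ever ends in missing values.

```r
library(dplyr)
library(zoo)

# Same data as in the question.
test_data <- data.frame(some_dimension = c(rep("first", 6), rep("second", 6)),
                        first_col  = c(rep(NA, 3), rep(1, 3), rep(NA, 3), rep(0, 3)),
                        second_col = c(rep(NA, 3), rep(0, 3), rep(NA, 3), rep(1, 3)),
                        third_col  = c(rep(NA, 3), rep(1, 3), rep(NA, 3), rep(1, 3)))

result <- test_data %>%
  group_by(some_dimension) %>%
  # For every numeric column: keep observed values, and replace each NA with
  # 1 minus the next observed value within the group.
  mutate(across(where(is.numeric),
                ~ coalesce(.x, 1 - na.locf(.x, fromLast = TRUE, na.rm = FALSE)))) %>%
  ungroup()
```

Here `result` should match the desired table from the question. Trailing missing values in a group, if any, would stay `NA`, because there is no later observation to fill from.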

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/52648751/