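Replacing missing values row-wise with a vector

Tags: r missing-data imputation

I am working with missing values in a data set. I have a predictive model that generates values to impute the missing entries. When I impute with dfm[is.na(dfm)] <- impute, the values are filled in column by column. I need them filled in row by row, so I transposed the data matrix. My question: is there an elegant way to do this without transposing the matrix? Here is R code with a reproducible example.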

    set.seed(1)
    nr <- 5
    nc <- 4
    df <- matrix(runif(nr * nc), ncol = nc)
    df
              [,1]       [,2]      [,3]      [,4]
    [1,] 0.2655087 0.89838968 0.2059746 0.4976992
    [2,] 0.3721239 0.94467527 0.1765568 0.7176185
    [3,] 0.5728534 0.66079779 0.6870228 0.9919061
    [4,] 0.9082078 0.62911404 0.3841037 0.3800352
    [5,] 0.2016819 0.06178627 0.7698414 0.7774452

    d <- dim(df)
    p <- 0.30

    #### generate missing data matrix by replacing some values by NAs
    dfm <- df
    dfm[matrix(rbinom(prod(d), size = 1, prob = p) == 1, nrow = d[1])] <- NA
    dfm
              [,1]       [,2]      [,3]      [,4]
    [1,]        NA 0.89838968 0.2059746 0.4976992
    [2,] 0.3721239 0.94467527 0.1765568        NA
    [3,] 0.5728534 0.66079779 0.6870228 0.9919061
    [4,] 0.9082078         NA 0.3841037        NA
    [5,] 0.2016819 0.06178627        NA 0.7774452

    # generate values to impute the missing
    impute <- rgamma(sum(is.na(dfm)), shape = 1, scale = 0.5)
    impute
    [1] 0.6804725 0.6029941 0.2770577 0.6035013 0.7812393

    # imputes column-wise (done on a copy so that dfm keeps its NAs
    # for the row-wise version below)
    dfm.col <- dfm
    dfm.col[is.na(dfm.col)] <- impute
    dfm.col
              [,1]       [,2]      [,3]      [,4]
    [1,] 0.6804725 0.89838968 0.2059746 0.4976992
    [2,] 0.3721239 0.94467527 0.1765568 0.6035013
    [3,] 0.5728534 0.66079779 0.6870228 0.9919061
    [4,] 0.9082078 0.60299409 0.3841037 0.7812393
    [5,] 0.2016819 0.06178627 0.2770577 0.7774452

    # impute row-wise: transpose, fill, transpose back
    tdfm <- t(dfm)
    tdfm[is.na(tdfm)] <- impute
    tdfm
              [,1]      [,2]      [,3]      [,4]       [,5]
    [1,] 0.6804725 0.3721239 0.5728534 0.9082078 0.20168193
    [2,] 0.8983897 0.9446753 0.6607978 0.2770577 0.06178627
    [3,] 0.2059746 0.1765568 0.6870228 0.3841037 0.78123933
    [4,] 0.4976992 0.6029941 0.9919061 0.6035013 0.77744522

    dfm.fill <- t(tdfm)
    dfm.fill
              [,1]       [,2]      [,3]      [,4]
    [1,] 0.6804725 0.89838968 0.2059746 0.4976992
    [2,] 0.3721239 0.94467527 0.1765568 0.6029941
    [3,] 0.5728534 0.66079779 0.6870228 0.9919061
    [4,] 0.9082078 0.27705769 0.3841037 0.6035013
    [5,] 0.2016819 0.06178627 0.7812393 0.7774452

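Best answer

Use which instead of logical indexing, and combine it with arr.ind = TRUE so that you can sort the missing positions by row before assigning.

Example: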
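For orientation, here is a small sketch of the difference between the two forms of `which`. The matrix `m` and the names `lin` and `pos` are made up for this illustration and are not part of the original answer. R stores matrices in column-major order, so the plain form reports positions column by column, while `arr.ind = TRUE` gives one row/column pair per missing cell, which can then be reordered.

```r
# Illustrative 2 x 3 matrix (an assumption for this sketch).
m <- matrix(c(NA, 2, 3, NA, NA, 6), nrow = 2)

# Linear positions of the NAs, counted down the columns.
lin <- which(is.na(m))

# The same cells as (row, col) pairs, in a two-column matrix
# with columns named "row" and "col".
pos <- which(is.na(m), arr.ind = TRUE)

lin
pos
```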

test1 <- matrix(1:12, 3, 4, byrow = TRUE)
test1[c(1, 3, 8, 6, 10)] <- NA
test2 <- test1

impute <- c(-1, -4, -7, -9, -10)

## What you are currently doing--column-wise
test1[is.na(test1)] <- impute
test1
#      [,1] [,2] [,3] [,4]
# [1,]   -1    2    3  -10
# [2,]    5    6   -9    8
# [3,]   -4   -7   11   12

## What it sounds like you want--row-wise
nas <- which(is.na(test2), arr.ind = TRUE)
test2[nas[order(nas[, "row"]), ]] <- impute
test2
#      [,1] [,2] [,3] [,4]
# [1,]   -1    2    3   -4
# [2,]    5    6   -7    8
# [3,]   -9  -10   11   12
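The same idea can be wrapped in a small helper. This is only a sketch: `fill_na_rowwise` is a hypothetical name, not a function from base R or any package. It sorts by row and then by column explicitly, and uses `drop = FALSE` so that a matrix with a single NA is still indexed by a (row, col) pair rather than by two linear positions.

```r
# Hypothetical helper (not part of the original answer):
# fill the NAs of matrix `m` with `values` in row-major order.
fill_na_rowwise <- function(m, values) {
  nas <- which(is.na(m), arr.ind = TRUE)
  stopifnot(length(values) == nrow(nas))
  # Sort by row, then by column; keep the result a two-column matrix.
  nas <- nas[order(nas[, "row"], nas[, "col"]), , drop = FALSE]
  m[nas] <- values
  m
}

m <- matrix(1:12, 3, 4, byrow = TRUE)
m[c(1, 3, 8, 6, 10)] <- NA
impute <- c(-1, -4, -7, -9, -10)

filled <- fill_na_rowwise(m, impute)

# Cross-check against the transpose round trip used in the question.
tm <- t(m)
tm[is.na(tm)] <- impute
stopifnot(identical(filled, t(tm)))
```

On the question's data this would be called as `fill_na_rowwise(dfm, impute)`, assuming `dfm` still contains its NAs at that point.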

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/42732799/