Replacing rows by index

In the example below:

library(data.table)
df1 <- data.table("1A"=c(0,0,0,0),"1B"=c(4:3),"2A"=c(0,0,0,0), "2B"=c(4:3))
df2 <- data.table("1A"=c(0,0),"1B"=c(1:2),"2A"=c(0,0), "2B"=c(1:2))

df1
#    1A 1B 2A 2B
# 1:  0  4  0  4
# 2:  0  3  0  3
# 3:  0  4  0  4
# 4:  0  3  0  3

df2
#    1A 1B 2A 2B
# 1:  0  1  0  1
# 2:  0  2  0  2

indx = c(1,3)
indx
# [1] 1 3

df1[indx,] <- df2
df1
#    1A 1B 2A 2B
# 1:  0  1  0  1
# 2:  0  3  0  3
# 3:  0  2  0  2
# 4:  0  3  0  3

I successfully replaced rows 1 and 3 of df1 with the rows of df2. When I repeat the same exercise on my real data, I get this error:

Can't assign to the same column twice in the same query (duplicates detected).

It is raised by this expression:

Z4[positionpdis,] <- ZpdisRow2

The objects have the following properties:

is.data.table(ZpdisRow2)
# [1] TRUE
is.data.table(Z4)
# [1] TRUE
dim(Z4)
# [1] 7968 7968
dim(Z4[positionpdis,])
# [1]   48 7968
dim(ZpdisRow2)
# [1]   48 7968
str(positionpdis)
# int [1:48] 91 257 423 589 755 921 1087 1253 1419 1585 ...
length(unique(positionpdis))
# [1] 48

What is causing the error?

Best answer

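My guess is that the original dataset has some duplicated column names. For example, if we change the name of the third column to match the first, the same error appears.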

colnames(df1)[3] <- '1A'
df1[indx,] <- df2

Error in [<-.data.table(*tmp*, indx, , value = list(1A = c(0, 0), : Can't assign to the same column twice in the same query (duplicates detected).
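A quick way to confirm this diagnosis is to list the repeated names directly. The sketch below uses a small stand-in table (`dt`, a hypothetical name) rather than the asker's `Z4`; the same two calls should apply to any data.table.

```r
library(data.table)

# Hypothetical stand-in table; the third column is renamed to collide with the first
dt <- data.table("1A" = c(0, 0), "1B" = 1:2, "2A" = c(0, 0))
colnames(dt)[3] <- "1A"

# Position of the first duplicated name, or 0 if every name is unique
first_dup <- anyDuplicated(names(dt))

# The names that occur more than once
dup_names <- unique(names(dt)[duplicated(names(dt))])
dup_names
```

If `first_dup` is non-zero for the real table, the assignment `Z4[positionpdis,] <- ZpdisRow2` would be expected to fail for the reason described above.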

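We can make the column names unique with make.unique. It is a convenient function for cases like this, because there is no need to check each column name for duplicates by hand.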

 colnames(df1) <- make.unique(colnames(df1)) 
 df1[indx,] <- df2
 df1
 #  1A 1B 1A.1 2B
 #1:  0  1    0  1
 #2:  0  3    0  3
 #3:  0  2    0  2
 #4:  0  3    0  3
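To see what make.unique does on its own, here is a minimal sketch on a plain character vector (`nms` and `dt` are illustrative names, not from the question). The last two lines assume that renaming by reference with setnames is acceptable as an alternative to `colnames<-`.

```r
library(data.table)

# make.unique() leaves the first occurrence alone and appends .1, .2, ...
# to the later occurrences of the same name
nms <- c("1A", "1B", "1A", "2B", "1A")
unique_nms <- make.unique(nms)
unique_nms
# [1] "1A"   "1B"   "1A.1" "2B"   "1A.2"

# Illustrative table with a duplicated name, renamed by reference
dt <- data.table("1A" = c(0, 0), "1B" = 1:2, "2A" = c(0, 0))
colnames(dt)[3] <- "1A"
setnames(dt, make.unique(names(dt)))
names(dt)
```

Note that the renamed columns no longer carry their original labels, so any later code that selects columns by name needs to use the new ones.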

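Another option, which should also work with duplicated column names, is set. It is very efficient because it avoids the overhead of [.data.table. Here we loop over the column indices (seq_along(df1)) and, using the row (i) and column (j) indices, set the values in 'df1' to the corresponding values from 'df2'.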

 for(j in seq_along(df1)){
           set(df1, i= as.integer(indx), j=j, df2[[j]])
  }
 df1
#   1A 1B 1A 2B
#1:  0  1  0  1
#2:  0  3  0  3
#3:  0  2  0  2
#4:  0  3  0  3
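The loop above can be wrapped in a small function if it is needed in several places. This is only a sketch: `replace_rows`, `target`, and `source_rows` are hypothetical names, and the dimension checks are an added assumption about what a caller would want rather than part of the original answer.

```r
library(data.table)

# Hypothetical helper: overwrite rows `rows` of `dt` with the rows of `src`.
# Columns are matched by position, so duplicated names are not a problem.
replace_rows <- function(dt, rows, src) {
  stopifnot(ncol(dt) == ncol(src), length(rows) == nrow(src))
  rows <- as.integer(rows)
  for (j in seq_along(dt)) {
    set(dt, i = rows, j = j, value = src[[j]])
  }
  invisible(dt)
}

target <- data.table(x = c(0, 0, 0, 0), y = c(4L, 3L, 4L, 3L))
source_rows <- data.table(x = c(0, 0), y = 1:2)

# set() modifies `target` by reference, so no reassignment is needed
replace_rows(target, c(1, 3), source_rows)
target$y
# [1] 1 3 2 3
```

For the tables in the question, the equivalent call would be `replace_rows(Z4, positionpdis, ZpdisRow2)`, assuming the columns of the two tables are in the same order.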

A similar question about replacing rows by index can be found on Stack Overflow: https://stackoverflow.com/questions/31984836/
