r - uniqueN returns wrong results with a condition in j

Given a dataset like this:

library(data.table)

test <- data.table(
  id  = c("a", "b", "c", "d", "e", "e", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"),
  int = c(NA, NA, 0, 0, 1, 2, 3, 1, 2, 2, 3, 4, NA, 5, NA, 6, 7, NA, 8, NA, 8, 10))

I want to count the number of unique ids for which int has a specific value:

test[, .(three   = uniqueN(id[int == 3]),
         zero    = uniqueN(id[int == 0]),
         missing = uniqueN(id[is.na(int)]))]

The result

   three zero missing
1:     3    3       6

is clearly wrong: only 2 ids have int equal to 0, and only 2 have int equal to 3. The correct result should look like this:

   three zero missing
1:     2    2       6

Is there something wrong with this approach? Many thanks.

Best answer

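int contains NA elements, and these need care because == compared with NA returns NA. Either use %in%, or add a second condition with & and !is.na stating that the value is not NA, so that NA elements return FALSE instead of NA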

test[, .(three = uniqueN(id[int == 3 & !is.na(int)]),
         zero  = uniqueN(id[int %in% 0]))]
#    three zero
#1:     2    2

Another option is to use na.rm in uniqueN. It is FALSE by default, so NA is counted as one more unique value

test[, .(three   = uniqueN(id[int == 3], na.rm = TRUE),
         zero    = uniqueN(id[int == 0], na.rm = TRUE),
         missing = uniqueN(id[is.na(int)]))]
#   three zero missing
#1:     2    2       6

Yet another approach is to handle the NAs first with na.omit or complete.cases and then use the OP's code

na.omit(test)[, .(three = uniqueN(id[int == 3]),
                  zero  = uniqueN(id[int == 0]))]
#    three zero
#1:     2    2
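
The answer names complete.cases but only shows na.omit. A minimal sketch of the complete.cases variant follows, assuming the test table from the question. Filtering in i with complete.cases(test) should behave like na.omit(test) here, because int is the only column containing NA. The name res is introduced only for illustration.

```r
library(data.table)

# Data from the question
test <- data.table(
  id  = c("a", "b", "c", "d", "e", "e", "e", "f", "g", "h", "i",
          "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"),
  int = c(NA, NA, 0, 0, 1, 2, 3, 1, 2, 2, 3, 4, NA, 5, NA, 6, 7, NA, 8, NA, 8, 10)
)

# Keep only rows without NA in any column, then count as in the question
res <- test[complete.cases(test),
            .(three = uniqueN(id[int == 3]),
              zero  = uniqueN(id[int == 0]))]

print(res)  # three is 2 and zero is 2
```

Because the NA rows are dropped before j is evaluated, the missing count cannot be computed from the filtered table; it would have to come from the unfiltered test.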

When == is applied without accounting for NA, it returns NA instead of FALSE, and subsetting with that NA index also returns NA

c(NA, 3) == 3
#[1]   NA TRUE

c(5, 4)[c(NA, 3) == 3]
#[1] NA  4

whereas

c(NA, 3) == 3 & !is.na(c(NA, 3))
#[1] FALSE  TRUE

or using %in%

c(NA, 3) %in% 3
#[1] FALSE  TRUE
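
To tie this back to the question's data, the sketch below (assuming the test table from the question; sub_eq and sub_in are names introduced only for illustration) inspects what id[int == 3] actually contains. The NA entries left in the subset are what uniqueN counts as the extra value.

```r
library(data.table)

# Data from the question
test <- data.table(
  id  = c("a", "b", "c", "d", "e", "e", "e", "f", "g", "h", "i",
          "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t"),
  int = c(NA, NA, 0, 0, 1, 2, 3, 1, 2, 2, 3, 4, NA, 5, NA, 6, 7, NA, 8, NA, 8, 10)
)

sub_eq <- test$id[test$int == 3]    # rows where int is NA come through as NA
sub_in <- test$id[test$int %in% 3]  # rows where int is NA are dropped

print(sub_eq)  # "e" and "i" plus six NA entries
print(sub_in)  # only "e" and "i"

n_eq      <- uniqueN(sub_eq)                # 3: "e", "i", and NA
n_eq_narm <- uniqueN(sub_eq, na.rm = TRUE)  # 2
n_in      <- uniqueN(sub_in)                # 2
```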

A similar question about "r - uniqueN returns wrong results with a condition in j" can be found on Stack Overflow: https://stackoverflow.com/questions/66040616/
