R's tapply with a NULL function

I can't understand what tapply does when the FUN argument is NULL. The documentation says:

If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.

For example, what does the following example from the documentation do?

ind <- list(c(1, 2, 2), c("A", "A", "B"))
tapply(1:3, ind) #-> the split vector

I don't understand the result:

[1] 1 2 4

Thanks.

Best answer

If you run tapply with an actual (non-NULL) function such as sum, as in the help page, you will see that the result is a two-dimensional array with NA in one cell:

res <- tapply(1:3, ind, sum)
res
  A  B
1 1 NA
2 2  3

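This means that one factor combination, namely (1, B), does not occur in the data. When FUN is NULL, tapply instead returns, for each element of X, the index (in column-major order) of the cell of that array into which the element falls. Here each of the three elements falls into a different cell, so the result is exactly the set of indices of the combinations that do exist. To check this: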

> which(!is.na(res))
[1] 1 2 4
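A minimal sketch of how those indices arise, assuming R's usual column-major layout for arrays: an element's cell index is the level code of the first factor plus the number of first-factor levels times (the level code of the second factor minus 1). The names `i1`, `i2` and `idx` are only illustrative. Because the vector has one entry per element of X, it can subscript `res` directly, which is what the documentation means by "used to subscript the multi-way array".

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))
res <- tapply(1:3, ind, sum)

# Integer level codes of each grouping factor: 1 2 2 and 1 1 2
i1 <- as.integer(factor(ind[[1]]))
i2 <- as.integer(factor(ind[[2]]))

# Column-major linear index into the 2 x 2 result array
idx <- i1 + nlevels(factor(ind[[1]])) * (i2 - 1L)
print(idx)                               # [1] 1 2 4
print(identical(idx, tapply(1:3, ind)))  # [1] TRUE

# One index per element of X, so this picks each element's group result
print(res[idx])                          # [1] 1 2 3

# When two elements share a cell, the index is repeated
print(tapply(1:4, list(c(1, 2, 2, 2), c("A", "A", "B", "B"))))  # [1] 1 2 4 4
```

The last line shows that the FUN = NULL result is a per-element group index rather than a list of distinct cells; `unique()` of it gives the occupied cells.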

It is worth mentioning that the function itself can return NA, as the following toy example shows:

> f <- function(x){
      if(x[[1]] == 1) return(NA)
      return(sum(x))
  }
> tapply(1:3, ind, f)
   A  B
1 NA NA
2  2  3

So, in general, an NA does not mean that a factor combination is absent.
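Since the FUN = NULL result records the cell of every element, one way to tell the two kinds of NA apart is to check which cells are occupied. A sketch, reusing the toy function `f` from the example above (redefined so the block runs on its own); the names `occupied`, `empty` and `fun_na` are illustrative only.

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))
f <- function(x) {
  if (x[[1]] == 1) return(NA)
  sum(x)
}
res2 <- tapply(1:3, ind, f)

# TRUE for cells that contain at least one element of X
occupied <- seq_along(res2) %in% tapply(1:3, ind)

# NA because no data fell into the cell vs. NA returned by f
empty  <- is.na(res2) & !occupied
fun_na <- is.na(res2) & occupied
print(which(empty))   # [1] 3  (the (1, B) cell)
print(which(fun_na))  # [1] 1  (the (1, A) cell)
```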

A similar question about R's tapply with a NULL function can be found on Stack Overflow: https://stackoverflow.com/questions/37391261/