r - Apply a function to a data frame subset by all possible combinations of categorical variables

Example data frame with the categorical variables catA, catB and catC. obs holds some observations.

catA <- rep(factor(c("a","b","c")), length.out=100)
catB <- rep(factor(1:4), length.out=100)
catC <- rep(factor(c("d","e","f")), length.out=100)
obs <- runif(100,0,100)
dat <- data.frame(catA, catB, catC, obs)

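All possible data subsets defined by the categorical variables. An NA in a column means that variable is not used to subset the data.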

allsubs <- expand.grid(catA = c(NA,levels(catA)), catB = c(NA,levels(catB)),
    catC = c(NA,levels(catC)))
> head(allsubs, n=10)
   catA catB catC
 1  <NA> <NA> <NA>
 2     a <NA> <NA>
 3     b <NA> <NA>
 4     c <NA> <NA>
 5  <NA>    1 <NA>
 6     a    1 <NA>
 7     b    1 <NA>
 8     c    1 <NA>
 9  <NA>    2 <NA>
 10    a    2 <NA>

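Now, what is the simplest way to create an output data frame whose results column holds the value of a function applied to each of these subsets of dat? The output should look like the data frame "whatiwant" below, where the results column contains the function's result for each subset.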

> whatiwant
    catA catB catC results
 1  <NA> <NA> <NA>       *
 2     a <NA> <NA>       *
 3     b <NA> <NA>       *
 4     c <NA> <NA>       *
 5  <NA>    1 <NA>       *
 6     a    1 <NA>       *
 7     b    1 <NA>       *
 8     c    1 <NA>       *
 9  <NA>    2 <NA>       *
 10    a    2 <NA>       *

So, if the function applied is mean, the results would be:

whatiwant$results[1] <- mean(subset(dat)$obs)                # no filter: all rows
whatiwant$results[2] <- mean(subset(dat, catA == "a")$obs)   # only catA == "a"

and so on.
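The rule in the two lines above (an NA in allsubs means "do not filter on that variable", a level means "keep only rows with that level") can be applied directly by looping over the rows of allsubs. This is a minimal base-R sketch, not the accepted answer's method; the helper name apply_to_subsets() and the added set.seed() call are assumptions for illustration only.

```r
# Example data from the question. set.seed() is an addition here, only so
# reruns give the same numbers.
set.seed(1)
catA <- rep(factor(c("a", "b", "c")), length.out = 100)
catB <- rep(factor(1:4), length.out = 100)
catC <- rep(factor(c("d", "e", "f")), length.out = 100)
obs  <- runif(100, 0, 100)
dat  <- data.frame(catA, catB, catC, obs)

allsubs <- expand.grid(catA = c(NA, levels(catA)),
                       catB = c(NA, levels(catB)),
                       catC = c(NA, levels(catC)))

# apply_to_subsets() is a hypothetical helper, not part of base R or a package.
# For each row of `grid`: NA means "do not filter on this column"; a non-NA
# value keeps only the rows of `data` whose column equals that level.
# Empty subsets get NA instead of calling FUN on a zero-length vector.
apply_to_subsets <- function(data, grid, FUN, value = "obs") {
  vars <- names(grid)
  res <- vapply(seq_len(nrow(grid)), function(i) {
    keep <- rep(TRUE, nrow(data))
    for (v in vars) {
      lvl <- grid[[v]][i]
      if (!is.na(lvl)) {
        # %in% compares factors as character, so level sets need not match
        keep <- keep & data[[v]] %in% as.character(lvl)
      }
    }
    if (any(keep)) FUN(data[[value]][keep]) else NA_real_
  }, numeric(1))
  grid$results <- res
  grid
}

whatiwant <- apply_to_subsets(dat, allsubs, mean)
head(whatiwant, n = 10)
```

Swapping in another summary only changes FUN, e.g. apply_to_subsets(dat, allsubs, median). The loop visits every row of dat once per row of allsubs (80 × 100 here), which is fine at this size but would be slow for large data.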

Best answer

ans <- with(dat, tapply(obs, list(catA, catB, catC), mean))
ans <- data.frame(expand.grid(dimnames(ans)), results=c(ans))
names(ans)[1:3] <- names(dat)[1:3]

str(ans)
# 'data.frame':  36 obs. of  4 variables:
#  $ catA   : Factor w/ 3 levels "a","b","c": 1 2 3 1 2 3 1 2 3 1 ...
#  $ catB   : Factor w/ 4 levels "1","2","3","4": 1 1 1 2 2 2 3 3 3 4 ...
#  $ catC   : Factor w/ 3 levels "d","e","f": 1 1 1 1 1 1 1 1 1 1 ...
#  $ results: num  69.7 NA NA 55.3 NA ...
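The NA values in the results column come from the example data, not from the answer's code. catA and catC both cycle with period 3 over the same row index, so each catA level only ever occurs with one catC level, and tapply() returns NA for cells with no observations. Note also that this answer returns only the 36 complete combinations, not the NA ("all levels") rows of allsubs. A small check, reusing the question's data construction (the object names ac and ans_arr are just for illustration):

```r
catA <- rep(factor(c("a", "b", "c")), length.out = 100)
catB <- rep(factor(1:4), length.out = 100)
catC <- rep(factor(c("d", "e", "f")), length.out = 100)
obs  <- runif(100, 0, 100)
dat  <- data.frame(catA, catB, catC, obs)

# "a" only occurs with "d", "b" with "e", "c" with "f"
ac <- with(dat, table(catA, catC))
print(ac)

# Empty cells, e.g. (catA = "b", catB = "1", catC = "d"), come back as NA
ans_arr <- with(dat, tapply(obs, list(catA, catB, catC), mean))
sum(is.na(ans_arr))   # 24 of the 36 cells are empty
```

Only 12 cells are non-empty because the pair (catA, catB) is determined by the row index modulo 12, and catC is then fixed by catA.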

A similar question, "r - Apply a function to a data frame subset by all possible combinations of categorical variables", can be found on Stack Overflow: https://stackoverflow.com/questions/16824544/
