r - How to programmatically summarize multiple columns using ddply?

Tags: r parsing eval plyr

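Is it possible to specify, through a function's arguments, which columns ddply should aggregate, without resorting to eval + parse? Here is what I have so far: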

library(plyr)  # provides ddply, summarise and here

x <- c(2,4,3,1,5,7)
y <- c(3,2,6,3,4,6)
group1 <- c("A","A","A","A","B","B")
group2 <- c("X","X","Y","Y","Z","X")

data <- data.frame(group1, group2, x, y)

This is the output I want:

aggFunction <- function(dataframe, toAverage, toGroup) {
  out <- ddply(dataframe, toGroup, summarise, 
               x = mean(x),
               y = mean(y))
  return(out)
}

aggFunction(data, c("x", "y"), c("group1", "group2"))

# group1 group2 x   y
# 1      A      X 3 2.5
# 2      A      Y 2 4.5
# 3      B      X 7 6.0
# 4      B      Z 5 4.0

Here is my solution using eval(parse()):

aggFunction <- function(dataframe, toAverage, toGroup) {

  toAverageArgs <- paste(toAverage, " = mean(", toAverage, ")", sep = "", collapse = ", ")
  out <- eval(parse(text = paste("ddply(dataframe, toGroup, here(summarize),", toAverageArgs, ")")))

  return(out)
}

This gives me the output I want.

I would like to know whether there is a better way to do this. I know about do.call() and get(), but none of my attempts with them have worked.

Here is one attempt. get(string) does not work, but here(summarise) lets me get at the string values. Unfortunately, ddply then treats them as a string:

aggFunction <- function(dataframe, toAverage, toGroup) {

  string <- paste(toAverage, " = mean(", toAverage, ")", sep = "", collapse = ", ")
  out <- ddply(dataframe, toGroup, here(summarise), string)

  return(out)
}

aggFunction(data, c("x", "y"), c("group1", "group2"))

# group1 group2                      ..2
# 1      A      X x = mean(x), y = mean(y)
# 2      A      Y x = mean(x), y = mean(y)
# 3      B      X x = mean(x), y = mean(y)
# 4      B      Z x = mean(x), y = mean(y)

I also tried do.call, but the arguments are still treated as a string:

aggFunction <- function(dataframe, toAverage, toGroup) {

  string <- paste(toAverage, " = mean(", toAverage, ")", sep = "", collapse = ", ")
  print(string)

  args <- list(dataframe, toGroup, here(summarise), string)
  out <- do.call(ddply, args)

  return(out)
}
aggFunction(data, c("x", "y"), c("group1", "group2"))

# group1 group2 "x = mean(x), y = mean(y)"
# 1      A      X   x = mean(x), y = mean(y)
# 2      A      Y   x = mean(x), y = mean(y)
# 3      B      X   x = mean(x), y = mean(y)
# 4      B      Z   x = mean(x), y = mean(y)
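Both attempts seem to fail for the same reason: summarise captures its extra arguments as unevaluated expressions and evaluates them against each piece, and a character string is already a constant, so evaluating it simply returns the text. The base R sketch below illustrates the difference between a string and a call object. Here `piece` is a hypothetical stand-in for one group of rows, not something ddply exposes under that name.

```r
# `piece` is a hypothetical stand-in for one group of rows.
piece <- data.frame(x = c(2, 4), y = c(3, 2))

as_string <- "mean(x)"        # a character constant
as_call   <- quote(mean(x))   # an unevaluated call

# Evaluating a string returns the string itself; nothing is computed.
string_result <- eval(as_string, piece)

# Evaluating a call looks up x inside piece and computes the mean.
call_result <- eval(as_call, piece)

# A call can also be built from a column name held in a variable,
# without pasting text and parsing it.
col <- "y"
built_call   <- bquote(mean(.(as.name(col))))
built_result <- eval(built_call, piece)
```

So any approach that avoids parse() has to hand ddply either call objects or a plain function, rather than text.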

Finally, I tried hard-coding mean(), but then I cannot set the column name. If I use get(testVar) = mean(get(testVar)), I get "unexpected =".

aggFunction <- function(dataframe, toAverage, toGroup) {

  testVar <- "x"

  out <- ddply(dataframe, toGroup, here(summarise), 
           get(testVar) = mean(get(testVar)))
           ## fails to parse: unexpected '='
  return(out)
}
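The last attempt fails at parse time: in an argument list, R only accepts a bare name or a string literal to the left of `=`, so `get(testVar) = ...` is rejected before ddply is ever called. One way around this, sketched below, is to skip summarise and pass ddply an ordinary function that returns a named vector, so the column names come from the data rather than from argument names. The function name `aggFunctionColMeans` is made up for this illustration, and the sketch assumes every column in `toAverage` is numeric.

```r
library(plyr)

data <- data.frame(
  group1 = c("A", "A", "A", "A", "B", "B"),
  group2 = c("X", "X", "Y", "Y", "Z", "X"),
  x = c(2, 4, 3, 1, 5, 7),
  y = c(3, 2, 6, 3, 4, 6)
)

# Hypothetical helper name; not part of plyr.
aggFunctionColMeans <- function(dataframe, toAverage, toGroup) {
  ddply(dataframe, toGroup, function(piece) {
    # piece[toAverage] stays a data frame even for a single column, so
    # colMeans returns a vector named after the selected columns, and
    # ddply turns that vector into one output row per group.
    colMeans(piece[toAverage])
  })
}

out <- aggFunctionColMeans(data, c("x", "y"), c("group1", "group2"))
print(out)
```

Because the column names are ordinary character vectors here, no eval, parse, or get is involved.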

Best answer

Use aggregate from base R:

aggFunction <- function(dataframe, toAverage, toGroup) {
  aggregate(dataframe[, toAverage], dataframe[, toGroup], mean)
}

aggFunction(data, c("x", "y"), c("group1", "group2"))

  group1 group2 x   y
1      A      X 3 2.5
2      B      X 7 6.0
3      A      Y 2 4.5
4      B      Z 5 4.0
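Note that the rows above come out in a different order from the output requested in the question (A X, A Y, B X, B Z). Also, `dataframe[, toGroup]` collapses to a plain vector when only one grouping column is given, which aggregate does not accept as `by`. The sketch below is one possible variant that addresses both points. The name `aggSorted` is hypothetical, and the sorting step is an addition for illustration, not part of the original answer.

```r
data <- data.frame(
  group1 = c("A", "A", "A", "A", "B", "B"),
  group2 = c("X", "X", "Y", "Y", "Z", "X"),
  x = c(2, 4, 3, 1, 5, 7),
  y = c(3, 2, 6, 3, 4, 6)
)

# Hypothetical variant of the answer's function.
aggSorted <- function(dataframe, toAverage, toGroup) {
  # Indexing with a single argument keeps a data frame (a list), which
  # is what aggregate needs for `by`, even when one column is selected.
  out <- aggregate(dataframe[toAverage], by = dataframe[toGroup], FUN = mean)

  # Sort by the grouping columns, first column first.
  out <- out[do.call(order, unname(as.list(out[toGroup]))), , drop = FALSE]
  rownames(out) <- NULL
  out
}

sorted <- aggSorted(data, c("x", "y"), c("group1", "group2"))
single <- aggSorted(data, "x", "group1")
print(sorted)
print(single)
```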

Regarding "r - How to programmatically summarize multiple columns using ddply?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32146832/
