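I have a large dataset that I summarise with dplyr's summarize() to produce some means.
Sometimes I want to perform arithmetic on that output.
For example, I want the mean of the means in a column such as "m.biomass" from the output below.
I have tried mean(data.sum[,7]) and also mean(as.list(data.sum[,7])). Is there a quick and easy way to achieve this?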
data.sum <-structure(list(scenario = c("future", "future", "future", "future"
), state = c("fl", "ga", "ok", "va"), m.soc = c(4090.31654013689,
3654.45350562628, 2564.33199749487, 4193.83388887064), m.npp = c(1032.244475,
821.319385, 753.401315, 636.885535), sd.soc = c(56.0344229400332,
97.8553643582118, 68.2248389927858, 79.0739969429246), sd.npp = c(34.9421782033153,
27.6443555578531, 26.0728757486901, 24.0375040705595), m.biomass = c(5322.76631158111,
3936.79457763176, 3591.0902359206, 2888.25308402464), sd.m.biomass = c(3026.59250918009,
2799.40317348016, 2515.10516340438, 2273.45510178843), max.biomass = c(9592.9303,
8105.109, 7272.4896, 6439.2259), time = c("1980-1999", "1980-1999",
"1980-1999", "1980-1999")), .Names = c("scenario", "state", "m.soc",
"m.npp", "sd.soc", "sd.npp", "m.biomass", "sd.m.biomass", "max.biomass",
"time"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4), vars = list(quote(scenario)), labels = structure(list(
scenario = "future"), class = "data.frame", row.names = c(NA,
-1), vars = list(quote(scenario)), drop = TRUE, .Names = "scenario"), indices = list(0:3))
Best answer
We can use [[ to extract the column as a vector, because mean works only on a vector or a matrix, not on a data.frame. If the OP wants to do this on a single column, use:
mean(data.sum[[7]])
#[1] 3934.726
If data.sum had only the data.frame class, data.sum[,7] would extract the column as a vector, but the tbl_df class prevents it from being collapsed to a vector.
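The difference between the indexing forms can be sketched as follows. This is a minimal illustration, not part of the original answer: the object name toy is hypothetical and holds only the state and m.biomass columns copied from the question, and it assumes the tibble and dplyr packages are installed. pull() is included as another way to get a plain vector.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for data.sum, using the m.biomass values from the question
toy <- tibble(
  state = c("fl", "ga", "ok", "va"),
  m.biomass = c(5322.76631158111, 3936.79457763176,
                3591.0902359206, 2888.25308402464)
)

# Single-bracket indexing on a tibble returns a one-column tibble, not a vector
single_bracket <- toy[, 2]
class(single_bracket)

# Each of these forms hands mean() a plain numeric vector
by_brackets <- mean(toy[["m.biomass"]])
by_dollar   <- mean(toy$m.biomass)
by_pull     <- mean(pull(toy, m.biomass))
```

All three results should agree with the 3934.726 shown above. Extracting by name rather than by position (7) is a choice made here so the code keeps working if columns are reordered.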
For multiple columns, dplyr also has specialised functions:
data.sum %>%
summarise_each(funs(mean), 3:7)
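summarise_each() and funs() have been deprecated in later dplyr releases. A rough equivalent using across(), which assumes dplyr 1.0.0 or newer, might look like the sketch below. The object name toy and the choice of two columns are illustrative only; the values are copied from the question.

```r
library(dplyr)

# Hypothetical stand-in for data.sum with two of the mean columns from the question
toy <- tibble(
  state = c("fl", "ga", "ok", "va"),
  m.npp = c(1032.244475, 821.319385, 753.401315, 636.885535),
  m.biomass = c(5322.76631158111, 3936.79457763176,
                3591.0902359206, 2888.25308402464)
)

# Apply mean() to each selected column, giving a one-row tibble of column means
col_means <- toy %>%
  summarise(across(c(m.npp, m.biomass), mean))

col_means
```

Columns are selected by name here; across() also accepts positions or helpers such as where(is.numeric) if that fits the data better.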
Source: a similar question on Stack Overflow about arithmetic on dplyr-summarised data frames in R: https://stackoverflow.com/questions/41833522/