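How can I group by a column and then compute the mean and standard deviation of every other column in R?
As an example, consider the well-known Iris dataset.
I want to do something like grouping by species
and then computing the mean and standard deviation of the petal/sepal length/width measurements.
I know this has something to do with split-apply-combine,
but I don't know how to proceed from there.
Here is what I have come up with: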
require(plyr)
x <- ddply(iris, .(Species), summarise,
           Sepal.Length.Mean = mean(Sepal.Length),
           Sepal.Length.Sd = sd(Sepal.Length),
           Sepal.Width.Mean = mean(Sepal.Width),
           Sepal.Width.Sd = sd(Sepal.Width),
           Petal.Length.Mean = mean(Petal.Length),
           Petal.Length.Sd = sd(Petal.Length),
           Petal.Width.Mean = mean(Petal.Width),
           Petal.Width.Sd = sd(Petal.Width))
Species Sepal.Length.Mean Sepal.Length.Sd Sepal.Width.Mean Sepal.Width.Sd
1 setosa 5.006 0.3524897 3.428 0.3790644
2 versicolor 5.936 0.5161711 2.770 0.3137983
3 virginica 6.588 0.6358796 2.974 0.3224966
Petal.Length.Mean Petal.Length.Sd Petal.Width.Mean Petal.Width.Sd
1 1.462 0.1736640 0.246 0.1053856
2 4.260 0.4699110 1.326 0.1977527
3 5.552 0.5518947 2.026 0.2746501
Desired output:
z <- data.frame(setosa = c(5.006, 0.3524897, 3.428, 0.3790644,
                           1.462, 0.1736640, 0.246, 0.1053856),
                versicolor = c(5.936, 0.5161711, 2.770, 0.3137983,
                               4.260, 0.4699110, 1.326, 0.1977527),
                virginica = c(6.588, 0.6358796, 2.974, 0.3224966,
                              5.552, 0.5518947, 2.026, 0.2746501))
rownames(z) <- c('Sepal.Length.Mean', 'Sepal.Length.Sd',
                 'Sepal.Width.Mean', 'Sepal.Width.Sd',
                 'Petal.Length.Mean', 'Petal.Length.Sd',
                 'Petal.Width.Mean', 'Petal.Width.Sd')
                     setosa versicolor virginica
Sepal.Length.Mean 5.0060000  5.9360000 6.5880000
Sepal.Length.Sd   0.3524897  0.5161711 0.6358796
Sepal.Width.Mean  3.4280000  2.7700000 2.9740000
Sepal.Width.Sd    0.3790644  0.3137983 0.3224966
Petal.Length.Mean 1.4620000  4.2600000 5.5520000
Petal.Length.Sd   0.1736640  0.4699110 0.5518947
Petal.Width.Mean  0.2460000  1.3260000 2.0260000
Petal.Width.Sd    0.1053856  0.1977527 0.2746501
Best answer
We can try dplyr:
library(dplyr)
res <- iris %>%
  group_by(Species) %>%
  summarise_each(funs(mean, sd))
`colnames<-`(t(res[-1]), as.character(res$Species))
# setosa versicolor virginica
#Sepal.Length_mean 5.0060000 5.9360000 6.5880000
#Sepal.Width_mean 3.4280000 2.7700000 2.9740000
#Petal.Length_mean 1.4620000 4.2600000 5.5520000
#Petal.Width_mean 0.2460000 1.3260000 2.0260000
#Sepal.Length_sd 0.3524897 0.5161711 0.6358796
#Sepal.Width_sd 0.3790644 0.3137983 0.3224966
#Petal.Length_sd 0.1736640 0.4699110 0.5518947
#Petal.Width_sd 0.1053856 0.1977527 0.2746501
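In more recent dplyr releases, summarise_each() and funs() are deprecated. Assuming dplyr 1.0.0 or later is installed, the same idea can be sketched with across(); the names res and out are only illustrative. With this variant the rows come out grouped by measurement (mean, then sd), which is closer to the ordering in the desired output.

```r
library(dplyr)

# across() applies every function in the named list to each non-grouping column.
# The result columns are named <column>_<function>, e.g. Sepal.Length_mean.
res <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

# Transpose the numeric columns and label the new columns with the species names.
out <- t(as.matrix(res[-1]))
colnames(out) <- as.character(res$Species)
out
```

The result is a numeric matrix rather than a data frame; wrapping it in as.data.frame(out) should give a data frame if that is what is needed.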
Or, as @Steven Beaupre mentioned in the comments, the output can be obtained by reshaping with spread:
library(tidyr)
iris %>%
  group_by(Species) %>%
  summarise_each(funs(mean, sd)) %>%
  gather(key, value, -Species) %>%
  spread(Species, value)
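In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshaping, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later are installed (the names wide, key, and value are only illustrative):

```r
library(dplyr)
library(tidyr)

wide <- iris %>%
  group_by(Species) %>%
  # One mean and one sd column per measurement, named <column>_<function>.
  summarise(across(everything(), list(mean = mean, sd = sd))) %>%
  # Long format: one row per (Species, statistic) pair.
  pivot_longer(-Species, names_to = "key", values_to = "value") %>%
  # Wide format again, this time with one column per species.
  pivot_wider(names_from = Species, values_from = value)

wide
```

Here the statistic names end up in the key column of a tibble instead of in row names.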
Source: the Stack Overflow question "r - group by a column, then compute the mean and standard deviation of every other column in R": https://stackoverflow.com/questions/37457493/