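I want to use the informative stat.desc function from the pastecs package to describe many columns of my data frame by group. Let's take the iris dataset as an MWE.
So I do this for each column: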
by(iris$Sepal.Length,list(iris$Species),pastecs::stat.desc,norm = TRUE)
by(iris$Sepal.Width,list(iris$Species),pastecs::stat.desc,norm = TRUE)
by(iris$Petal.Length,list(iris$Species),pastecs::stat.desc,norm = TRUE)
by(iris$Petal.Width,list(iris$Species),pastecs::stat.desc,norm = TRUE)
But with many columns this quickly becomes tedious, so you would normally want to vectorize it. After many trials, I found a way using apply and by(), as follows:
apply (iris[,1:4],2,function (x) by (x,list (iris$Species),pastecs::stat.desc,norm=TRUE))
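The list argument specifies the grouping variable, and norm=TRUE is an argument of stat.desc that adds normality statistics (skewness, kurtosis and the Shapiro-Wilk test) to the output.
Results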
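As a side note to the apply/by combination above, here is a minimal sketch of a shorter call. It assumes, per the pastecs documentation, that stat.desc also accepts a data frame and then returns a data frame with one row per statistic. The name by_all is only illustrative. Note that the nesting is reversed compared with the apply version: the result is organised by species first, then by column.

```r
library(pastecs)

# Split the four numeric columns by species and describe each piece.
# stat.desc() receives one data frame per group and returns a data frame
# with one row per statistic and one column per variable.
by_all <- by(iris[, 1:4], list(Species = iris$Species), stat.desc, norm = TRUE)

# Look up one group, then one statistic/variable pair.
by_all[["setosa"]]["mean", "Sepal.Length"]
```

Because each group's result is a plain data frame, it can be subset or rounded like any other table.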
$Sepal.Length
: setosa
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 4.30000 5.80000 1.50000 250.30000 5.00000 5.00600 0.04985
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.10018 0.12425 0.35249 0.07041 0.11298 0.16782 -0.45087 -0.34059 0.97770 0.45951
------------------------------------------------------------------------------------------------------
: versicolor
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 4.90000 7.00000 2.10000 296.80000 5.90000 5.93600 0.07300
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.14669 0.26643 0.51617 0.08696 0.09914 0.14727 -0.69391 -0.52418 0.97784 0.46474
------------------------------------------------------------------------------------------------------
: virginica
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 4.90000 7.90000 3.00000 329.40000 6.50000 6.58800 0.08993
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.18071 0.40434 0.63588 0.09652 0.11103 0.16493 -0.20326 -0.15354 0.97118 0.25831
$Sepal.Width
: setosa
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 2.30000 4.40000 2.10000 171.40000 3.40000 3.42800 0.05361
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.10773 0.14369 0.37906 0.11058 0.03873 0.05753 0.59595 0.45018 0.97172 0.27153
------------------------------------------------------------------------------------------------------
: versicolor
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 2.00000 3.40000 1.40000 138.50000 2.80000 2.77000 0.04438
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.08918 0.09847 0.31380 0.11328 -0.34136 -0.50708 -0.54932 -0.41495 0.97413 0.33800
------------------------------------------------------------------------------------------------------
: virginica
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 2.20000 3.80000 1.60000 148.70000 3.00000 2.97400 0.04561
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.09165 0.10400 0.32250 0.10844 0.34428 0.51141 0.38038 0.28734 0.96739 0.18090
$Petal.Length
: setosa
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 1.00000 1.90000 0.90000 73.10000 1.50000 1.46200 0.02456
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.04935 0.03016 0.17366 0.11879 0.10010 0.14869 0.65393 0.49397 0.95498 0.05481
------------------------------------------------------------------------------------------------------
: versicolor
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 3.00000 5.10000 2.10000 213.00000 4.35000 4.26000 0.06646
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.13355 0.22082 0.46991 0.11031 -0.57060 -0.84760 -0.19026 -0.14372 0.96600 0.15848
------------------------------------------------------------------------------------------------------
: virginica
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 4.50000 6.90000 2.40000 277.60000 5.55000 5.55200 0.07805
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.15685 0.30459 0.55189 0.09940 0.51692 0.76785 -0.36512 -0.27581 0.96219 0.10978
$Petal.Width
: setosa
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
5.000e+01 0.000e+00 0.000e+00 1.000e-01 6.000e-01 5.000e-01 1.230e+01 2.000e-01 2.460e-01 1.490e-02
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
2.995e-02 1.111e-02 1.054e-01 4.284e-01 1.180e+00 1.752e+00 1.259e+00 9.508e-01 7.998e-01 8.659e-07
------------------------------------------------------------------------------------------------------
: versicolor
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 1.00000 1.80000 0.80000 66.30000 1.30000 1.32600 0.02797
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.05620 0.03911 0.19775 0.14913 -0.02933 -0.04357 -0.58731 -0.44365 0.94763 0.02728
------------------------------------------------------------------------------------------------------
: virginica
nbr.val nbr.null nbr.na min max range sum median mean SE.mean
50.00000 0.00000 0.00000 1.40000 2.50000 1.10000 101.30000 2.00000 2.02600 0.03884
CI.mean.0.95 var std.dev coef.var skewness skew.2SE kurtosis kurt.2SE normtest.W normtest.p
0.07805 0.07543 0.27465 0.13556 -0.12181 -0.18094 -0.75396 -0.56953 0.95977 0.08695
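The printed by output is long and hard to compare across groups. A minimal sketch of one way to make it more compact, shown for a single column: since each group yields a named numeric vector, the vectors can be stacked into a matrix with do.call(rbind, ...). The names by_sl and sl_mat are illustrative only.

```r
library(pastecs)

# One by() call for a single column; each element is a named numeric vector.
by_sl <- by(iris$Sepal.Length, list(iris$Species), stat.desc, norm = TRUE)

# Stack the per-species vectors into a 3 x 20 matrix (one row per species).
sl_mat <- do.call(rbind, by_sl)

# Round for display and keep a few statistics of interest.
round(sl_mat[, c("mean", "std.dev", "normtest.p")], 3)
```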
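Passing the grouping factor as a default argument of the anonymous function gives the same output as above:
R> apply (iris[,1:4],2,function (x,y=iris$Species) by (x,list (y),pastecs::stat.desc,norm=TRUE))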
Question
How can I reproduce these results using the dplyr package?
My failed attempt was:
iris %>%
group_by (Species) %>%
summarise_each(funs(pastecs::stat.desc,norm=TRUE))
Best answer
Here is one approach using dplyr:
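library(pastecs)
library(dplyr)
res <- iris %>%
  group_by(Species) %>%
  # do() runs the expression once per group; "." is that group's data frame
  do(data.frame(lapply(.[setdiff(names(.), 'Species')],
                       stat.desc, norm = TRUE))) %>%
  # label each row with the statistic it holds
  mutate(measure = names(stat.desc(Sepal.Length, norm = TRUE)))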
Edit: added the names corresponding to the stat.desc output (based on @Jaap's suggestion).
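Some background on why do() is used here: summarise-style verbs expect one value per group and column, whereas stat.desc with norm = TRUE returns 20 values. As a hedged alternative, not part of the original answer, newer dplyr versions (1.1.0 and later) provide reframe(), which allows any number of rows per group. The sketch below assumes such a version is installed; the name res2 is illustrative.

```r
library(pastecs)
library(dplyr)

res2 <- iris %>%
  group_by(Species) %>%
  reframe(
    # Statistic names, taken from one stat.desc() call per group.
    measure = names(stat.desc(Sepal.Length, norm = TRUE)),
    # The 20 statistics for every numeric column; names are dropped
    # because they are already stored in `measure`.
    across(where(is.numeric), ~ unname(stat.desc(.x, norm = TRUE)))
  )

# One row per species and statistic, e.g. all group means:
filter(res2, measure == "mean")
```

reframe() returns an ungrouped data frame, so no ungroup() call is needed afterwards.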
Regarding "r - How to reproduce this "apply" example using the dplyr package in R?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35817219/