library(dplyr)
library(ggplot2)
library(magrittr)
diamonds %>%
group_by(cut) %>%
summarise(price_avg = t.test(
. %>% filter(color == "E") %$% price,
. %>% filter(color == "I") %$% price )$p.value)
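I am trying to apply t.test by group. In this example, I want to find out whether price differs significantly between colors E and I within the same cut. The result I get is: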
Error in summarise_impl(.data, dots) :
Evaluation error: is.atomic(x) is not TRUE.
Best answer
library(tidyverse)
library(magrittr)
diamonds %>%
group_by(cut) %>%
summarise(price_avg = t.test(price[color=="E"], price[color=="I"])$p.value)
# # A tibble: 5 x 2
# cut price_avg
# <ord> <dbl>
# 1 Fair 3.90e- 3
# 2 Good 1.46e-12
# 3 Very Good 2.44e-39
# 4 Premium 7.27e-52
# 5 Ideal 7.63e-62
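The problem with your solution is that the dot placeholder (.) does not refer to the subset of your data set for the current group, but to the whole data set. You can check this with:
diamonds %>%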
group_by(cut) %>%
summarise(d = list(.))
# # A tibble: 5 x 2
# cut d
# <ord> <list>
# 1 Fair <tibble [53,940 x 10]>
# 2 Good <tibble [53,940 x 10]>
# 3 Very Good <tibble [53,940 x 10]>
# 4 Premium <tibble [53,940 x 10]>
# 5 Ideal <tibble [53,940 x 10]>
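The same check can be made on a few rows. This is a minimal sketch that is not part of the original answer; the data frame toy and its columns g and x are made up for illustration. It compares n(), which summarise evaluates once per group, with nrow(.), which looks at whatever the pipe passed in.

```r
library(dplyr)

# Hypothetical toy data: two groups with 2 and 3 rows.
toy <- tibble(
  g = c("a", "a", "b", "b", "b"),
  x = 1:5
)

res <- toy %>%
  group_by(g) %>%
  summarise(
    rows_in_group = n(),     # evaluated separately for each group
    rows_in_dot   = nrow(.)  # the dot is the whole piped-in data frame
  )

print(res)
```

In this sketch rows_in_dot is the full row count of toy for every group, while rows_in_group varies by group. That is why bare column names such as price[color == "E"] see the group's rows and the dot does not.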
Another solution would be:
diamonds %>%
nest(-cut) %>%
mutate(price_avg = map_dbl(data, ~t.test(
.x %>% filter(color == "E") %$% price,
.x %>% filter(color == "I") %$% price )$p.value))
# # A tibble: 5 x 3
# cut data price_avg
# <ord> <list> <dbl>
# 1 Ideal <tibble [21,551 x 9]> 7.63e-62
# 2 Premium <tibble [13,791 x 9]> 7.27e-52
# 3 Good <tibble [4,906 x 9]> 1.46e-12
# 4 Very Good <tibble [12,082 x 9]> 2.44e-39
# 5 Fair <tibble [1,610 x 9]> 3.90e- 3
This works with filter because each call to filter receives the appropriate subset of the data (i.e. one element of the data column).
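The idea of handing each group's own rows to the test can also be sketched in base R with split and sapply. This variant is not from the original answer and is offered only as an illustration; the name p_by_cut is made up here, and ggplot2 is loaded only for the diamonds data set.

```r
library(ggplot2)  # used only for the diamonds data set

# Split diamonds into one data frame per cut, then run the test on each piece.
p_by_cut <- sapply(
  split(diamonds, diamonds$cut),
  function(d) {
    # d holds only the rows of one cut, so the subsetting below is per group
    t.test(d$price[d$color == "E"], d$price[d$color == "I"])$p.value
  }
)

print(p_by_cut)
```

The result is a named numeric vector with one p-value per cut, in the order of the factor levels of cut.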
Regarding "r - using t.test in dplyr summarise after grouping", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52588675/