r - Using t.test inside a dplyr summarise after grouping

library(dplyr)
library(ggplot2)
library(magrittr)

diamonds %>% 
  group_by(cut) %>% 
  summarise(price_avg = t.test(
    . %>% filter(color == "E") %$% price,
    . %>% filter(color == "I") %$% price )$p.value)

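I am trying to apply t.test by group. In this example, I want to find out whether, within the same cut, price differs significantly between colors E and I. The result I get is: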

Error in summarise_impl(.data, dots) : 
Evaluation error: is.atomic(x) is not TRUE.

Best answer

library(tidyverse)
library(magrittr)

diamonds %>% 
  group_by(cut) %>% 
  summarise(price_avg = t.test(price[color=="E"], price[color=="I"])$p.value)

# # A tibble: 5 x 2
#   cut       price_avg
#   <ord>         <dbl>
# 1 Fair       3.90e- 3
# 2 Good       1.46e-12
# 3 Very Good  2.44e-39
# 4 Premium    7.27e-52
# 5 Ideal      7.63e-62
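
As a sanity check, the grouped result can be compared against the same test run by hand on a single cut. This is only a minimal sketch: the names by_group, fair and manual_fair are made up for illustration, and the result column is called p_value here rather than price_avg.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Grouped p-values, computed the same way as in the answer
by_group <- diamonds %>%
  group_by(cut) %>%
  summarise(p_value = t.test(price[color == "E"], price[color == "I"])$p.value)

# The same test done by hand for one cut only
fair <- diamonds %>% filter(cut == "Fair")
manual_fair <- t.test(fair$price[fair$color == "E"],
                      fair$price[fair$color == "I"])$p.value

# TRUE when the grouped value for "Fair" matches the manual one
all.equal(by_group$p_value[by_group$cut == "Fair"], manual_fair)
```

Inside summarise, bare column names such as price and color refer to the rows of the current group, which is why plain logical indexing is enough here.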

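The problem with your solution is that the dot (.) does not receive the subset of your dataset for the current group; it receives the whole dataset. You can check this with: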

diamonds %>% 
  group_by(cut) %>% 
  summarise(d = list(.))

# # A tibble: 5 x 2
#     cut       d                     
#     <ord>     <list>                
#   1 Fair      <tibble [53,940 x 10]>
#   2 Good      <tibble [53,940 x 10]>
#   3 Very Good <tibble [53,940 x 10]>
#   4 Premium   <tibble [53,940 x 10]>
#   5 Ideal     <tibble [53,940 x 10]>
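
The same point can be made by counting rows. The sketch below contrasts nrow(.) with n() inside summarise, and also shows that a magrittr pipeline beginning with the dot defines a function instead of returning data, which may be why t.test complained about its input in the question. The names sizes and e_prices are hypothetical.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

sizes <- diamonds %>%
  group_by(cut) %>%
  summarise(rows_in_dot = nrow(.),  # the dot: the full data frame piped in
            rows_in_group = n())    # n(): rows of the current group only

sizes

# A pipeline that starts with the dot is not evaluated on the spot;
# magrittr turns it into a function (a "functional sequence")
e_prices <- . %>% filter(color == "E") %>% pull(price)
is.function(e_prices)
```

rows_in_dot is the same in every row, while rows_in_group varies by cut and adds up to the size of the full dataset.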

An alternative solution is:

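# nest(-cut) is the older tidyr spelling; tidyr >= 1.0.0 warns and expects nest(data = -cut)
diamonds %>% 
  nest(-cut) %>%
  # .x is the nested tibble for one cut, so filter() sees only that group's rows
  mutate(price_avg = map_dbl(data, ~t.test(
                                      .x %>% filter(color == "E") %$% price,
                                      .x %>% filter(color == "I") %$% price )$p.value))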

# # A tibble: 5 x 3
#   cut       data                  price_avg
#   <ord>     <list>                    <dbl>
# 1 Ideal     <tibble [21,551 x 9]>  7.63e-62
# 2 Premium   <tibble [13,791 x 9]>  7.27e-52
# 3 Good      <tibble [4,906 x 9]>   1.46e-12
# 4 Very Good <tibble [12,082 x 9]>  2.44e-39
# 5 Fair      <tibble [1,610 x 9]>   3.90e- 3

This works with filter because each call to filter now receives the appropriate subset of your data (i.e. one element of the data column).
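
The idea of handing each group's subset to the test explicitly can also be sketched in base R with split and sapply. This is an illustrative variant rather than part of the original answer, and p_values is a hypothetical name.

```r
library(ggplot2)  # only needed for the diamonds dataset

# Split the data into one data frame per cut, then run the test on each piece
p_values <- sapply(split(diamonds, diamonds$cut), function(d) {
  t.test(d$price[d$color == "E"], d$price[d$color == "I"])$p.value
})

# Named numeric vector, one p-value per cut
p_values
```

Here d plays the role that .x plays in the map_dbl version: it is already restricted to one cut before any filtering by color happens.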

Regarding r - Using t.test inside a dplyr summarise after grouping, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52588675/
