r - Performing multiple paired t-tests on multiple variables at once using dplyr/tidyverse

Tags: r dplyr

Assume a data structure like this:

   ID testA_wave1 testA_wave2 testA_wave3 testB_wave1 testB_wave2 testB_wave3
1   1           3           2           3           6           5           3
2   2           4           4           4           3           6           6
3   3          10           2           1           4           4           4
4   4           5           3          12           2           7           4
5   5           5           3           9           2           4           2
6   6          10           0           2           6           6           5
7   7           6           8           4           6           8           3
8   8           1           5           4           5           6           0
9   9           3           2           7           8           4           4
10 10           4           9           5          11           8           8

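What I want to achieve is to compute paired t-tests separately for each test (here that means testA and testB, but in real life I have many more tests). Specifically, I want to compare the first wave of a given test with every subsequent wave of the same test (for testA, that means testA_wave1 vs. testA_wave2 and testA_wave1 vs. testA_wave3).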
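To make the goal concrete, here is a minimal sketch of one such comparison (testA_wave1 vs. testA_wave2), using only the ten rows printed above. The names `w1`, `w2`, `p_paired` and `p_diff` are just for illustration. It also checks that a paired t-test gives the same p-value as a one-sample t-test on the per-ID differences, which is why the rows must stay matched by ID.

```r
# First ten values of testA_wave1 and testA_wave2, copied from the table above
w1 <- c(3, 4, 10, 5, 5, 10, 6, 1, 3, 4)
w2 <- c(2, 4, 2, 3, 3, 0, 8, 5, 2, 9)

# Paired test: element i of w1 is matched with element i of w2 (same ID)
p_paired <- t.test(w1, w2, paired = TRUE)$p.value

# Equivalent formulation: one-sample test on the within-ID differences
p_diff <- t.test(w1 - w2, mu = 0)$p.value

# Compare the two p-values
all.equal(p_paired, p_diff)
```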

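I was able to achieve it this way:
library(tidyverse)

df %>%
 # Wide to long: one row per ID and original column
 gather(variable, value, -ID) %>%
 # Split the column name into a wave label and a test label
 mutate(wave_ID = paste0("wave", parse_number(variable)),
        variable = ifelse(grepl("testA", variable), "testA",
                     ifelse(grepl("testB", variable), "testB", NA_character_))) %>%
 group_by(wave_ID, variable) %>% 
 # Collect the values of each test/wave combination into a list-column
 summarise(value = list(value)) %>% 
 # One list-column per wave, one row per test
 spread(wave_ID, value) %>% 
 group_by(variable) %>% 
 # Compare wave 1 with each later wave
 mutate(p_value_w1w2 = t.test(unlist(wave1), unlist(wave2), paired = TRUE)$p.value,
        p_value_w1w3 = t.test(unlist(wave1), unlist(wave3), paired = TRUE)$p.value) %>%
 select(variable, matches("(p_value)"))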

  variable p_value_w1w2 p_value_w1w3
  <chr>           <dbl>        <dbl>
1 testA           0.664        0.921
2 testB           0.146        0.418

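However, I would like to see different or more elegant solutions that give similar results. I am mainly looking for dplyr/tidyverse solutions, but I am not opposed to a completely different approach.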

Sample data:
set.seed(123)
df <- data.frame(ID = 1:20,
testA_wave1 = round(rnorm(20, 5, 3), 0),
testA_wave2 = round(rnorm(20, 5, 3), 0),
testA_wave3 = round(rnorm(20, 5, 3), 0),
testB_wave1 = round(rnorm(20, 5, 3), 0),
testB_wave2 = round(rnorm(20, 5, 3), 0),
testB_wave3 = round(rnorm(20, 5, 3), 0))

Best answer

Here is one approach that makes heavy use of purrr.

library("tidyverse")

set.seed(123)
df <- tibble(
  ID = 1:20,
  testA_wave1 = round(rnorm(20, 5, 3), 0),
  testA_wave2 = round(rnorm(20, 5, 3), 0),
  testA_wave3 = round(rnorm(20, 5, 3), 0),
  testB_wave1 = round(rnorm(20, 5, 3), 0),
  testB_wave2 = round(rnorm(20, 5, 3), 0),
  testB_wave3 = round(rnorm(20, 5, 3), 0)
)

pvalues <- df %>%
  # From wide tibble to long tibble
  gather(test, value, -ID) %>%
  separate(test, c("test", "wave")) %>%
  # Turn "wave1" into 1 so the waves sort numerically rather than
  # alphabetically; the numeric filter below (wave.x == 1) relies on this
  mutate(wave = parse_number(wave)) %>%
  inner_join(., ., by = c("ID", "test")) %>%
  # If there are two waves w1 and w2,
  # we end up with pairs (w1, w1), (w1, w2), (w2, w1) and (w2, w2),
  # so filter out to keep the pairing (w1, w2) only
  filter(wave.x == 1, wave.x < wave.y) %>%
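  # Note: tidyr >= 1.0.0 prefers nest(data = c(ID, value.x, value.y));
  # the unnamed form below still runs there but emits a warning
  nest(ID, value.x, value.y) %>%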
  mutate(pvalue = data %>%
           # Perform the test
           map(~t.test(.$value.x, .$value.y, paired = TRUE)) %>%
           map(broom::tidy) %>%
           # Also not strictly necessary; you might want to keep all
           # information about the test: estimate, statistic, etc.
           map_dbl(pluck, "p.value"))
pvalues
#> # A tibble: 4 x 5
#>   test  wave.x wave.y data              pvalue
#>   <chr>  <dbl>  <dbl> <list>             <dbl>
#> 1 testA      1      2 <tibble [20 x 3]>  0.664
#> 2 testA      1      3 <tibble [20 x 3]>  0.921
#> 3 testB      1      2 <tibble [20 x 3]>  0.146
#> 4 testB      1      3 <tibble [20 x 3]>  0.418
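As the comments above mention, you might want to keep the whole test result rather than only the p-value. The following is a rough sketch of that idea, not part of the original answer: `compare_to_wave1()`, `grid` and `full_results` are hypothetical names, and it assumes the broom package is installed and the column naming scheme `<test>_wave<k>` from the sample data.

```r
library(broom)

# Same sample data as in the question
set.seed(123)
df <- data.frame(ID = 1:20,
                 testA_wave1 = round(rnorm(20, 5, 3), 0),
                 testA_wave2 = round(rnorm(20, 5, 3), 0),
                 testA_wave3 = round(rnorm(20, 5, 3), 0),
                 testB_wave1 = round(rnorm(20, 5, 3), 0),
                 testB_wave2 = round(rnorm(20, 5, 3), 0),
                 testB_wave3 = round(rnorm(20, 5, 3), 0))

# Hypothetical helper: compare wave 1 of a test with one later wave and
# return the full tidy() row (estimate, statistic, p.value, ...)
compare_to_wave1 <- function(test, wave) {
  x <- df[[paste0(test, "_wave1")]]
  y <- df[[paste0(test, "_wave", wave)]]
  data.frame(test = test, wave = wave, tidy(t.test(x, y, paired = TRUE)))
}

# All test/wave combinations to compare against wave 1
grid <- expand.grid(wave = 2:3, test = c("testA", "testB"),
                    stringsAsFactors = FALSE)

# One row per comparison, stacked into a single data frame
full_results <- do.call(rbind, Map(compare_to_wave1, grid$test, grid$wave))
rownames(full_results) <- NULL

full_results[, c("test", "wave", "estimate", "statistic",
                 "p.value", "conf.low", "conf.high")]
```

This trades the nested-tibble pipeline for a plain helper function, which may be easier to debug when there are many tests.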

pvalues %>%
  # Drop the data in order to pivot the table
  select(- data) %>%
  unite("waves", wave.x, wave.y, sep = ":") %>%
  spread(waves, pvalue)
#> # A tibble: 2 x 3
#>   test  `1:2` `1:3`
#>   <chr> <dbl> <dbl>
#> 1 testA 0.664 0.921
#> 2 testB 0.146 0.418

Created on 2019-03-08 by the reprex package (v0.2.1)
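For newer package versions, a shorter variant might look like the sketch below. It is not part of the original answer: it assumes tidyr >= 1.0.0 (for `pivot_longer()` / `pivot_wider()`) and dplyr >= 1.0.0 (for `across()`), and the result name `pvalues_wide` and the `p_wave1_vs_` prefix are made up for illustration. Because it uses the same seed and data, it should give the same four p-values as above.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
set.seed(123)
df <- data.frame(ID = 1:20,
                 testA_wave1 = round(rnorm(20, 5, 3), 0),
                 testA_wave2 = round(rnorm(20, 5, 3), 0),
                 testA_wave3 = round(rnorm(20, 5, 3), 0),
                 testB_wave1 = round(rnorm(20, 5, 3), 0),
                 testB_wave2 = round(rnorm(20, 5, 3), 0),
                 testB_wave3 = round(rnorm(20, 5, 3), 0))

pvalues_wide <- df %>%
  # Split "testA_wave1" into test = "testA" and wave = "wave1"
  pivot_longer(-ID, names_to = c("test", "wave"), names_sep = "_") %>%
  # One column per wave, one row per ID and test, so values stay paired by row
  pivot_wider(names_from = wave, values_from = value) %>%
  group_by(test) %>%
  # Compare wave1 with every other wave column within each test
  summarise(across(-c(ID, wave1),
                   ~ t.test(wave1, .x, paired = TRUE)$p.value,
                   .names = "p_wave1_vs_{.col}"))

pvalues_wide
```

Since the wave columns are selected by exclusion, additional tests or waves in the input should be picked up without changing the code, as long as the columns follow the `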

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/55068779/
