r - Purrr(或扫帚)用于计算分组数据集的比例测试(多比例测试)

标签 r loops dplyr tidyverse purrr

假设我有一个由“年份”和“认知障碍”组成的数据框(1=是,0=否则)

dataset

我想比较每年的比例。因此,2000 年将是:

 df %>% 
  filter(year == 2000) %>% 
  {prop.test(rev(table(.$cogimp)),p = 0.5, conf.level=0.95)}

我可以通过以下方式检查:
prop.test(x = 3, n = 30, p = 0.5, conf.level=0.95)

但是,在我看来,我可以通过使用 broom 或 purrr 来简化这些分析。
我的目标是拥有一张这样的 table :

final table

代码如下:
df <- structure(list(year = c(2000, 2000, 2015, 2015, 2000, 2015, 2000, 
                              2000, 2000, 2000, 2015, 2006, 2015, 2015, 2010, 2006, 2006, 2010, 
                              2000, 2006, 2015, 2006, 2015, 2015, 2000, 2015, 2000, 2015, 2015, 
                              2010, 2015, 2015, 2015, 2000, 2006, 2006, 2006, 2015, 2015, 2006, 
                              2015, 2010, 2000, 2000, 2010, 2006, 2010, 2010, 2015, 2000, 2015, 
                              2006, 2000, 2006, 2015, 2006, 2000, 2010, 2010, 2010, 2015, 2006, 
                              2015, 2000, 2015, 2010, 2010, 2010, 2010, 2000, 2000, 2000, 2006, 
                              2015, 2015, 2000, 2000, 2000, 2015, 2006, 2006, 2010, 2006, 2000, 
                              2010, 2000, 2015, 2015, 2015, 2015, 2010, 2000, 2000, 2010, 2006, 
                              2010, 2010, 2000, 2000, 2000), cogimp = c(0, 0, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
                                                                        1, 1, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -100L), class = c("tbl_df", 
                                                                                                                                         "tbl", "data.frame"))

df %>% 
  count(year, cogimp)

df %>% 
  filter(year == 2006) %>% 
  {prop.test(rev(table(.$cogimp)),p = 0.5, conf.level=0.95)}

prop.test(x = 3, n = 30, p = 0.5, conf.level=0.95)
prop.test(x = 2, n = 19, p = 0.5, conf.level=0.95)

最佳答案

使用 tidy从扫帚包。改编自 https://stackoverflow.com/a/30015869/13157536

library(dplyr)
library(broom)

df <- structure(list(year = c(2000, 2000, 2015, 2015, 2000, 2015, 2000, 
                              2000, 2000, 2000, 2015, 2006, 2015, 2015, 2010, 2006, 2006, 2010, 
                              2000, 2006, 2015, 2006, 2015, 2015, 2000, 2015, 2000, 2015, 2015, 
                              2010, 2015, 2015, 2015, 2000, 2006, 2006, 2006, 2015, 2015, 2006, 
                              2015, 2010, 2000, 2000, 2010, 2006, 2010, 2010, 2015, 2000, 2015, 
                              2006, 2000, 2006, 2015, 2006, 2000, 2010, 2010, 2010, 2015, 2006, 
                              2015, 2000, 2015, 2010, 2010, 2010, 2010, 2000, 2000, 2000, 2006, 
                              2015, 2015, 2000, 2000, 2000, 2015, 2006, 2006, 2010, 2006, 2000, 
                              2010, 2000, 2015, 2015, 2015, 2015, 2010, 2000, 2000, 2010, 2006, 
                              2010, 2010, 2000, 2000, 2000), cogimp = c(0, 0, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 
                                                                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 
                                                                        1, 1, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -100L), class = c("tbl_df", 
                                                                                                                                         "tbl", "data.frame"))

df_test <- df %>% 
  group_by(year) %>%
  summarize(cogimp = sum(cogimp), n = n()) %>%
  group_by(year, cogimp, n) %>%
  do(fitYear = prop.test(.$cogimp, .$n, p = 0.5, conf.level = 0.95))

tidy(df_test, fitYear) %>%
  select(year, cogimp, n, p.value)
#> # A tibble: 4 x 4
#> # Groups:   year, cogimp, n [4]
#>    year cogimp     n   p.value
#>   <dbl>  <dbl> <int>     <dbl>
#> 1  2000      3    30 0.0000268
#> 2  2006      2    19 0.00132  
#> 3  2010      8    20 0.502    
#> 4  2015      3    31 0.0000163

创建于 2020-04-06 由 reprex package (v0.3.0)

关于r - Purrr(或扫帚)用于计算分组数据集的比例测试(多比例测试),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61063396/

相关文章:

r - 如何在R环境中将变量的值用作键?

c# - 如何使用许多其他 if 条件循环 if 语句

重复将条件摘要应用于数据框中的组

r - R 中宽限月的 Emi 计算

r - 使用 ggplot2 绘制误差线时抑制一些交叉线

r - 每组两个日期之间的平均时间差

loops - Install4j 的默认后退按钮行为与循环不兼容。解决方法是什么?

r - 在 data.table 中执行行比较以形成有序组

r - 带有 stat_binhex 的 map 上的六边形热图给出错误 : geom_hex() only works with Cartesian coordinates

javascript - 如何构建带有循环的字符串?