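I have health data from a cohort study with repeated measures, in which each individual has multiple follow-up visits every year. At baseline (visit 0), some people have already been diagnosed with the disease of interest, while others have not. When I look at incident cases in my analysis, I need to remove from my data the people who were diagnosed as "sick" at visit 0. How can I do this in the tidyverse? Below is an example of the kind of data structure I will be working with: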
subject_id <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5)
visit <- c(0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3)
diagnosis <- c("not sick", "not sick", "not sick", "sick", "sick", "sick", "sick", "sick", "not sick", "not sick", "sick", "sick", "sick", "sick", "sick", "sick", "not sick", "not sick", "not sick", "sick")
cohort <- data.frame(subject_id, visit, diagnosis)
cohort
Best answer
Edit: If you want to remove those subjects completely (all of their rows, not just the baseline row), then:
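library(dplyr)

cohort %>%
  group_by(subject_id) %>%
  # Flag rows that are a baseline visit (visit 0) with a "sick" diagnosis
  mutate(Condn = ifelse(visit == 0 & diagnosis == "sick", 1, 0)) %>%
  # Keep a subject only if none of its rows are flagged
  filter(all(Condn == 0))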
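The same whole-subject exclusion can also be written without the helper column. This is only a minimal sketch, assuming the `cohort` data frame from the question (rebuilt here so the block runs on its own) and no missing values in `visit` or `diagnosis`; the name `incident_cohort` is illustrative and not part of the original answer.

```r
library(dplyr)

# Rebuild the example data from the question
cohort <- data.frame(
  subject_id = rep(1:5, each = 4),
  visit      = rep(0:3, times = 5),
  diagnosis  = c("not sick", "not sick", "not sick", "sick",
                 "sick", "sick", "sick", "sick",
                 "not sick", "not sick", "sick", "sick",
                 "sick", "sick", "sick", "sick",
                 "not sick", "not sick", "not sick", "sick")
)

# Drop every subject that has a "sick" diagnosis at visit 0
incident_cohort <- cohort %>%
  group_by(subject_id) %>%
  filter(!any(visit == 0 & diagnosis == "sick")) %>%
  ungroup()

# Subjects 2 and 4 were sick at baseline, so only 1, 3 and 5 remain
unique(incident_cohort$subject_id)
```

Because `any()` is evaluated once per group, the condition is either true or false for a whole subject, so all of that subject's rows are kept or dropped together.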
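Original answer
We can do the following (note that this removes only the visit-0 rows diagnosed as "sick"; the later visits of those subjects stay in the data):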
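cohort %>%
  group_by(subject_id) %>%
  # Flag rows that are a baseline visit (visit 0) with a "sick" diagnosis
  mutate(Condn = ifelse(visit == 0 & diagnosis == "sick", 1, 0)) %>%
  # Drop only the flagged rows
  filter(Condn == 0) %>%
  ungroup() %>%
  # Remove the helper column
  select(-Condn)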
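To see what a row-level filter of this kind leaves behind, one can summarise the result per subject. The following is a rough sketch rather than part of the original answer: it assumes the `cohort` data frame from the question (rebuilt here so the block is self-contained), applies the same condition directly in `filter()` without the helper column, and uses the illustrative name `rows_dropped`.

```r
library(dplyr)

# Rebuild the example data from the question
cohort <- data.frame(
  subject_id = rep(1:5, each = 4),
  visit      = rep(0:3, times = 5),
  diagnosis  = c("not sick", "not sick", "not sick", "sick",
                 "sick", "sick", "sick", "sick",
                 "not sick", "not sick", "sick", "sick",
                 "sick", "sick", "sick", "sick",
                 "not sick", "not sick", "not sick", "sick")
)

# Row-level filter: removes only the visit-0 rows diagnosed "sick"
rows_dropped <- cohort %>%
  filter(!(visit == 0 & diagnosis == "sick"))

# First remaining visit and number of remaining visits per subject
per_subject <- rows_dropped %>%
  group_by(subject_id) %>%
  summarise(first_visit = min(visit), n_visits = n())

per_subject
```

In this example subjects 2 and 4 are still present, with their first remaining visit being visit 1, which is why a group-level condition is needed when prevalent cases must be excluded entirely.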
Source: the Stack Overflow question on removing individuals from cohort study data based on baseline characteristics: https://stackoverflow.com/questions/56795215/