r - Filter and extract rows based on multiple conditions

I have a large time-series dataset of patients with different diagnoses. A snapshot of the dataset looks like this:

library(dplyr)  # provides tibble() and %>%

time <- rep(1:3, times = 5)
ID <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
Dx1 <- c("CBS", "CBS", "CBS", "OtherDx", "OtherDx", "OtherDx", "ACC", "ACC", "ACC", "OtherDx", "OtherDx", "CBS", "OtherDx", "OtherDx", "OtherDx")
Dx2 <- c("OtherDx", "OtherDx", "OtherDx", "OtherDx", "OtherDx", "OtherDx", "CBS", "CBS", "CBS", "OtherDx", "OtherDx", "OtherDx", "OtherDx", "OtherDx", "OtherDx")
# ID comes first so the column order matches the printout below
df <- tibble(ID, time, Dx1, Dx2)

 # A tibble: 15 x 4
      ID  time Dx1     Dx2    
   <dbl> <int> <chr>   <chr>  
 1     1     1 CBS     OtherDx
 2     1     2 CBS     OtherDx
 3     1     3 CBS     OtherDx
 4     2     1 OtherDx OtherDx
 5     2     2 OtherDx OtherDx
 6     2     3 OtherDx OtherDx
 7     3     1 ACC     CBS    
 8     3     2 ACC     CBS    
 9     3     3 ACC     CBS    
10     4     1 OtherDx OtherDx
11     4     2 OtherDx OtherDx
12     4     3 CBS     OtherDx
13     5     1 OtherDx OtherDx
14     5     2 OtherDx OtherDx
15     5     3 OtherDx OtherDx

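Here, I want to filter and keep only those IDs that have "OtherDx" in both Dx1 and Dx2 at all three time points. In this snapshot, that means keeping only IDs 2 and 5 (not ID 4, because it has a non-"OtherDx" value at time 3).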

My dplyr code is:

df2 <- df %>%
  group_by(ID, time) %>%
  filter(
    time== c(1:3) & Dx1== "OtherDx" & Dx2== "OtherDx"
  )

But my code does not seem to do the job, and it still includes ID 4. What is the best way to extract this data?

Best answer

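You can use if_all(). The condition if_all(Dx1:Dx2, `==`, "OtherDx") is equivalent to Dx1 == "OtherDx" & Dx2 == "OtherDx", and it becomes the more concise option as the number of Dx columns to check grows. Grouping by ID alone and wrapping the condition in all() keeps an ID only if every one of its rows satisfies the condition.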

library(dplyr)

df %>%
  group_by(ID) %>% 
  filter(all(if_all(Dx1:Dx2, `==`, "OtherDx"))) %>%
  ungroup()

# A tibble: 6 × 4
     ID  time Dx1     Dx2
  <dbl> <int> <chr>   <chr>
1     2     1 OtherDx OtherDx
2     2     2 OtherDx OtherDx
3     2     3 OtherDx OtherDx
4     5     1 OtherDx OtherDx
5     5     2 OtherDx OtherDx
6     5     3 OtherDx OtherDx
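
To make the equivalence stated above concrete, here is a minimal sketch that runs the filter twice: once with the per-column condition written out by hand, and once with if_all() using a lambda (`~ .x == "OtherDx"`). The data is rebuilt from the question so the block is self-contained. The names `explicit` and `with_if_all` are illustrative and not part of the original answer, and the sketch assumes a dplyr version that provides if_all() (1.0.4 or later).

```r
library(dplyr)

# Same data as in the question, rebuilt so the example stands alone
df <- tibble(
  ID   = rep(1:5, each = 3),
  time = rep(1:3, times = 5),
  Dx1  = c("CBS", "CBS", "CBS", "OtherDx", "OtherDx", "OtherDx",
           "ACC", "ACC", "ACC", "OtherDx", "OtherDx", "CBS",
           "OtherDx", "OtherDx", "OtherDx"),
  Dx2  = c(rep("OtherDx", 6), rep("CBS", 3), rep("OtherDx", 6))
)

# Explicit form: each column's condition is written out by hand
explicit <- df %>%
  group_by(ID) %>%
  filter(all(Dx1 == "OtherDx" & Dx2 == "OtherDx")) %>%
  ungroup()

# if_all() form with a lambda; scales to more Dx columns via Dx1:Dx2
with_if_all <- df %>%
  group_by(ID) %>%
  filter(all(if_all(Dx1:Dx2, ~ .x == "OtherDx"))) %>%
  ungroup()

# Compare the two results and list the IDs that were kept
all.equal(explicit, with_if_all)
unique(explicit$ID)
```

In both versions, all() is evaluated once per ID group, so a single non-"OtherDx" row (such as ID 4 at time 3) drops every row for that ID.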

This question and answer are based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/73400912/
