I have a data frame that looks like this, except that the real one is much larger.
df = data.frame(gene=c("A","B","F","A","D","E","B","C","D","G"),
group=c("group1","group1","group1","group2","group2","group2","group3","group3","group3","group3"))
df
gene group
A group1
B group1
F group1
A group2
D group2
E group2
B group3
C group3
D group3
G group3
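Based on the gene column, I want to find the genes that are unique to the groups containing gene "A", compared with the groups that do not contain gene A.
After filtering, I want my data to look like this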
gene group
F group1
E group2
because F and E each occur only in a group that contains gene A and in none of the groups that lack it.
Best answer
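We can build two subsets, the rows from groups whose 'gene' column contains 'A' and the rows from groups that do not contain 'A', and then do an anti_join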
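library(dplyr)
# rows from every group that contains gene 'A'
tmp1 <- df %>%
  filter(group %in% group[gene %in% 'A'])
# rows from every group that does not contain gene 'A'
tmp2 <- df %>%
  group_by(group) %>%
  filter(!'A' %in% gene) %>%
  ungroup()
# keep the genes of tmp1 that never occur in tmp2, then drop 'A' itself
anti_join(tmp1, tmp2, by = 'gene') %>%
  filter(gene != 'A')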
-output
gene group
1 F group1
2 E group2
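For comparison, the same result can probably be reached without the two intermediate tables. The following is only a sketch, not part of the original answer: the names a_groups and result are made up for illustration, and it assumes, like the answer above, that a gene counts as "unique" when it never occurs in a group that lacks 'A'.

```r
library(dplyr)

df <- data.frame(
  gene  = c("A", "B", "F", "A", "D", "E", "B", "C", "D", "G"),
  group = c("group1", "group1", "group1", "group2", "group2", "group2",
            "group3", "group3", "group3", "group3")
)

# a_groups (hypothetical name): the groups that contain gene 'A'
a_groups <- unique(df$group[df$gene == "A"])

# result (hypothetical name): rows from the 'A' groups whose gene never
# occurs in a group without 'A', with 'A' itself dropped
result <- df %>%
  filter(group %in% a_groups, gene != "A") %>%
  filter(!gene %in% df$gene[!df$group %in% a_groups])

result
```

On the example data this should leave the same two rows (F in group1, E in group2). Note that a gene shared by two groups that both contain 'A' would be kept by this version as well as by the anti_join version.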
Regarding r - finding differences between groups based on a condition in dplyr, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69428714/