Remove duplicate rows based on a matching condition in another column

Tags: r dplyr

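In the data below, I am trying to remove rows with duplicated values in the mid column. Where mid is duplicated, I want to keep the row whose kpi is B. This should be applied across the groups in county.

I am only showing the duplicated rows here, but my full data (not just the dput below) also contains rows that are not duplicated.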

# A tibble: 34 x 3
   county mid        kpi  
   <chr>  <chr>      <chr>
 1 Athens 1          A    
 2 Athens 1          B    
 3 Athens 2.13       A    
 4 Athens 2.13       B    
 5 Athens 2.3        A    
 6 Athens 2.3        B    
 7 Athens 2.4        A    
 8 Athens 2.4        B    
 9 Athens 3.3        A    
10 Athens 3.3        B    

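From the table above, I want to keep all the B rows among the duplicates. I can't simply use filter(kpi == "B"), because my data also contains A and B values that are not duplicated, and I want to keep those.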
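To illustrate why a plain B filter is not enough, here is a minimal sketch on a small hypothetical tibble (toy data, not the asker's): mid "1" is duplicated, while "2.11" (A only) and "2.6.7" (B only) are not.

```r
library(dplyr)

# Hypothetical toy data: mid "1" appears twice (A and B),
# "2.11" and "2.6.7" appear only once each.
toy <- tibble(
  county = "Athens",
  mid    = c("1", "1", "2.11", "2.6.7"),
  kpi    = c("A", "B", "A", "B")
)

# Naive approach: keeps only B rows, so the non-duplicated A row "2.11" is lost.
naive <- toy %>% filter(kpi == "B")
naive
```

The rule the question actually needs is "keep a row if its mid is not duplicated, or if its kpi is B", which a single kpi filter cannot express.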

df <- structure(list(county = c("Athens", "Athens", "Athens", "Athens", 
"Athens", "Athens", "Athens", "Athens", "Athens", "Athens", "Athens", 
"Athens", "Athens", "Athens", "Athens", "Athens", "Athens", "Athens", 
"Athens", "Athens", "Athens", "Athens", "Athens", "Athens", "Athens", 
"Athens", "Athens", "Athens", "Athens", "Athens", "Athens", "Athens", 
"Athens", "Athens"), mid = c("1", "1", "2.13", "2.13", 
"2.3", "2.3", "2.4", "2.4", "3.3", "3.3", "2.12.1", "2.12.1", 
"2.14.3", "2.14.3", "2.3.1", "2.3.1", "2.3.2", "2.3.2", "2.5.1", 
"2.5.1", "2.5.4", "2.5.4", "2.5.5", "2.5.5", "2.6.4", "2.6.4", 
"2.7.4", "2.7.4", "2.8.1", "2.8.1", "2.8.2", "2.8.2", "2.9.1", 
"2.9.1"), kpi = c("A", "B", "A", "B", "A", "B", "A", "B", "A", 
"B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", 
"A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B")), row.names = c(NA, 
-34L), class = c("tbl_df", "tbl", "data.frame"))

Best answer

We can use anti_join after identifying the duplicates!

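library(dplyr)

# Second occurrence of each duplicated mid, with kpi forced to "A":
# these rows identify the A rows that should be dropped
df1 <- df %>% 
  filter(duplicated(mid)) %>% 
  mutate(kpi = replace(kpi, kpi == "B", "A")) 

# Keep every row of df that has no match in df1 on all three columns
anti_join(df, df1, by = c("county", "mid", "kpi"))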
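To see why this works, here is a sketch on hypothetical toy data where one pair is stored A-then-B and another B-then-A. duplicated(mid) flags the second row of each pair; if that row is B, replace() turns it into A, and if it is already A it stays A. Either way the lookup table ends up holding the (county, mid, "A") row, so anti_join() removes the A row and keeps the B row, while non-duplicated rows are never touched.

```r
library(dplyr)

# Hypothetical toy data: "1" is stored A-then-B, "2.3" is stored B-then-A,
# "2.11" and "2.6.7" are not duplicated.
toy <- tibble(
  county = "Athens",
  mid    = c("1", "1", "2.3", "2.3", "2.11", "2.6.7"),
  kpi    = c("A", "B", "B", "A", "A", "B")
)

# Second occurrence of each duplicated mid, with kpi forced to "A":
# these are the rows anti_join() will remove from toy.
to_drop <- toy %>%
  filter(duplicated(mid)) %>%
  mutate(kpi = replace(kpi, kpi == "B", "A"))

result <- anti_join(toy, to_drop, by = c("county", "mid", "kpi"))
result
```

Here result keeps "1"/B, "2.3"/B, "2.11"/A and "2.6.7"/B. Note this relies on each duplicated mid being exactly one A/B pair.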

Output (on a larger version of the data than the dput shown above):

   county mid     kpi  
   <chr>  <chr>   <chr>
 1 Athens 1.1     A    
 2 Athens 1.2     A    
 3 Athens 1.3     A    
 4 Athens 1.4     A    
 5 Athens 1.5     A    
 6 Athens 1.6     A    
 7 Athens 2.1.1   A    
 8 Athens 2.1.2   A    
 9 Athens 2.1.3   A    
10 Athens 2.1.4   A    
11 Athens 2.2.1   A    
12 Athens 2.2.2   A    
13 Athens 2.2.3   A    
14 Athens 2.2.4   A    
15 Athens 2.3.1   B    
16 Athens 2.3.2   B    
17 Athens 2.3.3   A    
18 Athens 2.3.4   A    
19 Athens 2.3.5   A    
20 Athens 2.3.6   A    
21 Athens 2.11    A    
22 Athens 2.16    A    
23 Athens 2.3     B    
24 Athens 2.4     B    
25 Athens 2.5.2   A    
26 Athens 2.5.3   A    
27 Athens 2.5.3.A A    
28 Athens 2.5.3.B A    
29 Athens 2.5.5   B    
30 Athens 2.6.1   A    
31 Athens 2.6.2   A    
32 Athens 2.6.3   A    
33 Athens 2.6.4   B    
34 Athens 2.6.5   A    
35 Athens 2.6.6   A    
36 Athens 2.6.7   B    
37 Athens 2.7.2   A    
38 Athens 2.7.3   A    
39 Athens 2.7.3.A A    
40 Athens 2.7.3.B A    
41 Athens 2.7.4   B    
42 Athens 2.7.5   A    
43 Athens 2.7.6   A    
44 Athens 2.9.1   B    
45 Athens 2.9.2   A    
46 Athens 2.12.1  B    
47 Athens 2.12.2  A    
48 Athens 2.15.1  A    
49 Athens 2.15.2  A    
50 Athens 2.15.3  A    
51 Athens 2.19    A    
52 Athens 3.8     A    
53 Athens 1       B    
54 Athens 2.1     A    
55 Athens 2.2     A    
56 Athens 2.5.1   B    
57 Athens 2.5.4   B    
58 Athens 2.7.1   A    
59 Athens 2.8.1   B    
60 Athens 2.8.2   B    
61 Athens 2.13    B    
62 Athens 2.13.A  A    
63 Athens 2.13.B  A    
64 Athens 2.13.C  A    
65 Athens 2.13.D  A    
66 Athens 2.14.3  B    
67 Athens 2.17    A    
68 Athens 2.18    A    
69 Athens 3.1     A    
70 Athens 3.2     A    
71 Athens 3.3     B  
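The answer's duplicated(mid) looks at mid over the whole table, ignoring county. If the intent is that duplicates are detected within each county, a possible alternative (a sketch, not the answerer's method) is to group by county and mid and keep a row when its group has one row or its kpi is B. It assumes, as in the question, that each duplicated mid is one A/B pair. "Sparta" below is an invented county used only to show the difference.

```r
library(dplyr)

# Hypothetical toy data with two counties; "Sparta" is an invented name.
toy <- tibble(
  county = c("Athens", "Athens", "Athens", "Sparta"),
  mid    = c("1", "1", "2.11", "1"),
  kpi    = c("A", "B", "A", "A")
)

# Keep a row if its mid is unique within its county,
# or if it is the B row of a duplicated mid.
kept <- toy %>%
  group_by(county, mid) %>%
  filter(n() == 1 | kpi == "B") %>%
  ungroup()

# For comparison, the anti_join approach: duplicated(mid) ignores county,
# so Sparta's only row is treated as a duplicate and dropped.
ungrouped <- anti_join(
  toy,
  toy %>% filter(duplicated(mid)) %>% mutate(kpi = replace(kpi, kpi == "B", "A")),
  by = c("county", "mid", "kpi")
)
```

With a single county, as in the data above, both approaches give the same rows.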

Original question on Stack Overflow: https://stackoverflow.com/questions/67627831/
