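In the data below, I am trying to remove rows that are duplicated in the mid column. Where a mid is duplicated, I want to keep the row whose kpi matches B. This should work across the county groups.
I am only showing the duplicates here, but the dput data contains more than just the duplicates.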
# A tibble: 34 x 3
county mid kpi
<chr> <chr> <chr>
1 Athens 1 A
2 Athens 1 B
3 Athens 2.13 A
4 Athens 2.13 B
5 Athens 2.3 A
6 Athens 2.3 B
7 Athens 2.4 A
8 Athens 2.4 B
9 Athens 3.3 A
10 Athens 3.3 B
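From the table above, I want to keep all the B values among the duplicates. I cannot simply use filter(kpi %in% "B"), because the data below also has A and B values that are not duplicated, and I want to keep those.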
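To make the problem with the naive filter concrete, here is a minimal sketch on a hypothetical three-row data frame (the `toy` and `naive` names and the mid value "9.9" are made up for illustration and are not part of the real data):

```r
library(dplyr)

# Hypothetical toy data: mid "1" is duplicated (A and B),
# mid "9.9" appears only once, with kpi "A"
toy <- data.frame(
  county = c("Athens", "Athens", "Athens"),
  mid    = c("1", "1", "9.9"),
  kpi    = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Naive filter: keeps the B row, but also drops the
# non-duplicated A row for mid "9.9", which should be kept
naive <- toy %>% filter(kpi == "B")
```

Only the row for mid "1" survives here, which is why the filter has to take duplication into account.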
structure(list(county = c("Athens", "Athens", "Athens", "Athens",
"Athens", "Athens", "Athens", "Athens", "Athens", "Athens", "Athens",
"Athens", "Athens", "Athens", "Athens", "Athens", "Athens", "Athens",
"Athens", "Athens", "Athens", "Athens", "Athens", "Athens", "Athens",
"Athens", "Athens", "Athens", "Athens", "Athens", "Athens", "Athens",
"Athens", "Athens"), mid = c("1", "1", "2.13", "2.13",
"2.3", "2.3", "2.4", "2.4", "3.3", "3.3", "2.12.1", "2.12.1",
"2.14.3", "2.14.3", "2.3.1", "2.3.1", "2.3.2", "2.3.2", "2.5.1",
"2.5.1", "2.5.4", "2.5.4", "2.5.5", "2.5.5", "2.6.4", "2.6.4",
"2.7.4", "2.7.4", "2.8.1", "2.8.1", "2.8.2", "2.8.2", "2.9.1",
"2.9.1"), kpi = c("A", "B", "A", "B", "A", "B", "A", "B", "A",
"B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B",
"A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B")), spec = structure(list(
cols = list(county = structure(list(), class = c("collector_character",
"collector")), mid = structure(list(), class = c("collector_character",
"collector")), kpi = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = "\t"), class = "col_spec"), row.names = c(NA,
-34L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
))
Best answer
We can use anti_join after identifying the duplicates!
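library(dplyr)

# Rows whose mid repeats an earlier mid, relabelled to "A" so they match the A rows
df1 <- df %>%
  filter(duplicated(mid)) %>%
  mutate(kpi = replace(kpi, kpi == "B", "A"))
# Drop the matching A rows from the original data
anti_join(df, df1, by = c("county", "mid", "kpi"))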
Output:
county mid kpi
<chr> <chr> <chr>
1 Athens 1.1 A
2 Athens 1.2 A
3 Athens 1.3 A
4 Athens 1.4 A
5 Athens 1.5 A
6 Athens 1.6 A
7 Athens 2.1.1 A
8 Athens 2.1.2 A
9 Athens 2.1.3 A
10 Athens 2.1.4 A
11 Athens 2.2.1 A
12 Athens 2.2.2 A
13 Athens 2.2.3 A
14 Athens 2.2.4 A
15 Athens 2.3.1 B
16 Athens 2.3.2 B
17 Athens 2.3.3 A
18 Athens 2.3.4 A
19 Athens 2.3.5 A
20 Athens 2.3.6 A
21 Athens 2.11 A
22 Athens 2.16 A
23 Athens 2.3 B
24 Athens 2.4 B
25 Athens 2.5.2 A
26 Athens 2.5.3 A
27 Athens 2.5.3.A A
28 Athens 2.5.3.B A
29 Athens 2.5.5 B
30 Athens 2.6.1 A
31 Athens 2.6.2 A
32 Athens 2.6.3 A
33 Athens 2.6.4 B
34 Athens 2.6.5 A
35 Athens 2.6.6 A
36 Athens 2.6.7 B
37 Athens 2.7.2 A
38 Athens 2.7.3 A
39 Athens 2.7.3.A A
40 Athens 2.7.3.B A
41 Athens 2.7.4 B
42 Athens 2.7.5 A
43 Athens 2.7.6 A
44 Athens 2.9.1 B
45 Athens 2.9.2 A
46 Athens 2.12.1 B
47 Athens 2.12.2 A
48 Athens 2.15.1 A
49 Athens 2.15.2 A
50 Athens 2.15.3 A
51 Athens 2.19 A
52 Athens 3.8 A
53 Athens 1 B
54 Athens 2.1 A
55 Athens 2.2 A
56 Athens 2.5.1 B
57 Athens 2.5.4 B
58 Athens 2.7.1 A
59 Athens 2.8.1 B
60 Athens 2.8.2 B
61 Athens 2.13 B
62 Athens 2.13.A A
63 Athens 2.13.B A
64 Athens 2.13.C A
65 Athens 2.13.D A
66 Athens 2.14.3 B
67 Athens 2.17 A
68 Athens 2.18 A
69 Athens 3.1 A
70 Athens 3.2 A
71 Athens 3.3 B
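As an alternative to the anti_join approach above, the same rule can be sketched with a grouped filter. This is only a sketch under two assumptions: that duplicates are meant to be judged within each county, and that every duplicated mid has a B row. The `toy` data, the `result` name, and the second county "Butler" are hypothetical and only serve to show the grouping. Note that duplicated(mid) on its own does not look at county, so a mid that occurs in more than one county would be treated as a duplicate.

```r
library(dplyr)

# Hypothetical toy data, not the asker's full data set
toy <- data.frame(
  county = c("Athens", "Athens", "Athens", "Athens", "Butler"),
  mid    = c("1", "1", "2.1", "2.6.7", "1"),
  kpi    = c("A", "B", "A", "B", "A"),
  stringsAsFactors = FALSE
)

result <- toy %>%
  group_by(county, mid) %>%
  # keep a row if its mid is unique within the county,
  # or if it is the B row of a duplicated mid
  filter(n() == 1 | kpi == "B") %>%
  ungroup()
```

With this rule, a duplicated mid that has no B row at all would be dropped entirely, so it may be worth checking for such groups before relying on it.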
Regarding removing duplicate rows based on a matching condition in another column, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67627831/