I'm looking for a solution that I couldn't find on Stack Overflow. I have a data frame with millions of rows that looks like this:
+----------------------+----------------------------------+-----------+-----------+-----------+-----------+
| session              | session_b                        | A         | B         | C         | D         |
+----------------------+----------------------------------+-----------+-----------+-----------+-----------+
| 162f2f8f7c5x8f6de8f8 | e5c44c77b9cae93afa9457e535c81451 | 588238268 | 587606411 | 581149505 | 581149505 |
| 162f2f8f7c5x8f6de8f8 | e5c44c77b9cae93afa9457e535c81451 | 591266911 | 591257117 | 568939090 | 587606411 |
+----------------------+----------------------------------+-----------+-----------+-----------+-----------+
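My goal is to check each row for duplicate values across columns A to D. Where a value is repeated within a row, I want to keep only the values that are not repeated, removing every copy of the repeated value. For the table above, the result should be: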
+----------------------+----------------------------------+-----------+-----------+-----------+-----------+
| session              | session_b                        | A         | B         | C         | D         |
+----------------------+----------------------------------+-----------+-----------+-----------+-----------+
| 162f2f8f7c5x8f6de8f8 | e5c44c77b9cae93afa9457e535c81451 | 588238268 | 587606411 |           |           |
| 162f2f8f7c5x8f6de8f8 | e5c44c77b9cae93afa9457e535c81451 | 591266911 | 591257117 | 568939090 | 587606411 |
+----------------------+----------------------------------+-----------+-----------+-----------+-----------+
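Best answer
If we want to replace all duplicated values (every occurrence, not only the repeats), use duplicated() by row together with apply() and MARGIN = 1. Combining duplicated(x) with duplicated(x, fromLast = TRUE) flags every copy of a repeated value, and replace() sets those positions to NA.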
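# Row by row (MARGIN = 1), set every copy of a repeated value in A:D to NA;
# apply() returns one column per row, so t() turns the result back into rows.
df1[c('A', 'B', 'C', 'D')] <- t(apply(df1[c('A', 'B', 'C', 'D')], 1,
    function(x) replace(x, duplicated(x) | duplicated(x, fromLast = TRUE), NA)))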
df1
#               session                        session_b         A         B         C         D
#1 162f2f8f7c5x8f6de8f8 e5c44c77b9cae93afa9457e535c81451 588238268 587606411        NA        NA
#2 162f2f8f7c5x8f6de8f8 e5c44c77b9cae93afa9457e535c81451 591266911 591257117 568939090 587606411
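To see why the answer ORs two duplicated() calls, here is a small sketch on a single row (row 1's A:D values, typed in by hand). duplicated(x) alone marks only the later copy (D), which would leave C = 581149505 in place and not match the desired output; adding duplicated(x, fromLast = TRUE) also marks the earlier copy (C). The variable names are illustrative only.

```r
# Row 1's A:D values from the example data; C and D repeat the same value
x <- c(A = 588238268L, B = 587606411L, C = 581149505L, D = 581149505L)

from_first <- duplicated(x)                   # marks only later copies (D)
from_last  <- duplicated(x, fromLast = TRUE)  # marks only earlier copies (C)
is_dup     <- from_first | from_last          # marks every copy (C and D)

res <- replace(x, is_dup, NA)  # C and D become NA; A and B are kept
print(res)
```

The answer's anonymous function applies exactly this replace() step to each row in turn.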
Data
df1 <- structure(list(session = c("162f2f8f7c5x8f6de8f8", "162f2f8f7c5x8f6de8f8"
), session_b = c("e5c44c77b9cae93afa9457e535c81451", "e5c44c77b9cae93afa9457e535c81451"
), A = c(588238268L, 591266911L), B = c(587606411L, 591257117L
), C = c(581149505L, 568939090L), D = c(581149505L, 587606411L
)), class = "data.frame", row.names = c(NA, -2L))
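Since the question mentions millions of rows, note that apply() calls an R function once per row, which is typically slow at that scale. A possible column-wise alternative, sketched below and not part of the original answer, compares each pair of the listed columns with vectorised ==, so the work is k * (k - 1) whole-column comparisons for k columns (12 for A to D) rather than one function call per row. drop_row_dups is a hypothetical helper name; the sketch assumes the listed columns share a comparable type, and any speed-up is an expectation, not a measured result.

```r
# Hypothetical helper, not from the original answer: for each listed column,
# set a value to NA when it also appears in another listed column of the same row.
drop_row_dups <- function(df, cols) {
  flags <- lapply(cols, function(j) {
    others <- setdiff(cols, j)
    # TRUE where column j equals at least one other column in that row
    Reduce(`|`, lapply(others, function(k) {
      a <- df[[j]]
      b <- df[[k]]
      !is.na(a) & !is.na(b) & a == b  # NA never counts as a match
    }))
  })
  names(flags) <- cols
  # Blank values only after all comparisons, so earlier edits cannot affect later ones
  for (j in cols) df[[j]][flags[[j]]] <- NA
  df
}

# Same values as the dput() data above
df1 <- data.frame(
  session   = rep("162f2f8f7c5x8f6de8f8", 2),
  session_b = rep("e5c44c77b9cae93afa9457e535c81451", 2),
  A = c(588238268L, 591266911L),
  B = c(587606411L, 591257117L),
  C = c(581149505L, 568939090L),
  D = c(581149505L, 587606411L)
)

out <- drop_row_dups(df1, c("A", "B", "C", "D"))
print(out)
```

Only same-row comparisons are made, so 587606411 (row 1's B and row 2's D) is kept in both places, as in the answer's output.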
Regarding "r - Keep non-duplicated values within a row in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61110914/