I have a data.frame that contains several individual events (id). Duplicate rows have already been removed.
df <- data.frame(id = as.integer(c(123, 123, 123, 124, 124, 124, 125, 125, 125, 126, 126, 126)),
                 date = as.Date(c("2014-03-12", "2014-03-12", "2015-09-16",
                                  "2015-10-24", "2016-12-11", "2016-12-11",
                                  "2017-08-06", "2017-11-26", "2018-01-29",
                                  "2015-09-16", "2015-09-16", "2015-09-16")),
                 fruit = c("Apple", "Orange", "Passion fruit", "Banana",
                           "Lemon", "Strawberry", "Banana", "Apple",
                           "Passion fruit", "Orange", "Blueberry", "Pineapple"),
                 row = rep(c(1, 2, 3), times = 4))
    id       date         fruit row
1  123 2014-03-12         Apple   1
2  123 2014-03-12        Orange   2
3  123 2015-09-16 Passion fruit   3
4  124 2015-10-24        Banana   1
5  124 2016-12-11         Lemon   2
6  124 2016-12-11    Strawberry   3
7  125 2017-08-06        Banana   1
8  125 2017-11-26         Apple   2
9  125 2018-01-29 Passion fruit   3
10 126 2015-09-16        Orange   1
11 126 2015-09-16     Blueberry   2
12 126 2015-09-16     Pineapple   3
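For each id, I only need the earliest date when that date is duplicated. That is, if the earliest date occurs more than once within an id, I need to keep all of its occurrences; ids whose earliest date occurs only once are dropped.
Desired output: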
df
   id       date     fruit row
1 123 2014-03-12     Apple   1
2 123 2014-03-12    Orange   2
3 126 2015-09-16    Orange   1
4 126 2015-09-16 Blueberry   2
5 126 2015-09-16 Pineapple   3
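Best answer
We can group by 'id', use min to build the condition on the earliest date, and use duplicated (in both directions) to check that this date occurs more than once within the group.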
library(dplyr)
# .by = id applies the filter per id (requires dplyr >= 1.1.0)
df %>%
  filter(date == min(date) &
           (duplicated(date) | duplicated(date, fromLast = TRUE)),
         .by = id)
-output
   id       date     fruit row
1 123 2014-03-12     Apple   1
2 123 2014-03-12    Orange   2
3 126 2015-09-16    Orange   1
4 126 2015-09-16 Blueberry   2
5 126 2015-09-16 Pineapple   3
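For readers without dplyr, the same logic can be sketched in base R. This is only an illustrative alternative, not part of the original answer; the helper names first_date, is_first, n_first and result are made up for this example, and the data are rebuilt here so the block runs on its own. It first shows why duplicated is applied in both directions, then counts how many rows per id fall on the earliest date.

```r
# Rebuild the example data (same values as in the question).
df <- data.frame(
  id = as.integer(rep(123:126, each = 3)),
  date = as.Date(c("2014-03-12", "2014-03-12", "2015-09-16",
                   "2015-10-24", "2016-12-11", "2016-12-11",
                   "2017-08-06", "2017-11-26", "2018-01-29",
                   "2015-09-16", "2015-09-16", "2015-09-16")),
  fruit = c("Apple", "Orange", "Passion fruit", "Banana",
            "Lemon", "Strawberry", "Banana", "Apple",
            "Passion fruit", "Orange", "Blueberry", "Pineapple"),
  row = rep(c(1, 2, 3), times = 4)
)

# duplicated() alone flags only the second and later copies,
# so it is combined with a pass from the end to flag every copy.
x <- c("a", "a", "b")
duplicated(x)                                   # FALSE  TRUE FALSE
duplicated(x) | duplicated(x, fromLast = TRUE)  #  TRUE  TRUE FALSE

# Earliest date per id, repeated for every row of that id.
# Dates are converted to numeric so ave() works on plain numbers.
first_date <- ave(as.numeric(df$date), df$id, FUN = min)

# TRUE for rows that fall on their id's earliest date.
is_first <- as.numeric(df$date) == first_date

# Number of rows per id that fall on the earliest date.
n_first <- ave(as.numeric(is_first), df$id, FUN = sum)

# Keep earliest-date rows only when that date occurs more than once.
result <- df[is_first & n_first > 1, ]
rownames(result) <- NULL
result
```

Counting the earliest-date rows per group is equivalent here to the duplicated check in the answer, because a row on the earliest date is a duplicate exactly when that date appears more than once in its group.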
Regarding "r - How to subset duplicates of the earliest date within groups?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/76330594/