r - How to subset the duplicates of the earliest date within a group?

I have a data.frame containing several individual events (id). Duplicate rows have already been removed.

df <- data.frame(id = as.integer(c(123, 123, 123, 124, 124, 124, 125, 125, 125, 126, 126, 126)),
                 date = as.Date(c("2014-03-12", "2014-03-12", "2015-09-16",
                                  "2015-10-24", "2016-12-11", "2016-12-11",
                                  "2017-08-06", "2017-11-26", "2018-01-29",
                                  "2015-09-16", "2015-09-16", "2015-09-16")),
                 fruit = c("Apple", "Orange", "Passion fruit", "Banana",
                           "Lemon", "Strawberry", "Banana", "Apple",
                           "Passion fruit", "Orange", "Blueberry", "Pineapple"),
                 # position of each event within its id: 1, 2, 3 repeated for the 4 ids
                 row = rep(1:3, times = 4))

   id       date         fruit row
1  123 2014-03-12         Apple   1
2  123 2014-03-12        Orange   2
3  123 2015-09-16 Passion fruit   3
4  124 2015-10-24        Banana   1
5  124 2016-12-11         Lemon   2
6  124 2016-12-11    Strawberry   3
7  125 2017-08-06        Banana   1
8  125 2017-11-26         Apple   2
9  125 2018-01-29 Passion fruit   3
10 126 2015-09-16        Orange   1
11 126 2015-09-16     Blueberry   2
12 126 2015-09-16     Pineapple   3

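For each id, I need to keep only the rows on the earliest date, and only when that earliest date is duplicated. In other words, if the earliest date of an id occurs more than once, I need to keep all of its occurrences; if it occurs only once (as for ids 124 and 125), the id is dropped entirely.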

Desired output:

df
   id       date     fruit row
1 123 2014-03-12     Apple   1
2 123 2014-03-12    Orange   2
3 126 2015-09-16    Orange   1
4 126 2015-09-16 Blueberry   2
5 126 2015-09-16 Pineapple   3
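To make the selection rule concrete, here is a minimal base R sketch of the per-id test applied to a single id's vector of dates. The helper name keep_earliest_dups is hypothetical and not part of the original post; it assumes each call receives the dates of exactly one id, and it reuses the duplicated(..., fromLast = TRUE) idea that the answer below relies on.

```r
# Hypothetical helper (not from the original post): given the dates of ONE id,
# return TRUE for rows on the earliest date, but only when that earliest date
# occurs more than once.
keep_earliest_dups <- function(d) {
  is_min <- d == min(d)
  # duplicated() flags the 2nd, 3rd, ... occurrences of a value;
  # duplicated(fromLast = TRUE) flags every occurrence except the last one.
  # OR-ing the two marks every value that appears at least twice.
  is_dup <- duplicated(d) | duplicated(d, fromLast = TRUE)
  is_min & is_dup
}

# id 123: earliest date appears twice, so both rows are kept
keep_earliest_dups(as.Date(c("2014-03-12", "2014-03-12", "2015-09-16")))
#> [1]  TRUE  TRUE FALSE

# id 124: earliest date appears once, so nothing is kept
keep_earliest_dups(as.Date(c("2015-10-24", "2016-12-11", "2016-12-11")))
#> [1] FALSE FALSE FALSE
```

Note that the duplicated 2016-12-11 rows of id 124 are not kept, because they are not on the earliest date.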

Best answer

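We can group by "id", build the condition with min, and check for duplicates with duplicated (in both directions, so that every occurrence of a repeated date is flagged):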

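library(dplyr)  # the .by argument of filter() requires dplyr >= 1.1.0

df %>%
  # within each id: keep rows on the earliest date, but only if that date
  # occurs more than once
  filter(date == min(date) & (duplicated(date) | duplicated(date, fromLast = TRUE)),
         .by = id)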

Output:

   id       date     fruit row
1 123 2014-03-12     Apple   1
2 123 2014-03-12    Orange   2
3 126 2015-09-16    Orange   1
4 126 2015-09-16 Blueberry   2
5 126 2015-09-16 Pineapple   3
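For readers who prefer not to depend on dplyr (or who are on a version older than 1.1.0, which lacks .by), the same selection can be sketched in base R with ave(). This is an alternative formulation, not part of the original answer: instead of duplicated(), it counts how many rows of each id fall on that id's earliest date and keeps those rows when the count is at least two. The intermediate names d, is_min, n_min, keep and res are illustrative.

```r
# Example data from the question (with "Blueberry" spelled consistently)
df <- data.frame(
  id    = rep(123:126, each = 3),
  date  = as.Date(c("2014-03-12", "2014-03-12", "2015-09-16",
                    "2015-10-24", "2016-12-11", "2016-12-11",
                    "2017-08-06", "2017-11-26", "2018-01-29",
                    "2015-09-16", "2015-09-16", "2015-09-16")),
  fruit = c("Apple", "Orange", "Passion fruit", "Banana",
            "Lemon", "Strawberry", "Banana", "Apple",
            "Passion fruit", "Orange", "Blueberry", "Pineapple"),
  row   = rep(1:3, times = 4)
)

# Work on the numeric form of the dates so ave() only handles plain numbers.
d <- as.numeric(df$date)

# ave() returns, for every row, a value computed within that row's id group.
is_min <- d == ave(d, df$id, FUN = min)             # row is on its id's earliest date
n_min  <- ave(as.numeric(is_min), df$id, FUN = sum) # rows on that earliest date, per id

# Keep earliest-date rows only when at least two rows share that date.
keep <- is_min & n_min > 1

res <- df[keep, ]
rownames(res) <- NULL  # renumber rows 1..n, like the dplyr output
res
```

The counting rule and the duplicated() rule select the same rows: a row on the earliest date is duplicated within its id exactly when at least two rows of that id carry the earliest date.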

A similar question, "r - How to subset the duplicates of the earliest date within a group?", can be found on Stack Overflow: https://stackoverflow.com/questions/76330594/
