R: group by and count

I am working with a dataset like the following:

      Id     Date           Color
      10     2008-11-17     Red
      10     2008-11-17     Red
      10     2008-11-17     Blue
      10     2010-01-26     Red
      10     2010-01-26     Green
      10     2010-01-26     Green
      10     2010-01-26     Red
      29     2007-07-31     Red
      29     2007-07-31     Red
      29     2007-07-31     Blue
      29     2007-07-31     Green
      29     2007-07-31     Red

My goal is to create a dataset like this:

     Color      Representation      Count            Min   Max
     Red        1 + 1 + 1 = 3       2 + 2 + 3 = 7    2     3
     Blue       1 + 1     = 2       1 + 1     = 2    1     1
     Green      1 + 1     = 2       2 + 1     = 3    1     2

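Representation

The value in the 1st row, 2nd column (Representation) is 3 because Red is represented 3 times when each unique combination of Id and Date is counted once. For example, rows 1 and 2 of the data share the same Id (10) and Date (2008-11-17), so that combination is counted once (1_{(10, 2008-11-17)}). Rows 4 and 7 share the same Id (10) and Date (2010-01-26), so that unique combination is also counted once (1_{(10, 2010-01-26)}). Rows 8, 9 and 12 share the same Id (29) and Date (2007-07-31), which likewise counts once (1_{(29, 2007-07-31)}). The value in the 1st row, 2nd column is therefore 3.

1_{(10, 2008-11-17)} + 1_{(10, 2010-01-26)} + 1_{(29, 2007-07-31)} = 3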

Count

The value in the 1st row, 3rd column (Count) is 7 because Red is mentioned twice by Id 10 on 2008-11-17 (2_{(10, 2008-11-17)}), twice again by Id 10 on 2010-01-26 (2_{(10, 2010-01-26)}), and three times by Id 29 on 2007-07-31 (3_{(29, 2007-07-31)}).

2_{(10, 2008-11-17)} + 2_{(10, 2010-01-26)} + 3_{(29, 2007-07-31)} = 7

Any help in building this unusual frequency/count table is much appreciated.

Dataset

Id   = c(10,10,10,10,10,10,10,29,29,29,29,29) 
Date = c("2008-11-17", "2008-11-17", "2008-11-17","2010-01-26","2010-01-26","2010-01-26","2010-01-26",
         "2007-07-31","2007-07-31","2007-07-31","2007-07-31","2007-07-31") 
Color = c("Red", "Red", "Blue", "Red", "Green", "Green", "Red", "Red", "Red", "Blue", "Green", "Red") 
df = data.frame(Id, Date, Color)

Best answer

Using dplyr:

library(dplyr)
df %>% group_by(Color) %>%
    summarize(Representation = n_distinct(Id, Date), Count = n())
# # A tibble: 3 × 3
#    Color Representation Count
#   <fctr>          <int> <int>
# 1   Blue              2     2
# 2  Green              2     3
# 3    Red              3     7
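
The summary above covers Representation and Count but not the Min and Max columns from the question. One possible way to get them, as a sketch rather than the original answer's method, is to first count mentions per (Color, Id, Date) combination and then summarise those per-combination counts by Color. The column name mentions and the object name result are made up for this illustration; the data frame is rebuilt inside the block so it runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question so this block is self-contained.
df <- data.frame(
  Id    = c(10, 10, 10, 10, 10, 10, 10, 29, 29, 29, 29, 29),
  Date  = c("2008-11-17", "2008-11-17", "2008-11-17",
            "2010-01-26", "2010-01-26", "2010-01-26", "2010-01-26",
            "2007-07-31", "2007-07-31", "2007-07-31", "2007-07-31", "2007-07-31"),
  Color = c("Red", "Red", "Blue", "Red", "Green", "Green", "Red",
            "Red", "Red", "Blue", "Green", "Red")
)

result <- df %>%
  # One row per (Color, Id, Date) combination; `mentions` (illustrative name)
  # holds how many times the color occurs within that combination.
  count(Color, Id, Date, name = "mentions") %>%
  group_by(Color) %>%
  summarise(
    Representation = n(),            # number of distinct (Id, Date) combinations
    Count          = sum(mentions),  # total mentions of the color
    Min            = min(mentions),  # fewest mentions within one combination
    Max            = max(mentions)   # most mentions within one combination
  )

print(result)
# Values for the example data (Representation, Count, Min, Max):
#   Blue  2, 2, 1, 1
#   Green 2, 3, 1, 2
#   Red   3, 7, 2, 3
```

Counting first and summarising second keeps every column derived from the same intermediate table, so Count always equals the sum of the per-combination values that Min and Max are taken from.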

For more on grouping and counting in R, see the similar question on Stack Overflow: https://stackoverflow.com/questions/40118936/
