r - Check by group in R whether all rows are duplicates

Tags: r

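I have a dataset and want to create a variable check that tells me, for each ID group, whether every row has the same day, i.e. whether any row in the group has a different day.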

df <- data.frame(ID=c("id1", "id1","id2", "id2","id3","id3","id3"),
             day=c("01/02/2008","01/02/2008","10/02/2009","08/03/2009","11/08/2007","11/08/2007","11/08/2008"),
             it =c("ul","tr","cb","ul","ul","tc","tr"))
df$day <- as.Date(as.character(df$day), format = "%m/%d/%Y")
  ID        day it
1 id1 2008-01-02 ul
2 id1 2008-01-02 tr
3 id2 2009-10-02 cb
4 id2 2009-08-03 ul
5 id3 2007-11-08 ul
6 id3 2007-11-08 tc
7 id3 2008-11-08 tr

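The problem with the code below is that id3 is still listed, because two of its rows share the same day. I want every row of an ID to have the same day; otherwise the ID should not count.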
c <- df[duplicated(df$ID) & duplicated(df$day),]
df1 <- df[df$ID %in% c$ID,]
   ID        day it
1 id1 2008-01-02 ul
2 id1 2008-01-02 tr
5 id3 2007-11-08 ul
6 id3 2007-11-08 tc
7 id3 2008-11-08 tr

I would like output like this:
   ID        day it check
1 id1 2008-01-02 ul   Yes
2 id1 2008-01-02 tr   Yes
3 id2 2009-10-02 cb    No
4 id2 2009-08-03 ul    No
5 id3 2007-11-08 ul    No
6 id3 2007-11-08 tc    No
7 id3 2008-11-08 tr    No

Best answer

# within() evaluates the assignment with the columns of df in scope.
# by() splits df by ID; each group returns TRUE when it has exactly one
# unique day, repeated once per row of the group.
# unlist() returns the results in group order, so df must already be sorted by ID.
within(df,
       check <- unlist(by(df,
                          INDICES = ID,
                          FUN = function(x) rep(length(unique(x$day)) == 1, nrow(x)))))
#    ID        day it check
# 1 id1 01/02/2008 ul  TRUE
# 2 id1 01/02/2008 tr  TRUE
# 3 id2 10/02/2009 cb FALSE
# 4 id2 08/03/2009 ul FALSE
# 5 id3 11/08/2007 ul FALSE
# 6 id3 11/08/2007 tc FALSE
# 7 id3 11/08/2008 tr FALSE
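
The question asks for Yes/No labels, while the code above returns TRUE/FALSE. Below is a minimal base-R sketch that produces the labels, assuming ave() is an acceptable substitute for by(). It is not the answer's method, and n_days is a name introduced here for illustration. ave() returns its result aligned with the original row order, so this variant should not depend on df being sorted by ID.

```r
df <- data.frame(ID  = c("id1", "id1", "id2", "id2", "id3", "id3", "id3"),
                 day = c("01/02/2008", "01/02/2008", "10/02/2009", "08/03/2009",
                         "11/08/2007", "11/08/2007", "11/08/2008"),
                 it  = c("ul", "tr", "cb", "ul", "ul", "tc", "tr"))

# Number of distinct days within each ID, aligned with the original rows.
# as.integer(factor(...)) gives ave() a numeric vector to work on.
# n_days is a hypothetical helper name, not part of the original answer.
n_days <- ave(as.integer(factor(df$day)), df$ID,
              FUN = function(d) length(unique(d)))

# "Yes" only when every row of the ID shares a single day.
df$check <- ifelse(n_days == 1, "Yes", "No")
df
```

Only id1 has a single distinct day here, so only its rows should be labelled Yes.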

Data:
df <- data.frame(ID=c("id1", "id1","id2", "id2","id3","id3","id3"),
                 day=c("01/02/2008","01/02/2008","10/02/2009","08/03/2009","11/08/2007","11/08/2007","11/08/2008"),
                 it =c("ul","tr","cb","ul","ul","tc","tr"))

Regarding "r - Check by group in R whether all rows are duplicates", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42590628/
