I have a dataset and want to create a variable check that tells me whether, within each ID group, any row has a different value of day.
df <- data.frame(ID=c("id1", "id1","id2", "id2","id3","id3","id3"),
day=c("01/02/2008","01/02/2008","10/02/2009","08/03/2009","11/08/2007","11/08/2007","11/08/2008"),
it =c("ul","tr","cb","ul","ul","tc","tr"))
df$day <- as.Date(as.character(df$day), format = "%m/%d/%Y")
ID day it
1 id1 2008-01-02 ul
2 id1 2008-01-02 tr
3 id2 2009-10-02 cb
4 id2 2009-08-03 ul
5 id3 2007-11-08 ul
6 id3 2007-11-08 tc
7 id3 2008-11-08 tr
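The problem with the code below is that id3 is still listed as a duplicate, because two of its rows share the same day. I want an ID to count only if all of its rows have the same day; otherwise it should not be counted.
c <- df[duplicated(df$ID) & duplicated(df$day),]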
df1 <- df[df$ID %in% c$ID,]
ID day it
1 id1 2008-01-02 ul
2 id1 2008-01-02 tr
5 id3 2007-11-08 ul
6 id3 2007-11-08 tc
7 id3 2008-11-08 tr
I want output like this:
ID day it check
1 id1 2008-01-02 ul Yes
2 id1 2008-01-02 tr Yes
3 id2 2009-10-02 cb No
4 id2 2009-08-03 ul No
5 id3 2007-11-08 ul No
6 id3 2007-11-08 tc No
7 id3 2008-11-08 tr No
Best answer
# within() evaluates the assignment with the columns of df in scope
# note: by() returns groups in ID order, so this assumes df is already sorted by ID
within(df,
       check <- unlist(by(df,            # split df by ID using by()
                          INDICES = ID,
                          # TRUE for every row of a group if the group has exactly one unique day, else FALSE
                          FUN = function(x) rep(length(unique(x$day)) == 1, length(x$day)))))
# ID day it check
# 1 id1 01/02/2008 ul TRUE
# 2 id1 01/02/2008 tr TRUE
# 3 id2 10/02/2009 cb FALSE
# 4 id2 08/03/2009 ul FALSE
# 5 id3 11/08/2007 ul FALSE
# 6 id3 11/08/2007 tc FALSE
# 7 id3 11/08/2008 tr FALSE
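The output above is logical (TRUE/FALSE), while the question asks for Yes/No labels. As a possible alternative, here is a minimal sketch using base R's ave(). It is not part of the original answer; the helper name n_days is made up for illustration. It counts the distinct days per ID and returns the result in the original row order, so it should not depend on df being sorted by ID.

```r
df <- data.frame(ID  = c("id1", "id1", "id2", "id2", "id3", "id3", "id3"),
                 day = c("01/02/2008", "01/02/2008", "10/02/2009", "08/03/2009",
                         "11/08/2007", "11/08/2007", "11/08/2008"),
                 it  = c("ul", "tr", "cb", "ul", "ul", "tc", "tr"))

# Number of distinct days within each ID, repeated for every row of that ID.
# factor() turns day into integer codes, so this works for strings and Dates.
# n_days is a hypothetical helper name used only in this sketch.
n_days <- ave(as.integer(factor(df$day)), df$ID,
              FUN = function(x) length(unique(x)))

# An ID gets "Yes" only when all of its rows share a single day.
df$check <- ifelse(n_days == 1, "Yes", "No")
df
```

With the sample data, id1 gets Yes on both rows, and id2 and id3 get No on every row, which matches the output requested in the question.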
Data:
df <- data.frame(ID=c("id1", "id1","id2", "id2","id3","id3","id3"),
day=c("01/02/2008","01/02/2008","10/02/2009","08/03/2009","11/08/2007","11/08/2007","11/08/2008"),
it =c("ul","tr","cb","ul","ul","tc","tr"))
For "r - Check for duplicates across all rows by group in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42590628/