An example of my data is structured as follows:
Individ <- data.frame(Participant = c("Bill", "Bill", "Bill", "Bill", "Bill", "Bill", "Jane", "Jane", "Jane", "Jane",
"Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Jane", "Bill", "Bill", "Bill", "Bill", "Bill", "Bill"),
Time = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6),
Condition = c("Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr",
"Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Placebo", "Expr", "Expr", "Expr", "Expr", "Expr", "Expr"),
Location = c("Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home", "Home",
"Away", "Away", "Away", "Away", "Away", "Away"),
Power = c(400, 250, 180, 500, 300, 450, 600, 512, 300, 500, 450, 200, 402, 210, 130, 520, 310, 451, 608, 582, 390, 570, 456, 205))
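I would like to find the tail (last) row for each Participant when Condition equals Placebo and Location equals Home. This will be used to check the Power at the last time point, so that I can then check the 10 rows preceding it. Hence, finding the row number is important.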
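I know I can find the last row of each participant using the following:
library(plyr)
ddply(Individ, .(Participant), function(x) tail(x, 1))  # grouping by Time as well would return one row per time point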
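However, my actual data frame is 4 million rows long, has more than 50 participants, and Power was collected at different time points. Is there a fast way to do this that is not computationally expensive?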
Cheers!
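For comparison, the same row numbers can be computed in base R without any per-group function calls, which is one way to keep the cost low on a 4-million-row table. This is a minimal sketch, not part of the original answer: it assumes that "last" means last in row order (rows already sorted by Time within each block), and the names keep, idx and last_idx are illustrative. duplicated(..., fromLast = TRUE) marks every occurrence of a Participant except the last one, so negating it keeps exactly one row per Participant.

```r
# Compact rebuild of the example data from the question (same 24 rows).
Individ <- data.frame(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), each = 6),
  Time        = rep(1:6, times = 4),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), each = 6),
  Location    = rep(c("Home", "Home", "Home", "Away"), each = 6),
  Power       = c(400, 250, 180, 500, 300, 450, 600, 512, 300, 500, 450, 200,
                  402, 210, 130, 520, 310, 451, 608, 582, 390, 570, 456, 205)
)

# Logical mask for rows meeting both conditions (vectorized, no grouping).
keep <- Individ$Condition == "Placebo" & Individ$Location == "Home"

# Original row numbers of the matching rows, in data order.
idx <- which(keep)

# Among the matching rows, keep only the last occurrence of each Participant.
last_idx <- idx[!duplicated(Individ$Participant[idx], fromLast = TRUE)]

print(last_idx)
print(Individ[last_idx, ])
```

On the example data this selects rows 6 (Bill) and 18 (Jane), matching the data.table result in the answer.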
Best answer
Using data.table, we can convert the 'data.frame' to a 'data.table' (setDT(Individ)), group by 'Participant', specify the logical condition in 'i' (Condition == 'Placebo' & Location == 'Home'), and subset the last observation (tail(.SD, 1L) or .SD[.N]).
library(data.table)
setDT(Individ)[Condition=='Placebo' & Location=='Home',
tail(.SD, 1L) ,.(Participant)]
# Participant Time Condition Location Power
#1: Bill 6 Placebo Home 450
#2: Jane 6 Placebo Home 451
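The answer also mentions .SD[.N] as an alternative to tail(.SD, 1L). The sketch below shows that variant together with a pattern often used on large tables: compute only the row index of the last match per group with .I[.N], then subset the table once. Whether the second form is faster on the real 4-million-row data is an assumption worth benchmarking; res_sd, rn and res_i are illustrative names.

```r
library(data.table)

# Compact rebuild of the example data from the question (same 24 rows).
Individ <- data.table(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), each = 6),
  Time        = rep(1:6, times = 4),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), each = 6),
  Location    = rep(c("Home", "Home", "Home", "Away"), each = 6),
  Power       = c(400, 250, 180, 500, 300, 450, 600, 512, 300, 500, 450, 200,
                  402, 210, 130, 520, 310, 451, 608, 582, 390, 570, 456, 205)
)

# Variant 1: .SD[.N] takes the last row of each group's filtered subset.
res_sd <- Individ[Condition == "Placebo" & Location == "Home", .SD[.N], by = Participant]

# Variant 2: return only the original row number of each group's last match
# (.I holds row numbers of the full table), then subset once with those rows.
rn <- Individ[Condition == "Placebo" & Location == "Home", .I[.N], by = Participant]$V1
res_i <- Individ[rn]

print(res_sd)
print(res_i)
```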
If we need the row number, we can get it with .I:
setDT(Individ)[Condition=='Placebo' & Location=='Home',
c(rn = .I[.N],tail(.SD, 1L)) ,.(Participant)]
# Participant rn Time Condition Location Power
#1: Bill 6 6 Placebo Home 450
#2: Jane 18 6 Placebo Home 451
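Since the goal is to inspect the rows leading up to each last time point, the row numbers can be turned into windows. A minimal sketch, assuming "the previous 10 rows" means the 10 rows physically above rn in the table; preceding_rows() is a hypothetical helper, not a data.table function. Such a window is purely positional: for Jane (rn = 18) it spans rows 8 to 17, which include her Expr rows, so an extra filter may be needed if only Placebo/Home rows should count.

```r
library(data.table)

# Compact rebuild of the example data from the question (same 24 rows).
Individ <- data.table(
  Participant = rep(c("Bill", "Jane", "Jane", "Bill"), each = 6),
  Time        = rep(1:6, times = 4),
  Condition   = rep(c("Placebo", "Expr", "Placebo", "Expr"), each = 6),
  Location    = rep(c("Home", "Home", "Home", "Away"), each = 6),
  Power       = c(400, 250, 180, 500, 300, 450, 600, 512, 300, 500, 450, 200,
                  402, 210, 130, 520, 310, 451, 608, 582, 390, 570, 456, 205)
)

# Row number of the last Placebo/Home row for each Participant (as in the answer).
last_rn <- Individ[Condition == "Placebo" & Location == "Home", .I[.N], by = Participant]$V1

# Hypothetical helper: up to n rows strictly before row rn, clipped at row 1.
preceding_rows <- function(dt, rn, n = 10L) {
  start <- max(1L, rn - n)
  if (start >= rn) return(dt[0L])  # rn is row 1: nothing precedes it
  dt[start:(rn - 1L)]
}

# One window per Participant, named for easy lookup.
windows <- lapply(last_rn, function(r) preceding_rows(Individ, r))
names(windows) <- Individ$Participant[last_rn]

print(windows)
```

For Bill only 5 rows precede row 6, so his window is shorter than 10 rows.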
Regarding "r - How to find the tail row of a data frame that meets set conditions?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35403375/