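I'm wondering whether there is a simple way to do what I describe below using ddply. My data frame describes an experiment with two conditions. Participants had to choose between options A and B, and we recorded how long they took to decide and whether their response was accurate.
I use ddply to compute averages by condition. The nAccurate column gives the number of accurate responses in each condition. I would also like to know how long participants took to decide, and report that in the RT column. However, I want to compute the mean response time only over trials where the participant responded correctly (i.e. Accuracy==1). At the moment, the code below computes the mean response time over all responses, accurate and inaccurate alike. Is there a simple way to modify it so that the mean response time is computed over accurate trials only?
Please see the sample code below. Thanks!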
library(plyr)
# Create sample data frame.
Condition = c(rep(1,6), rep(2,6)) #two conditions
Response = c("A","A","A","A","B","A","B","B","B","B","A","A") #whether option "A" or "B" was selected
Accuracy = rep(c(1,1,0),4) #whether the response was accurate or not
RT = c(110,133,121,122,145,166,178,433,300,340,250,674) #response times
df = data.frame(Condition,Response, Accuracy,RT)
head(df)
  Condition Response Accuracy  RT
1         1        A        1 110
2         1        A        1 133
3         1        A        0 121
4         1        A        1 122
5         1        B        1 145
6         1        A        0 166
# Calculate averages.
avg <- ddply(df, .(Condition), summarise,
             N = length(Response),
             nAccurate = sum(Accuracy),
             RT = mean(RT))
# The problem: response times are calculated over all trials. I would like
# to calculate mean response times *for accurate responses only*.
avg
  Condition N nAccurate       RT
1         1 6         4 132.8333
2         2 6         4 362.5000
Best answer
With plyr, you can do it as follows:
ddply(df, .(Condition), summarise,
      N = length(Response),
      nAccurate = sum(Accuracy),
      RT = mean(RT[Accuracy==1]))
This gives:
  Condition N nAccurate     RT
1         1 6         4 127.50
2         2 6         4 300.25
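The key step is the logical index RT[Accuracy==1]: inside summarise, column names refer to the current group's values, so the index keeps only that condition's accurate trials before mean() is applied. As a cross-check that does not depend on plyr, here is a minimal base R sketch of the same idea. It rebuilds the question's sample data so that it runs on its own; the names accurate and acc_rt are only illustrative.

```r
# Rebuild the sample data from the question (Response is not needed here).
df <- data.frame(
  Condition = c(rep(1, 6), rep(2, 6)),
  Accuracy  = rep(c(1, 1, 0), 4),
  RT        = c(110, 133, 121, 122, 145, 166, 178, 433, 300, 340, 250, 674)
)

# Keep the accurate trials only, then average RT within each condition.
accurate <- df[df$Accuracy == 1, ]
acc_rt <- tapply(accurate$RT, accurate$Condition, mean)
print(acc_rt)
```

The per-condition means should agree with the RT column of the ddply result.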
If you use data.table, here is another way:
library(data.table)
setDT(df)[, .(N = .N,
              nAccurate = sum(Accuracy),
              RT = mean(RT[Accuracy==1])),
          by = Condition]
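One edge case may be worth keeping in mind with either approach: if a condition has no accurate trials at all, RT[Accuracy==1] is an empty vector and mean() of an empty numeric vector is NaN. The sketch below illustrates this in base R. Note that safe_mean is a hypothetical helper written for this example, not part of plyr or data.table, and the two-trial data are made up purely to trigger the case.

```r
# Hypothetical helper: return NA instead of NaN when there is nothing to average.
safe_mean <- function(x) if (length(x) == 0) NA_real_ else mean(x)

# Made-up condition in which every trial was inaccurate.
rt  <- c(250, 674)
acc <- c(0, 0)

plain_result <- mean(rt[acc == 1])       # mean of an empty vector
safe_result  <- safe_mean(rt[acc == 1])  # explicit missing value instead
print(plain_result)
print(safe_result)
```

Whether NaN or NA is preferable depends on how the summary is used downstream; the helper could be swapped in for mean() in either call above if NA is wanted.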
Regarding "r - How to use ddply to subset data for a specific column?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/32612324/