r - How can I extract some values from one column into a new column when the values in another column are repeated?

Tags: r dplyr

Apologies for the poorly worded title. I have some data like the following (grouped by id), in which the question column contains repeated values:

structure(list(study_id = c("02ipnnqgeovkrxz", "02ipnnqgeovkrxz", 
"02ipnnqgeovkrxz", "02ipnnqgeovkrxz", "02ipnnqgeovkrxz", "02ipnnqgeovkrxz", 
"0bsilzm5iabdnoj", "0bsilzm5iabdnoj", "0bsilzm5iabdnoj", "0bsilzm5iabdnoj", 
"0bsilzm5iabdnoj", "0bsilzm5iabdnoj", "1171bwmljjct6me", "1171bwmljjct6me", 
"1171bwmljjct6me", "1171bwmljjct6me", "1171bwmljjct6me", "1171bwmljjct6me"
), question = c("37tlJa09k7zwKFL ", "37tlJa09k7zwKFL", "3WTpbAzIQmbnlpb ", 
"3WTpbAzIQmbnlpb", "3eEVJgaAP6c9FPL ", "3eEVJgaAP6c9FPL", "7QhOyTdA1MjKmX3 ", 
"7QhOyTdA1MjKmX3", "8eMvvNHEh1CAqk5 ", "8eMvvNHEh1CAqk5", "e3u9ZmoNISb0vfn ", 
"e3u9ZmoNISb0vfn", "3IDmpN1FZDQqhcF ", "3IDmpN1FZDQqhcF", "3WRNXeyBSwuXvh3 ", 
"3WRNXeyBSwuXvh3", "6QnjC0CHjV1kmvX ", "6QnjC0CHjV1kmvX"), response = c("0.839", 
"word", "0.739", "word", "1.353", "picture", "1.418", "word", 
"1.563", "word", "6.377", "word", "1.795", "picture", "1.876", 
"picture", "0.96", "picture")), row.names = c(NA, -18L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), groups = structure(list(study_id = c("02ipnnqgeovkrxz", 
"02ipnnqgeovkrxz", "02ipnnqgeovkrxz", "02ipnnqgeovkrxz", "02ipnnqgeovkrxz", 
"02ipnnqgeovkrxz", "0bsilzm5iabdnoj", "0bsilzm5iabdnoj", "0bsilzm5iabdnoj", 
"0bsilzm5iabdnoj", "0bsilzm5iabdnoj", "0bsilzm5iabdnoj", "1171bwmljjct6me", 
"1171bwmljjct6me", "1171bwmljjct6me", "1171bwmljjct6me", "1171bwmljjct6me", 
"1171bwmljjct6me"), question = c("37tlJa09k7zwKFL", "37tlJa09k7zwKFL ", 
"3eEVJgaAP6c9FPL", "3eEVJgaAP6c9FPL ", "3WTpbAzIQmbnlpb", "3WTpbAzIQmbnlpb ", 
"7QhOyTdA1MjKmX3", "7QhOyTdA1MjKmX3 ", "8eMvvNHEh1CAqk5", "8eMvvNHEh1CAqk5 ", 
"e3u9ZmoNISb0vfn", "e3u9ZmoNISb0vfn ", "3IDmpN1FZDQqhcF", "3IDmpN1FZDQqhcF ", 
"3WRNXeyBSwuXvh3", "3WRNXeyBSwuXvh3 ", "6QnjC0CHjV1kmvX", "6QnjC0CHjV1kmvX "
), .rows = list(2L, 1L, 6L, 5L, 4L, 3L, 8L, 7L, 10L, 9L, 12L, 
    11L, 14L, 13L, 16L, 15L, 18L, 17L)), row.names = c(NA, -18L
), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))

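I am trying to reshape the data so that, within each grouped id, every row of the question column is unique. The multiple responses to the same question should be split out into another column.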

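The question column represents the unique items a participant saw and should not repeat within an id (each subject saw each item only once). The response column represents their response to that item (picture/word), but right now their reaction times are also lumped into this column. I basically want to take the reaction times and put them into a new column (still matched to the appropriate id and question).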

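A tidyverse solution would be great, but any guidance would be much appreciated! I have tried a few variations of spread/summarise but can't seem to get it right.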

Best answer

Try this base R solution:

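# Drop the dplyr grouping so the base R functions below see a plain data frame
df <- as.data.frame(df)
# Strip stray whitespace (some question IDs carry a trailing space)
df$study_id <- trimws(df$study_id)
df$question <- trimws(df$question)
df$response <- trimws(df$response)
# Values that parse as numbers are reaction times ('rt'); the rest are responses
df$Index2 <- ifelse(is.na(suppressWarnings(as.numeric(df$response))), 'response', 'rt')
# Reshape to one row per study_id/question pair
DataG <- reshape(df, idvar=c('study_id','question'), timevar='Index2', direction="wide")
# Put the response column before the reaction time column
DataG <- DataG[,c('study_id','question','response.response','response.rt')]
rownames(DataG)<-NULL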

         study_id        question response.response response.rt
1 02ipnnqgeovkrxz 37tlJa09k7zwKFL              word       0.839
2 02ipnnqgeovkrxz 3WTpbAzIQmbnlpb              word       0.739
3 02ipnnqgeovkrxz 3eEVJgaAP6c9FPL           picture       1.353
4 0bsilzm5iabdnoj 7QhOyTdA1MjKmX3              word       1.418
5 0bsilzm5iabdnoj 8eMvvNHEh1CAqk5              word       1.563
6 0bsilzm5iabdnoj e3u9ZmoNISb0vfn              word       6.377
7 1171bwmljjct6me 3IDmpN1FZDQqhcF           picture       1.795
8 1171bwmljjct6me 3WRNXeyBSwuXvh3           picture       1.876
9 1171bwmljjct6me 6QnjC0CHjV1kmvX           picture        0.96
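
Since the question asked for a tidyverse approach, here is a rough sketch of the same idea using dplyr and tidyr. It is not part of the original answer. The helper column `kind`, the result name `wide`, and the small three-question data frame (the first participant only) are illustrative assumptions. It also assumes every reaction time parses as a number and every real response does not.

```r
library(dplyr)
library(tidyr)

# Illustrative subset of the question's data (first participant only)
df <- tibble(
  study_id = rep("02ipnnqgeovkrxz", 6),
  question = c("37tlJa09k7zwKFL ", "37tlJa09k7zwKFL",
               "3WTpbAzIQmbnlpb ", "3WTpbAzIQmbnlpb",
               "3eEVJgaAP6c9FPL ", "3eEVJgaAP6c9FPL"),
  response = c("0.839", "word", "0.739", "word", "1.353", "picture")
)

wide <- df %>%
  ungroup() %>%                      # no-op here; needed for a grouped_df
  mutate(
    question = trimws(question),     # remove the trailing spaces
    response = trimws(response),
    # 'kind' is a hypothetical helper: numeric-looking values are reaction times
    kind = if_else(is.na(suppressWarnings(as.numeric(response))),
                   "response", "rt")
  ) %>%
  pivot_wider(names_from = kind, values_from = response) %>%
  mutate(rt = as.numeric(rt)) %>%    # store reaction times as numbers
  select(study_id, question, response, rt)

print(wide)
```

Unlike the base R version, this converts the reaction time column to numeric. If a study_id/question pair had more than one value of the same kind, `pivot_wider()` would return list-columns with a warning, so it is worth checking for that in the full data.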

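For "r - How can I extract some values from one column into a new column when the values in another column are repeated?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62638601/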
