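r - Find the person last gazed at in question-answer sequences

Tags: r dplyr tidyr

I have data on gaze behavior during Q(uestion) and A(nswer) sequences. The gazes of each speaker A, B, and C are recorded in the columns A_aoi, B_aoi, and C_aoi; the gaze durations are recorded in the columns A_aoi_dur, B_aoi_dur, and C_aoi_dur: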

df <- data.frame(
  Speaker = c("ID01.A", NA, "ID01.B", "ID33.B", "ID33.A", "ID33.C"),
  Utterance = c("Who did it?", NA, "Peter did.", "So you're coming?", "erm", "Yes, sure."),
  Sequ = c(1,1,1,2,2,2),
  Q = c("q_wh", "", "", "q_decl", "", ""),
  A_aoi = c("C*B*B", "B", "B*", "B*C", "*C", "B*"),
  A_aoi_dur = c("1,2,3,4,5", "1", "1,2", "1,2,3", "1,2", "1,2"),
  B_aoi = c("C*A", "*A", "A*", "A*C", "C", "*C"),
  B_aoi_dur = c("1,2,3", "1,2", "1,2", "1,2,3", "1", "1,2"),
  C_aoi = c("A*A", "A", "A*", "B*C*B", "*B", "B*A"),
  C_aoi_dur = c("1,2,3", "1", "1,2", "1,2,3,4,5", "1,2", "1,2,3")
)
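
To make the encoding concrete, here is a minimal sketch of how one `*_aoi` cell lines up with its `*_aoi_dur` cell. It assumes that every single character in the gaze string is one gaze target and that the n-th character belongs to the n-th comma-separated duration. The post does not say what `*` stands for, so the sketch simply treats it as one more target. The helper name `pair_gazes` is hypothetical and not part of the original post.

```r
# Hypothetical helper (not from the original post): pair each gaze target
# with the duration recorded at the same position.
pair_gazes <- function(aoi, dur) {
  targets   <- strsplit(aoi, "")[[1]]                  # one element per character
  durations <- as.integer(strsplit(dur, ",")[[1]])     # one element per duration
  stopifnot(length(targets) == length(durations))      # the two must line up
  data.frame(Gaze_to = targets, Gaze_dur = durations, stringsAsFactors = FALSE)
}

# Speaker A's gazes during the first question ("Who did it?")
first_q <- pair_gazes("C*B*B", "1,2,3,4,5")
first_q
```

Under these assumptions the last row of `first_q` is the pair that the desired output reports for sequence 1.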

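I need to find out which person the questioner was gazing at last when they finished the question.

I have been trying to get there with the following sequence of operations, but I got stuck: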

library(tidyr)
library(dplyr)
library(stringr)
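df0 <- df %>%
  # row index (assumed: `Line` is not in `df` as posted, but the solution
  # further down groups by it and its output shows it as the first column)
  mutate(Line = row_number(), .before = 1) %>%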
  # for each `Sequ`...
  group_by(Sequ) %>%
  mutate(
    # Who is the question by?
    Quest_by = sub(".*(.)$", "\\1", first(Speaker)),
    # Who is the answer by?
    Answ_by = sub(".*(.)$", "\\1", last(Speaker))
  ) %>%
  # rename to create column names that are processable by `names_pattern` for `pivot_longer`:
  rename_with(~ str_c(., "_AOI"), ends_with("_aoi")) %>%
  # collect all AOI gazes by A, B, and C into one column: 
  pivot_longer(cols = contains("_aoi"), 
               names_to = c("Gaze_by", ".value"),
               names_pattern = "^(.*)_([^_]+$)"
  ) %>%
  # rename `AOI` and `dur` columns:
  rename(Gaze_to = AOI, Gaze_dur = dur) %>%
  # edit `Gaze_by` and `Gaze_to` values for upcoming analysis:
  mutate(
    # simplify `Gaze_by` values to speaker labels:
    Gaze_by = sub("^(.).*", "\\1", Gaze_by),
    # insert comma into `Gaze_to` as splitting pattern for `separate_rows` command below:
    Gaze_to = str_replace_all(Gaze_to, "(?<=.)(?=.)", ",")
    ) %>%
  # assign each `Gaze_to` and `Gaze_dur` value its own row based on comma as splitting pattern: 
  separate_rows(c(Gaze_to, Gaze_dur), sep = ",", convert = TRUE)
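
The `str_replace_all()` step in this pipeline relies on a zero-width regex: `(?<=.)(?=.)` matches every position that has a character both before and after it, so a comma is inserted between neighbouring characters without consuming any of them. A small sketch on a few gaze strings, with a plain `strsplit()` shown only for comparison:

```r
library(stringr)

aoi <- c("C*B*B", "B", "*A")

# Insert a comma at every position that sits between two characters
with_commas <- str_replace_all(aoi, "(?<=.)(?=.)", ",")
with_commas
# "C,*,B,*,B" "B" "*,A"

# The same pieces without a regex: split each string into single characters
chars <- strsplit(aoi, "")
```

After this step the gaze strings have as many comma-separated pieces as their duration strings, which is what lets `separate_rows()` split both columns in parallel.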

Desired output (in this form or similar):

  Speaker         Utterance Sequ      Q Q_by Answ_by Last_Gaze_to Last_Gaze_dur
1  ID01.A       Who did it?    1   q_wh    A       B            B             5
2    <NA>              <NA>    1
3  ID01.B        Peter did.    1
4  ID33.B So you're coming?    2 q_decl    B       C            C             3
5  ID33.A               erm    2
6  ID33.C        Yes, sure.    2

EDIT: I've come up with this solution (where df0 is the result of the operations above):

df0 %>%
  filter(Quest_by == Gaze_by) %>%
  group_by(Q, Sequ) %>%
  mutate(Last_Gaze_to = last(Gaze_to),
         Last_Gaze_dur = last(Gaze_dur)) %>%
  ungroup() %>%
  group_by(Line) %>%
  slice_head() %>%
  select(-matches("^G")) %>%
  ungroup() %>%
  mutate(across(c(5:9),
                ~ifelse(Q == "", NA, .)))
# A tibble: 6 × 9
   Line Speaker Utterance          Sequ Q      Quest_by Answ_by Last_Gaze_to Last_Gaze_dur
  <int> <chr>   <chr>             <dbl> <chr>  <chr>    <chr>   <chr>                <int>
1     1 ID01.A  Who did it?           1 q_wh   A        B       B                        5
2     2 NA      NA                    1 NA     NA       NA      NA                      NA
3     3 ID01.B  Peter did.            1 NA     NA       NA      NA                      NA
4     4 ID33.B  So you're coming?     2 q_decl B        C       C                        3
5     5 ID33.A  erm                   2 NA     NA       NA      NA                      NA
6     6 ID33.C  Yes, sure.            2 NA     NA       NA      NA                      NA

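Thanks to everyone who took the trouble to look into this difficult problem. Suggestions for improving the solution are welcome!

Best answer

Here is an attempt. I have no experience with "gaze" data and the like...

It took me some time and some help (see "Conditionally take value from column1 if the column1 name == first(value) from column2 BY GROUP"; thanks to @tmfmnk).

I hope this code helps. I have left it as is for readability; I am sure it can be fine-tuned. The main steps of what I tried to do are marked as blocks in the comments.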
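
The core trick in blocks 2 and 3 is looking up, row by row, the column whose name is stored in another column. A minimal sketch of that pattern on a made-up data frame (the names `toy`, `pick`, and `value` are hypothetical and only serve the illustration):

```r
library(dplyr)

# Toy data (hypothetical): `pick` names the column whose value is wanted in each row
toy <- data.frame(
  pick = c("A", "B", "A"),
  A    = c("a1", "a2", "a3"),
  B    = c("b1", "b2", "b3"),
  stringsAsFactors = FALSE
)

picked <- toy %>%
  rowwise() %>%                    # evaluate the next mutate() one row at a time
  mutate(value = get(pick)) %>%    # `pick` is a single string here, e.g. get("A")
  ungroup()

picked$value
# "a1" "b2" "a3"
```

Without `rowwise()`, `pick` would be the whole column rather than a single name, which is why the answer switches to row-wise evaluation before calling `get()`.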

library(tidyverse)

df %>% 
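  # block 1: split the speaker label, find questioner and answerer per sequence,
  # and shorten the column names (A_aoi -> A, A_aoi_dur -> A_dur, ...)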
  separate(Speaker, c("SpeakerID", "SpeakerName")) %>% 
  group_by(Sequ) %>% 
  mutate(Q_by = first(SpeakerName)) %>% 
  mutate(Answer_by = last(na.omit(SpeakerName))) %>%
  rename_with(~str_replace(.x, '_aoi', ''), matches('aoi')) %>% 
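  # block 2: fetch the questioner's gaze string in each row, then keep the last
  # person named in the question row (the first row of the sequence)
  rowwise() %>%
  mutate(Last_Gaze_to = get(Q_by)) %>%
  group_by(Sequ, Q_by) %>%
  mutate(Last_Gaze_to = str_extract(first(Last_Gaze_to), '[A-Z](?=[^A-Z]*$)')) %>% 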
  rename(A_aoi = A, B_aoi = B, C_aoi = C) %>% 
  rename_with(~str_replace(.x, '_dur', ''), matches('dur')) %>% 
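  # block 3: same lookup for the duration string of the question row;
  # str_sub() keeps only its final character, so single-digit durations are assumed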
  rowwise() %>% 
  mutate(Last_Gaze_dur = get(Q_by)) %>% 
  group_by(Sequ, Q_by) %>% 
  mutate(Last_Gaze_dur = first(Last_Gaze_dur)) %>% 
  mutate(Last_Gaze_dur = str_sub(Last_Gaze_dur, -1,-1)) %>% 
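  # block 4: keep the new values on the first row of each sequence only
  # and restore the original duration column names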
  ungroup() %>% 
  group_by(Sequ) %>% 
  mutate(across(c(Q_by, Answer_by, Last_Gaze_to, Last_Gaze_dur), ~ifelse(duplicated(.), NA,first(.)))) %>% 
  rename(A_aoi_dur = A, B_aoi_dur=B, C_aoi_dur=C)

Output:

# Groups:   Sequ [2]
  SpeakerID SpeakerName Utterance          Sequ Q        A_aoi A_aoi_dur B_aoi B_aoi_dur C_aoi C_aoi_dur Q_by  Answer_by Last_Gaze_to Last_Gaze_dur
  <chr>     <chr>       <chr>             <dbl> <chr>    <chr> <chr>     <chr> <chr>     <chr> <chr>     <chr> <chr>     <chr>        <chr>        
1 ID01      A           Who did it?           1 "q_wh"   C*B*B 1,2,3,4,5 C*A   1,2,3     A*A   1,2,3     A     B         B            5            
2 NA        NA          NA                    1 ""       B     1         *A    1,2       A     1         NA    NA        NA           NA           
3 ID01      B           Peter did.            1 ""       B*    1,2       A*    1,2       A*    1,2       NA    NA        NA           NA           
4 ID33      B           So you're coming?     2 "q_decl" B*C   1,2,3     A*C   1,2,3     B*C*B 1,2,3,4,5 B     C         C            3            
5 ID33      A           erm                   2 ""       *C    1,2       C     1         *B    1,2       NA    NA        NA           NA           
6 ID33      C           Yes, sure.            2 ""       B*    1,2       *C    1,2       B*A   1,2,3     NA    NA        NA           NA  
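
For comparison, a more compact sketch in base R that reads the result straight off the question row. It is an illustration under stated assumptions, not the answer's method: the question is taken to be the first row of each `Sequ`, the last element of the gaze string is reported even if it is `*`, and the helper name `last_gaze` is hypothetical. Because the durations are split on commas, values with more than one digit would also be handled.

```r
# Reduced copy of the example data: only the columns this sketch needs
df <- data.frame(
  Speaker   = c("ID01.A", NA, "ID01.B", "ID33.B", "ID33.A", "ID33.C"),
  Sequ      = c(1, 1, 1, 2, 2, 2),
  A_aoi     = c("C*B*B", "B", "B*", "B*C", "*C", "B*"),
  A_aoi_dur = c("1,2,3,4,5", "1", "1,2", "1,2,3", "1,2", "1,2"),
  B_aoi     = c("C*A", "*A", "A*", "A*C", "C", "*C"),
  B_aoi_dur = c("1,2,3", "1,2", "1,2", "1,2,3", "1", "1,2"),
  C_aoi     = c("A*A", "A", "A*", "B*C*B", "*B", "B*A"),
  C_aoi_dur = c("1,2,3", "1", "1,2", "1,2,3,4,5", "1,2", "1,2,3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise one sequence from its first (question) row
last_gaze <- function(seq_rows) {
  q_row <- seq_rows[1, ]
  q_by  <- sub(".*(.)$", "\\1", q_row$Speaker)            # "ID01.A" -> "A"
  aoi   <- strsplit(q_row[[paste0(q_by, "_aoi")]], "")[[1]]
  dur   <- as.integer(strsplit(q_row[[paste0(q_by, "_aoi_dur")]], ",")[[1]])
  data.frame(Sequ = q_row$Sequ, Q_by = q_by,
             Last_Gaze_to  = aoi[length(aoi)],             # last gaze target
             Last_Gaze_dur = dur[length(dur)],             # its duration
             stringsAsFactors = FALSE)
}

# One summary row per sequence
res <- do.call(rbind, lapply(split(df, df$Sequ), last_gaze))
res
```

The result has one row per sequence and could be joined back onto the full data by `Sequ` if the row-per-utterance layout of the desired output is needed.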

Source: this question on Stack Overflow: https://stackoverflow.com/questions/70472910/
