r - Custom mutate of a new column based on two other columns in R using dplyr

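My goal is to create a new df column whose values are based on two other columns. My dataset concerns recruitment into a study. I want a column that records whether a person took part in a given round of the study and, if so, whether it was their first participation, their second, their third, and so on (up to 8 rounds). Currently I am trying to do this in dplyr with mutate(case_when()) and lag(). However, it gives incorrect results when a person misses a round and later rejoins. The dataset looks like this: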

    person |  round  |  in_round  |
       A        1           1
       A        2           1
       A        3           1
       A        4           1
       A        5           1
       A        6           0
       A        7           0
       A        8           0
       B        1           0
       B        2           0
       B        3           1
       B        4           1
       B        5           1
       B        6           1
       B        7           0
       B        8           1
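
For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch in base R: the column names are the ones shown above, but calling the data frame df and storing in_round as numeric 0/1 are assumptions.

```r
# Rebuild the example data: 2 people x 8 rounds.
# in_round is 1 if the person took part in that round, 0 otherwise.
df <- data.frame(
  person   = rep(c("A", "B"), each = 8),
  round    = rep(1:8, times = 2),
  in_round = c(1, 1, 1, 1, 1, 0, 0, 0,   # person A: rounds 1-5
               0, 0, 1, 1, 1, 1, 0, 1)   # person B: rounds 3-6 and 8
)
```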

I need a separate column that, for each person, uses round and in_round to produce the following:

    person |  round  |  in_round  |  round_status
       A        1           1         recruited
       A        2           1        follow_up_1
       A        3           1        follow_up_2
       A        4           1        follow_up_3
       A        5           1        follow_up_4
       A        6           0           none
       A        7           0           none
       A        8           0           none
       B        1           0           none
       B        2           0           none
       B        3           1         recruited
       B        4           1        follow_up_1
       B        5           1        follow_up_2
       B        6           1        follow_up_3
       B        7           0            none
       B        8           1        follow_up_4

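Summary:

Where in_round == 0, round_status == "none".
The first time in_round == 1, round_status == "recruited".
Each subsequent time in_round == 1, round_status == "follow_up_X" (where X depends on how many rounds that person has already taken part in).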

Best answer

Try this:

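    library(dplyr)

    df %>%
      group_by(person) %>%
      arrange(round, .by_group = TRUE) %>%  # rounds in order within each person
      mutate(
        # running count of rounds this person has taken part in so far
        cum_round = cumsum(in_round),
        round_status = case_when(
          in_round == 0  ~ "none",       # did not take part in this round
          cum_round == 1 ~ "recruited",  # first participation
          TRUE ~ paste0("follow_up_", cum_round - 1)  # 2nd, 3rd, ... participation
        )
      ) %>%
      ungroup()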
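
Why this copes with skipped rounds, as far as I can tell: cumsum(in_round) within each person counts how many rounds that person has taken part in so far, so a missed round leaves the count unchanged instead of shifting it the way a lag()-based comparison would. Below is a minimal self-contained check. It assumes dplyr is installed and rebuilds the question's data by hand. The names result and expected are only illustrative, and expected is the round_status column from the question typed out manually.

```r
library(dplyr)

# Example data from the question (in_round stored as numeric 0/1 by assumption)
df <- data.frame(
  person   = rep(c("A", "B"), each = 8),
  round    = rep(1:8, times = 2),
  in_round = c(1, 1, 1, 1, 1, 0, 0, 0,
               0, 0, 1, 1, 1, 1, 0, 1)
)

result <- df %>%
  arrange(person, round) %>%   # rows ordered by round within each person
  group_by(person) %>%
  mutate(
    cum_round = cumsum(in_round),  # participations so far, per person
    round_status = case_when(
      in_round == 0  ~ "none",
      cum_round == 1 ~ "recruited",
      TRUE           ~ paste0("follow_up_", cum_round - 1)
    )
  ) %>%
  ungroup()

# Desired output from the question, person A then person B
expected <- c(
  "recruited", paste0("follow_up_", 1:4), rep("none", 3),
  "none", "none", "recruited", paste0("follow_up_", 1:3), "none", "follow_up_4"
)

# Errors if the computed column differs from the desired one
stopifnot(identical(result$round_status, expected))
```

Person B's round 8 is the case that matters here: the count stays at 4 through the missed round 7 and only reaches 5 in round 8, which gives follow_up_4.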

A similar question about custom-mutating a new column based on two other columns in R with dplyr can be found on Stack Overflow: https://stackoverflow.com/questions/60512398/
