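I have an organ transplant dataset. The data is organized by donor, and I'm looking at lungs, so each donor has two lungs. The data is structured as follows: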
library(tidyverse)
data <- tribble(
  ~donor_id, ~sequence, ~organ_placed,
  1,  5, "L",
  1, 10, "R",
  2, 13, "B",
  3,  4, "L",
  3, 69, NA,
  3, 70, NA,
  3, 71, NA,
  3, 72, NA
)
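donor_id = the donor ID
sequence = the number of the offer for that donor, i.e. the organ is offered to person 1, then 2, then 3, and so on.
organ_placed = if the organ was matched, the organ that was placed. For example, for donor 1 the left lung was placed at offer 5 and the right lung at offer 10.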
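I'm trying to figure out how to code the case of donor 3: the left lung was placed at offer 4, but the right lung was never placed. Match offers continued up to offer 72 and then stopped.
I'd like the new data to look like this: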
desired_data <- tribble(
  ~donor_id, ~sequence, ~organ_placed, ~outcome,
  1,  5, "L", "Left Single",
  1, 10, "R", "Right Single",
  2, 13, "B", "Bilateral",
  3,  4, "L", "Left Single",
  3, 69, NA,  NA,
  3, 70, NA,  NA,
  3, 71, NA,  NA,
  3, 72, NA,  "Right Discarded"
)
I think it's something like group_by(donor_id), then slice_max on sequence where organ_placed is NA, but I need it to record that it was the right lung that was discarded.
data <- data %>%
mutate(outcome = case_when(
organ_placed=="L" ~ "Left Single",
organ_placed=="R" ~ "Right Single",
organ_placed=="B" ~ "Bilateral",
**what would go here? to group by donor id and find the maxslice has an NA,
and L (or R) already occurred in that donor?**
))
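Thanks for your help!
Best answer
We could do it this way. Basically, in this approach we use case_when to simply go through all of the conditions within each donor group: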
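library(dplyr)

data %>%
  group_by(donor_id) %>%
  mutate(outcome = case_when(
    # Rows where an organ was placed map directly to an outcome
    organ_placed == "L" ~ "Left Single",
    organ_placed == "R" ~ "Right Single",
    organ_placed == "B" ~ "Bilateral",
    # Last row of the donor is an unplaced offer and the first placed organ
    # was the left lung, so the right lung was discarded (and vice versa)
    is.na(organ_placed) & row_number() == n() &
      first(organ_placed) == "L" ~ "Right Discarded",
    is.na(organ_placed) & row_number() == n() &
      first(organ_placed) == "R" ~ "Left Discarded",
    # Everything else (intermediate unplaced offers) stays NA
    TRUE ~ NA_character_
  ))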
  donor_id sequence organ_placed outcome
     <dbl>    <dbl> <chr>        <chr>
1        1        5 L            Left Single
2        1       10 R            Right Single
3        2       13 B            Bilateral
4        3        4 L            Left Single
5        3       69 NA           NA
6        3       70 NA           NA
7        3       71 NA           NA
8        3       72 NA           Right Discarded
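One caveat: the code above assumes that the rows of each donor are already ordered by sequence and that the placed lung sits in the first row of the group. If that might not hold in the real data, a variant along the following lines could be more forgiving. This is only a sketch under those assumptions, not a tested solution for the full dataset; the helper columns left_placed, right_placed and last_unplaced are names made up for illustration. It sorts by sequence explicitly and checks whether each lung was placed anywhere within the donor, rather than looking only at the first row.

```r
library(dplyr)
library(tibble)

# Example data from the question
data <- tribble(
  ~donor_id, ~sequence, ~organ_placed,
  1,  5, "L",
  1, 10, "R",
  2, 13, "B",
  3,  4, "L",
  3, 69, NA,
  3, 70, NA,
  3, 71, NA,
  3, 72, NA
)

result <- data %>%
  arrange(donor_id, sequence) %>%   # make "last offer" well defined
  group_by(donor_id) %>%
  mutate(
    # Hypothetical helper columns: was each lung placed anywhere for this donor?
    left_placed   = any(organ_placed %in% c("L", "B")),
    right_placed  = any(organ_placed %in% c("R", "B")),
    # TRUE only on the final offer of the donor, and only if nothing was placed there
    last_unplaced = is.na(organ_placed) & sequence == max(sequence),
    outcome = case_when(
      organ_placed == "L" ~ "Left Single",
      organ_placed == "R" ~ "Right Single",
      organ_placed == "B" ~ "Bilateral",
      last_unplaced & left_placed & !right_placed ~ "Right Discarded",
      last_unplaced & right_placed & !left_placed ~ "Left Discarded",
      TRUE ~ NA_character_
    )
  ) %>%
  ungroup() %>%
  select(-left_placed, -right_placed, -last_unplaced)

result
```

Donors for whom neither lung was ever placed get NA in every row here, since the question does not say how that case should be labelled.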
Regarding R: how to code a new variable by a grouping variable and conditional on previous rows, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75475934/