r - What is an efficient way to extract only the first-occurring row from a dataset?

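I have a data frame of patient encounters and want to extract only the oldest encounter for each patient (this can be done using the sequential encounter ID). The code I came up with works, but I'm sure there is a more efficient way to do this with dplyr. What approach would you recommend?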

Example with 10 encounters for 4 patients:

encounter_ID <- c(1021, 1022, 1013, 1041, 1007, 1002, 1003, 1043, 1085, 1077)
patient_ID <- c(855,721,821,855,423,423,855,721,423,855)
gender <- c(0,0,1,0,1,1,0,0,1,0)
df <- data.frame(encounter_ID, patient_ID, gender)

Result (desired and obtained):

    encounter_ID    patient_ID  gender
    1003            855         0
    1022            721         0
    1013            821         1
    1002            423         1

My approach

1) Extract the list of unique patients

list.patients <- unique(df$patient_ID)

2) Create an empty data frame to receive each patient's first encounter

one.encounter <- data.frame()

3) Loop over each patient in the list, extract their first encounter, and append it to the data frame

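library(dplyr)  # provides %>% and filter()

for (i in seq_along(list.patients)) {
  # all encounters for the current patient
  one.patient <- df %>% filter(patient_ID == list.patients[i])
  # sort by encounter ID so the oldest encounter comes first
  one.patient.ordered <- one.patient[order(one.patient$encounter_ID), ]
  first.encounter <- head(one.patient.ordered, n = 1)
  # append this patient's first encounter to the result
  one.encounter <- rbind(one.encounter, first.encounter)
}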

Best answer

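Here is a base R solution that does this efficiently without dplyr.
duplicated() marks the first row it meets with a given patient ID as FALSE and every later row with the same patient ID as TRUE (here I invert that by putting ! in front of duplicated). So if you have already sorted the data frame by encounter_ID, you can use it to select only the first encounter for each patient.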
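To make the behaviour of duplicated() concrete, here is a small sketch on the patient_ID vector from the question. The names dup and first_rows are introduced here for illustration only. It also shows why the sort matters: on the unsorted data, !duplicated() picks the first row in row order, not the lowest encounter ID.

```r
encounter_ID <- c(1021, 1022, 1013, 1041, 1007, 1002, 1003, 1043, 1085, 1077)
patient_ID <- c(855, 721, 821, 855, 423, 423, 855, 721, 423, 855)

# TRUE marks a patient ID that has already appeared earlier in the vector
dup <- duplicated(patient_ID)
print(dup)
# FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE

# Positions of the first appearance of each patient (unsorted data)
first_rows <- which(!dup)
print(encounter_ID[first_rows])
# 1021 1022 1013 1007  (first in row order, not the oldest encounters)
```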

df <- df[order(df$encounter_ID),] #order dataframe by encounter id
#subset to rows that are not duplicates of a previous encounter for that patient
first <- df[!duplicated(df$patient_ID),]
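Since the question asks about dplyr, the same sort-then-keep-first idea could be written roughly as follows. This is a minimal sketch, assuming dplyr is installed; first_dplyr is a name introduced here for illustration and is not part of the original answer.

```r
library(dplyr)

encounter_ID <- c(1021, 1022, 1013, 1041, 1007, 1002, 1003, 1043, 1085, 1077)
patient_ID <- c(855, 721, 821, 855, 423, 423, 855, 721, 423, 855)
gender <- c(0, 0, 1, 0, 1, 1, 0, 0, 1, 0)
df <- data.frame(encounter_ID, patient_ID, gender)

first_dplyr <- df %>%
  arrange(encounter_ID) %>%               # oldest encounters first
  distinct(patient_ID, .keep_all = TRUE)  # keep the first row per patient, with all columns

print(first_dplyr)
```

The rows come back ordered by encounter_ID rather than in the order shown in the question, but they are the same four encounters. In dplyr 1.0.0 or later, grouping by patient_ID and calling slice_min(encounter_ID, n = 1) should give an equivalent result.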

For "r - What is an efficient way to extract only the first-occurring row from a dataset?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51773889/
