I have a hypothetical dataset of exam sittings:
id <- c(1,1,3,4,5,6,7,7,8,9,9)
test_date <- c("2012-06-27","2012-07-10","2013-07-04","2012-03-24","2012-07-22", "2013-09-16","2012-06-21","2013-10-18", "2013-04-21", "2012-02-16", "2012-03-15")
result_date <- c("2012-07-29","2012-09-02","2013-08-01","2012-04-25","2012-09-01","2013-10-20","2012-07-01","2013-10-31", "2013-05-17", "2012-03-17", "2012-04-20")
library(dplyr)
data1 <- data.frame(id, test_date, result_date, stringsAsFactors = FALSE)
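"id" is the ID of the student who took a particular exam. "test_date" is the date the student sat the exam, and "result_date" is the date the student's result was released. I want to find which students retook the exam before the result of their previous attempt came out, e.g. students who knew they had done badly and retook the exam without bothering to find out their score. For example, the student with "id" 1 took the exam a second time on "2012-07-10", which is before "2012-07-29", the result date of their first attempt.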
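The condition can be stated per student: sort the attempts by test date, and an attempt counts as an early retake when the next attempt's test_date falls before that attempt's result_date. Below is a minimal base-R sketch for a single student's attempts; the helper name `retook_early` is hypothetical, and it assumes ISO date strings like the ones in the question.

```r
# Hypothetical helper: for one student's attempts, return 1 for each attempt
# whose next attempt was taken before that attempt's result was released.
retook_early <- function(test_date, result_date) {
  test_date   <- as.Date(test_date)
  result_date <- as.Date(result_date)
  ord <- order(test_date)              # chronological order of attempts
  n <- length(ord)
  flag <- integer(n)                   # 0 by default (last or only attempt)
  if (n > 1) {
    # compare each attempt's result date with the following attempt's test date
    flag[-n] <- as.integer(test_date[ord][-1] < result_date[ord][-n])
  }
  flag[order(ord)]                     # return flags in the input row order
}

# Student 1: the second attempt (2012-07-10) precedes the first result (2012-07-29)
retook_early(c("2012-06-27", "2012-07-10"), c("2012-07-29", "2012-09-02"))
#> [1] 1 0
```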
I tried:
data1 %>%
  group_by(id) %>%
  arrange(id, test_date) %>%
  # keep only students who took the exam more than once; the plan was to merge this back into the original data set with a join
  filter(n() >= 2)
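Ending that pipeline after the filter gives the subset of students with more than one attempt. A sketch, rebuilding data1 with character dates as in the question (`repeaters` is a hypothetical name):

```r
library(dplyr)

data1 <- data.frame(
  id          = c(1, 1, 3, 4, 5, 6, 7, 7, 8, 9, 9),
  test_date   = c("2012-06-27", "2012-07-10", "2013-07-04", "2012-03-24", "2012-07-22",
                  "2013-09-16", "2012-06-21", "2013-10-18", "2013-04-21", "2012-02-16", "2012-03-15"),
  result_date = c("2012-07-29", "2012-09-02", "2013-08-01", "2012-04-25", "2012-09-01",
                  "2013-10-20", "2012-07-01", "2013-10-31", "2013-05-17", "2012-03-17", "2012-04-20"),
  stringsAsFactors = FALSE
)

# Students with two or more attempts, one row per attempt, sorted by date
repeaters <- data1 %>%
  group_by(id) %>%
  filter(n() >= 2) %>%
  ungroup() %>%
  arrange(id, test_date)   # ISO date strings sort chronologically

unique(repeaters$id)
#> [1] 1 7 9
```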
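So essentially, I want to create a new column called "re_test" that equals 1 if the student retook the exam before receiving the result of the previous attempt, and 0 otherwise (students who retook the exam after seeing their score, or who never retook it).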
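One way to build that column (a sketch, not the accepted answer's approach) is to compare each attempt's result_date with the next attempt's test_date inside `group_by(id)`. The desired output further down puts the 1 on the earlier attempt, which is why `lead()` is used. Because `mutate()` keeps every row, no `n() >= 2` filter or join back is needed here; `re_tests` is a hypothetical name.

```r
library(dplyr)

data1 <- data.frame(
  id          = c(1, 1, 3, 4, 5, 6, 7, 7, 8, 9, 9),
  test_date   = c("2012-06-27", "2012-07-10", "2013-07-04", "2012-03-24", "2012-07-22",
                  "2013-09-16", "2012-06-21", "2013-10-18", "2013-04-21", "2012-02-16", "2012-03-15"),
  result_date = c("2012-07-29", "2012-09-02", "2013-08-01", "2012-04-25", "2012-09-01",
                  "2013-10-20", "2012-07-01", "2013-10-31", "2013-05-17", "2012-03-17", "2012-04-20"),
  stringsAsFactors = FALSE
)

re_tests <- data1 %>%
  mutate(test_date = as.Date(test_date), result_date = as.Date(result_date)) %>%
  group_by(id) %>%
  arrange(test_date, .by_group = TRUE) %>%   # chronological within each student
  # lead() stays inside each id group, so it never reaches another student's rows;
  # the last attempt has no successor (NA), which coalesce() turns into 0
  mutate(re_test = as.integer(coalesce(lead(test_date) < result_date, FALSE))) %>%
  ungroup()

re_tests$re_test
#> [1] 1 0 0 0 0 0 0 0 0 1 0
```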
I tried to find the cases where the date difference is positive or negative by subtracting the second test_date from the first result_date:
mutate(data1, re_test = result_date - lead(test_date, default = first(test_date)))
However, this mixes up students with different IDs. I tried splitting the data, but mutate doesn't work on a list of data frames, so now I'm stuck:
split(data1, data1$id)
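`split()` itself is fine; the missing step is running a function over each piece with `lapply()` and recombining the pieces with `do.call(rbind, ...)`. Below is a base-R sketch of the same subtraction done per student (`gap_days` is a hypothetical column name). In dplyr, the equivalent fix is to call `group_by(id)` before `mutate()` so that `lead()` stays within each student. Leaving the last attempt as NA also avoids `default = first(test_date)`, which compares a student's last attempt with the first test date.

```r
data1 <- data.frame(
  id          = c(1, 1, 3, 4, 5, 6, 7, 7, 8, 9, 9),
  test_date   = c("2012-06-27", "2012-07-10", "2013-07-04", "2012-03-24", "2012-07-22",
                  "2013-09-16", "2012-06-21", "2013-10-18", "2013-04-21", "2012-02-16", "2012-03-15"),
  result_date = c("2012-07-29", "2012-09-02", "2013-08-01", "2012-04-25", "2012-09-01",
                  "2013-10-20", "2012-07-01", "2013-10-31", "2013-05-17", "2012-03-17", "2012-04-20"),
  stringsAsFactors = FALSE
)

# Work on each student's rows separately, then stack the pieces back together
pieces <- split(data1, data1$id)
with_gap <- lapply(pieces, function(d) {
  d <- d[order(d$test_date), ]     # ISO date strings sort chronologically
  n <- nrow(d)
  d$gap_days <- NA_real_           # no following attempt -> NA
  if (n > 1) {
    # this attempt's result_date minus the next attempt's test_date;
    # a positive gap means the retake happened before the result came out
    d$gap_days[-n] <- as.numeric(as.Date(d$result_date[-n]) - as.Date(d$test_date[-1]))
  }
  d
})
result <- do.call(rbind, with_gap)
rownames(result) <- NULL

result$gap_days
```

Rows with a positive `gap_days` are the earlier attempts of students who retook the exam before seeing their result.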
To add to this, here is part of the desired result:
data2 <- data.frame(id = c(1, 1, 3, 4),
                    test_date_result = c("2012-06-27", "2012-07-10", "2013-07-04", "2012-03-24"),
                    result_date_result = c("2012-07-29", "2012-09-02", "2013-08-01", "2012-04-25"),
                    re_test = c(1, 0, 0, 0),
                    stringsAsFactors = FALSE)
Sorry for the long post; I hope I've made myself clear enough.
Many thanks in advance!
Best answer
library(reshape2)
library(dplyr)
# first melt so that we can sequence by date
data1m <- data1 %>%
melt(id.vars = "id", measure.vars = c("test_date", "result_date"), value.name = "event_date")
# any two tests in a row is a flag; use dplyr::lag to compare with the previous event
data1mc <- data1m %>%
  arrange(id, event_date) %>%
  group_by(id) %>%
  mutate(multi_test = (variable == "test_date" & lag(variable == "test_date"))) %>%
  filter(multi_test)
# id variable event_date multi_test
# 1 1 test_date 2012-07-10 TRUE
# 2 9 test_date 2012-03-15 TRUE
## join back to the original
data1 %>%
  left_join(data1mc %>% select(id, event_date, multi_test),
            by = c("id" = "id", "test_date" = "event_date"))
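reshape2 is retired (its maintainers recommend tidyr instead), so here is a sketch of the same interleaving done with dplyr alone. It also turns the flag into the 0/1 `re_test` column the question asked for. Like the answer, this marks the retake itself (the 2012-07-10 row for id 1), whereas the question's desired output marks the earlier attempt. To get that, flag a test that is immediately *followed* by another test, using `lead()` in place of `lag()`. `events`, `flags` and `out` are hypothetical names.

```r
library(dplyr)

data1 <- data.frame(
  id          = c(1, 1, 3, 4, 5, 6, 7, 7, 8, 9, 9),
  test_date   = c("2012-06-27", "2012-07-10", "2013-07-04", "2012-03-24", "2012-07-22",
                  "2013-09-16", "2012-06-21", "2013-10-18", "2013-04-21", "2012-02-16", "2012-03-15"),
  result_date = c("2012-07-29", "2012-09-02", "2013-08-01", "2012-04-25", "2012-09-01",
                  "2013-10-20", "2012-07-01", "2013-10-31", "2013-05-17", "2012-03-17", "2012-04-20"),
  stringsAsFactors = FALSE
)

# Stack test and result dates into one long "event" table (what melt() does above)
events <- bind_rows(
  data1 %>% transmute(id, event = "test_date",   event_date = test_date),
  data1 %>% transmute(id, event = "result_date", event_date = result_date)
)

# A test immediately preceded by another test (no result in between) is a retake
flags <- events %>%
  arrange(id, event_date) %>%
  group_by(id) %>%
  mutate(multi_test = event == "test_date" & lag(event) == "test_date") %>%
  filter(multi_test) %>%   # drops FALSE and NA (a student's first event)
  ungroup()

# Join back and turn the flag into 0/1, with 0 for rows that had no match
out <- data1 %>%
  left_join(flags %>% select(id, event_date, multi_test),
            by = c("id", "test_date" = "event_date")) %>%
  mutate(re_test = as.integer(coalesce(multi_test, FALSE)))

out$re_test
```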
For the original question, "r - How to subtract different columns diagonally in R", see Stack Overflow: https://stackoverflow.com/questions/43702223/