I need to compute new columns from the following data:
structure(list(english_score = c(3L, 4L, 3L, 3L, 4L, 3L, 4L,
2L, 4L, 2L, 3L, 3L, 2L, 2L, 3L, 4L, 3L, 3L, 4L, 3L, 4L, 3L, 2L
), math_score = c(4L, 4L, 3L, 4L, 4L, 4L, 3L, 2L, 3L, 3L, 4L,
2L, 4L, 2L, 4L, 2L, 3L, 3L, 2L, 2L, 2L, 4L, 2L), science_score = c(3L,
4L, 4L, 4L, 3L, 4L, 4L, 3L, 3L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
2L, 3L, 2L, 3L, 3L, 4L)), row.names = c(NA, -23L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x000002478ee34d50>)
I want to produce something like this:
structure(list(english_score = c(3L, 4L, 3L, 3L, 4L, 3L, 4L,
2L, 4L, 2L, 3L, 3L, 2L, 2L, 3L, 4L, 3L, 3L, 4L, 3L, 4L, 3L, 2L
), math_score = c(4L, 4L, 3L, 4L, 4L, 4L, 3L, 2L, 3L, 3L, 4L,
2L, 4L, 2L, 4L, 2L, 3L, 3L, 2L, 2L, 2L, 4L, 2L), science_score = c(3L,
4L, 4L, 4L, 3L, 4L, 4L, 3L, 3L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
2L, 3L, 2L, 3L, 3L, 4L), english_level = c("Level C", "Level D",
"Level C", "Level C", "Level D", "Level C", "Level D", "Level B",
"Level D", "Level B", "Level C", "Level C", "Level B", "Level B",
"Level C", "Level D", "Level C", "Level C", "Level D", "Level C",
"Level D", "Level C", "Level B"), math_level = c("Level D", "Level D",
"Level C", "Level D", "Level D", "Level D", "Level C", "Level B",
"Level C", "Level C", "Level D", "Level B", "Level D", "Level B",
"Level D", "Level B", "Level C", "Level C", "Level B", "Level B",
"Level B", "Level D", "Level B"), science_level = c("Level C",
"Level D", "Level D", "Level D", "Level C", "Level D", "Level D",
"Level C", "Level C", "Level B", "Level C", "Level D", "Level D",
"Level D", "Level D", "Level D", "Level D", "Level B", "Level C",
"Level B", "Level C", "Level C", "Level D")), row.names = c(NA,
-23L), class = c("data.table", "data.frame"), .internal.selfref = <pointer:
0x000002478ee34d50>)
My approach so far is to use a function that computes the level for each new variable...
myfunction <- function(x) {
  case_when(
    x < 2 ~ "Level A",
    x > 1 & x < 3 ~ "Level B",
    x > 2 & x < 4 ~ "Level C",
    x > 3 ~ "Level D"
  )
}
...and then create the new variables, specifying their names one by one.
DT[, english_level:=lapply(.SD, myfunction), .SDcols='english_score']
DT[, math_level:=lapply(.SD, myfunction), .SDcols='math_score']
DT[, science_level:=lapply(.SD, myfunction), .SDcols='science_score']
How can I simplify this process, preferably using data.table?
Best answer
I would do it like this (I've called your data DT, because utils::data() is a function that ships with base R):
score_cols <- grep("_score$", names(DT), value = TRUE)
level_cols <- sub("_score", "_level", score_cols)
DT[,
  (level_cols) := lapply(.SD, myfunction),
  .SDcols = score_cols
]
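To see the pattern end to end, here is a minimal sketch on a three-row subset of the question's data (the subset is only for illustration). It assumes the dplyr package is installed, because it calls the question's original case_when() logic through the dplyr:: prefix rather than loading the package, and it anchors the "_score$" pattern in sub() as a small precaution.

```r
library(data.table)

# Illustrative three-row subset of the question's data
DT <- data.table(
  english_score = c(3L, 4L, 2L),
  math_score    = c(4L, 4L, 2L),
  science_score = c(3L, 4L, 3L)
)

# The question's function, called via dplyr:: so dplyr is not attached
myfunction <- function(x) {
  dplyr::case_when(
    x < 2 ~ "Level A",
    x > 1 & x < 3 ~ "Level B",
    x > 2 & x < 4 ~ "Level C",
    x > 3 ~ "Level D"
  )
}

# Derive the target column names from the source column names
score_cols <- grep("_score$", names(DT), value = TRUE)
level_cols <- sub("_score$", "_level", score_cols)

# Wrapping level_cols in parentheses tells data.table to use the
# character vector's values as the names of the new columns
DT[, (level_cols) := lapply(.SD, myfunction), .SDcols = score_cols]

print(DT)
```

The parentheses around level_cols matter: without them, data.table would create a single column literally named level_cols.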
Also, your myfunction() uses dplyr::case_when(). That works, but some dplyr functions conflict with data.table (between(), first(), and last() in the versions I currently have). You can replace it with data.table::fcase().
myfunction <- function(x) {
  fcase(
    x == 1, "Level A",
    x == 2, "Level B",
    x == 3, "Level C",
    x == 4, "Level D"
  )
}
This should also be faster than the dplyr version.
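One thing worth knowing about the equality-based version: a score that matches none of the conditions comes back as NA. If that matters for your data, fcase() accepts a default argument. The sketch below is only an illustration; the function name score_to_level and the "Unknown" label are made up for this example and are not part of the question.

```r
library(data.table)

# Hypothetical variant of myfunction() with an explicit fallback label
score_to_level <- function(x) {
  fcase(
    x == 1, "Level A",
    x == 2, "Level B",
    x == 3, "Level C",
    x == 4, "Level D",
    default = "Unknown"  # used when no condition is TRUE
  )
}

# 5 is outside the 1-4 range, so it falls through to the default
levels_out <- score_to_level(c(1L, 4L, 5L))
print(levels_out)
```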
Finally, for this particular mapping you can drop the case-when logic altogether and use the nth letter of the alphabet as the level:
assign_letter_grade <- function(n) {
  paste("Level", LETTERS[n])
}
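As a rough usage sketch, the letter-based function plugs into the same multi-column assignment. The two-column table below is an illustrative subset of the question's data. Note that this shortcut assumes every score is a whole number between 1 and 26; anything else is not handled here.

```r
library(data.table)

assign_letter_grade <- function(n) {
  # LETTERS is R's built-in vector "A".."Z"; n indexes into it
  paste("Level", LETTERS[n])
}

# Illustrative subset of the question's data
DT <- data.table(
  english_score = c(3L, 4L, 2L),
  math_score    = c(4L, 4L, 2L)
)

score_cols <- grep("_score$", names(DT), value = TRUE)
level_cols <- sub("_score$", "_level", score_cols)

# Same assignment as before, with the letter-based function swapped in
DT[, (level_cols) := lapply(.SD, assign_letter_grade), .SDcols = score_cols]

print(DT)
```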
Regarding "r - How to efficiently create new variables and assign column names using data.table?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75329709/