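I want to create multiple variables using a formula in R data.table. I have a list of variables, and for each one I want to perform a calculation and create a new variable, pasting the same string onto each column name. I can get this to work for one variable at a time, but not with lapply or a loop. I suspect I'm missing something about how data.table handles quoting, or variable names versus strings. Do I need to use ".." or wrap something in eval()? A dplyr (or any tidyverse) solution would also be fine.
Here is example code using mtcars: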
library(data.table)
mtcars.dt <- setDT(mtcars)
myVars <- c("mpg", "hp", "qsec")
# Doesn't work:
for( myVar in myVars){
mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp]
}
# Doesn't work:
lapply(myVars, function(myVar) mtcars.dt[, paste0(myVar, ".disp.ratio") := myVar / disp])
# Works:
mtcars.dt[, mpg.disp.ratio := mpg / disp]
# Doesn't work
for (myVar in myVars){
mtcars.dt[, paste0(myVar, ".disp.lm.adj") :=
myVar -
lm(data = .SD, formula = myVar ~ disp)$coefficients[2] * (disp - mean(disp))]
}
# Doesn't work
lapply(myVars, function(x) mtcars.dt[, paste0(x, ".disp.lm.adj") :=
x -
lm(data = .SD, formula = x ~ disp)$coefficients[2] * (disp - mean(disp))])
# Works
mtcars.dt[, mpg.disp.lm.adj :=
mpg -
lm(data = .SD, formula = mpg ~ disp)$coefficients[2] * (disp - mean(disp))]
For the ratio calculation, I get the following error:
Error in myVar/disp : non-numeric argument to binary operator
For the lm adjustment, I get the following error:
Error in model.frame.default(formula = myVar ~ disp, data = .SD, drop.unused.levels = TRUE) :
variable lengths differ (found for 'disp')
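Best answer
We can use get, which looks up the column whose name is stored in myVar. In the original loop, myVar evaluates to a character string such as "mpg", so myVar / disp fails with the "non-numeric argument" error: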
library(data.table)
for( myVar in myVars){
mtcars.dt[, paste0(myVar, ".disp.ratio") := get(myVar) / disp]
}
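To make the difference concrete, here is a minimal sketch that reproduces the error and then the fix for a single variable. It works on a copy of mtcars; the names dt, res and ratio are introduced here for illustration only.

```r
library(data.table)

# Work on a copy so the built-in mtcars data set is left untouched.
dt <- as.data.table(mtcars)
myVar <- "mpg"

# Inside [ ], myVar is not a column, so it resolves to the string "mpg";
# dividing a string by a numeric column raises an error.
res <- tryCatch(dt[, myVar / disp], error = function(e) conditionMessage(e))
print(res)

# get() looks up the column whose name is stored in myVar.
ratio <- dt[, get(myVar) / disp]
stopifnot(isTRUE(all.equal(ratio, mtcars$mpg / mtcars$disp)))
```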
Or convert the string to a symbol with as.name and wrap it in eval:
for( myVar in myVars){
mtcars.dt[, paste0(myVar, ".disp.ratio") := eval(as.name(myVar)) / disp]
}
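Another option is to specify the columns in .SDcols, loop over .SD (the Subset of the Data.table), apply the transformation, and create the new variables by assignment (:=):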
mtcars.dt[, paste0(myVars, ".disp.ratio") := lapply(.SD, `/`, disp),
.SDcols = myVars]
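The question also asks about a tidyverse route, which the answer does not cover. The following is a minimal sketch, assuming dplyr 1.0.0 or later is installed (across() and its .names argument are not available in earlier versions). The name mtcars.ratio is introduced here for illustration.

```r
library(dplyr)

myVars <- c("mpg", "hp", "qsec")

# across() applies the lambda to each selected column; .names builds the
# new column names, with {.col} standing for the original column name.
mtcars.ratio <- datasets::mtcars %>%
  mutate(across(all_of(myVars), ~ .x / disp, .names = "{.col}.disp.ratio"))

# Show only the newly created ratio columns.
head(select(mtcars.ratio, ends_with(".disp.ratio")))
```

Unlike :=, mutate() returns a new data frame rather than modifying the input by reference, so the result has to be assigned.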
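For the second case, we can use paste to create the formula as a string, which lm then converts to a formula. In the question's version, myVar ~ disp treats myVar itself (a length-one string) as the response, hence the "variable lengths differ" error: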
for (myVar in myVars) {
mtcars.dt[, paste0(myVar, ".disp.lm.adj") :=
get(myVar) -
lm(data = .SD, formula = paste(myVar, "~ disp"))$coefficients[2] *
(disp - mean(disp))]
}
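As an alternative to pasting the formula together, base R's reformulate() can build it from character strings. This is a sketch rather than part of the original answer; it works on a copy of mtcars, and the names dt, f, newName and expected are introduced here for illustration. The last lines check one column against the hand-written mpg version from the question.

```r
library(data.table)

dt <- as.data.table(mtcars)
myVars <- c("mpg", "hp", "qsec")

for (myVar in myVars) {
  # reformulate("disp", response = "mpg") builds the formula mpg ~ disp.
  f <- reformulate("disp", response = myVar)
  newName <- paste0(myVar, ".disp.lm.adj")
  # Subtract the fitted slope on disp times the centred disp values.
  dt[, (newName) := get(myVar) -
       coef(lm(f, data = .SD))[["disp"]] * (disp - mean(disp))]
}

# Check one column against the hand-written calculation.
expected <- mtcars$mpg -
  coef(lm(mpg ~ disp, data = mtcars))[["disp"]] *
  (mtcars$disp - mean(mtcars$disp))
stopifnot(isTRUE(all.equal(dt$mpg.disp.lm.adj, expected)))
```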
A similar question about creating variables or generating columns with lapply or a for loop in R data.table can be found on Stack Overflow: https://stackoverflow.com/questions/59978580/