There are several posts about getting the list of variables in an R regression formula; the basic answer is to use all.vars. For example,
> all.vars(log(resp) ~ treat + factor(dose))
[1] "resp" "treat" "dose"
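The difference between all.vars and all.names can be checked with a small sketch in base R. The formula f is made up for illustration: it repeats treat and dose and adds an interaction so that duplicate removal is visible.

```r
# Illustrative formula (not from the original post): treat and dose appear twice
f <- log(resp) ~ treat + factor(dose) + treat:dose

all.vars(f)                    # variable names only, each once: "resp" "treat" "dose"
all.names(f)                   # every name, including "~", "log", "+", ":", "factor", with repeats
all.vars(f, functions = TRUE)  # functions/operators included, but duplicates still dropped
```

Both functions walk the same expression tree. They differ only in their defaults for functions and unique, which is what the answer below relies on when it forwards ... to all.vars.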
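This is nice because it strips out all the functions and operators (and duplicates, not shown). However, it falls short when the formula contains $ operators or subscripts, as in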
> form = log(cows$weight) ~ factor(bulls[[3]]) * herd$breed
> all.vars(form)
[1] "cows" "weight" "bulls" "herd" "breed"
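Here the data-frame names cows, bulls, and herd are identified as variables, and the actual variables are either decoupled from them (weight, breed) or lost altogether (the 3 in bulls[[3]]). What I really want instead is a result like this: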
> mystery.fcn(form)
[1] "cows$weight" "bulls[[3]]" "herd$breed"
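What is the most elegant way to do this? I have a proposal that I will post as an answer, but maybe someone has a more elegant solution and will win more votes!
Best answer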
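One approach that works, though it is a bit tedious, is to replace operators such as $ with characters that are legal in variable names, turn the string back into a formula, apply all.vars, and then un-mangle the results: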
All.vars = function(expr, retain = c("\\$", "\\[\\[", "\\]\\]"), ...) {
    # replace operators with unlikely patterns _Av1_, _Av2_, ...
    # (gsub() coerces a formula to character: c("~", lhs, rhs))
    repl = paste("_Av", seq_along(retain), "_", sep = "")
    for (i in seq_along(retain))
        expr = gsub(retain[i], repl[i], expr)
    # piece things back together in the right order ("lhs ~ rhs", "~ rhs"
    # for a one-sided formula, or a single string as-is), and call all.vars
    subs = switch(length(expr), 1, c(1,2), c(2,1,3))
    vars = all.vars(as.formula(paste(expr[subs], collapse = "")), ...)
    # reverse the mangling of names
    retain = gsub("\\\\", "", retain) # un-escape the patterns
    for (i in seq_along(retain))
        vars = gsub(repl[i], retain[i], vars)
    vars
}
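To make the mangling easier to follow, here is a step-by-step trace of what All.vars does with its default retain patterns on the example formula. It is a sketch that inlines the function body rather than calling it. The variable names parts, mangled and plain are introduced here for illustration only, and the un-mangling step uses fixed = TRUE (plain-text replacement), which is a small deviation from the function, not part of the original answer.

```r
form <- log(cows$weight) ~ factor(bulls[[3]]) * herd$breed

# as.character() on a two-sided formula gives c("~", lhs, rhs)
parts <- as.character(form)

# Step 1: swap each retained operator for a placeholder that is legal in a name
retain <- c("\\$", "\\[\\[", "\\]\\]")
repl <- paste0("_Av", seq_along(retain), "_")
for (i in seq_along(retain)) parts <- gsub(retain[i], repl[i], parts)

# Step 2: reassemble as "lhs ~ rhs" and let all.vars() do the parsing
mangled <- all.vars(as.formula(paste(parts[c(2, 1, 3)], collapse = "")))
# mangled is c("cows_Av1_weight", "bulls_Av2_3_Av3_", "herd_Av1_breed")

# Step 3: put the original operators back (fixed = TRUE: no regex needed)
plain <- c("$", "[[", "]]")
vars <- mangled
for (i in seq_along(repl)) vars <- gsub(repl[i], plain[i], vars, fixed = TRUE)
vars
```

The trick works because names such as bulls_Av2_3_Av3_ are syntactically valid R identifiers, so all.vars treats each mangled piece as one variable. It assumes no real variable name already contains a placeholder like _Av1_.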
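Use the retain argument to specify patterns that should be kept rather than treated as operators. The defaults are $, [[ and ]] (all suitably escaped). Here are some results: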
> form = log(cows$weight) ~ factor(bulls[[3]]) * herd$breed
> All.vars(form)
[1] "cows$weight" "bulls[[3]]" "herd$breed"
Changing retain to also include ( and ):
> All.vars(form, retain = c("\\$", "\\(", "\\)", "\\[\\[", "\\]\\]"))
[1] "log(cows$weight)" "factor(bulls[[3]])" "herd$breed"
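The dots are passed to all.vars, which is really the same as all.names but with different defaults. That way we can also get the functions and operators that are not in retain: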
> All.vars(form, functions = TRUE)
[1] "~" "log" "cows$weight" "*"
[5] "factor" "bulls[[3]]" "herd$breed"
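For comparison, a different technique (not the one used in the answer) is to walk the formula's parse tree directly instead of round-tripping through strings. The sketch below assumes that any call to $, [[ or [ should be kept whole; formula_atoms and its keep argument are hypothetical names, not part of base R, and only base functions (is.name, is.call, deparse) are used.

```r
# Hypothetical helper: collect variable-like atoms from a formula, treating
# calls to the operators in `keep` as single atoms instead of splitting them.
formula_atoms <- function(expr, keep = c("$", "[[", "[")) {
  # a bare symbol is a variable name
  if (is.name(expr)) return(as.character(expr))
  if (is.call(expr)) {
    fn <- expr[[1]]
    # a kept operator: return the whole subexpression as text
    if (is.name(fn) && as.character(fn) %in% keep)
      return(paste(deparse(expr, width.cutoff = 500L), collapse = " "))
    # any other call (~, log, *, factor, ...): recurse into its arguments only
    found <- unlist(lapply(as.list(expr)[-1], formula_atoms, keep = keep))
    return(unique(c(character(0), found)))
  }
  character(0)  # constants such as 3 contribute nothing
}

form <- log(cows$weight) ~ factor(bulls[[3]]) * herd$breed
formula_atoms(form)
```

Because nothing is rewritten as text, there are no placeholder collisions to worry about; the trade-off is that kept subexpressions come back in deparse()'s formatting rather than exactly as typed.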
This question, about extracting variables from an R formula when there are subscripts, is from Stack Overflow: https://stackoverflow.com/questions/30770941/