I'm confused about how this code is supposed to work:
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))
#[1] a b a c a a c c
#Levels: a b c
factor(foo, exclude = "a")
#[1] a b a c a a c c
#Levels: a b c
Warning message:
In as.vector(exclude, typeof(x)) : NAs introduced by coercion
Shouldn't it return the factor with every `a` replaced by `NA`? If not, how can I get that?
Best answer
This bug has been fixed as of R 3.4.0. The answer below is kept for historical reference only.
As I said in the comments, `exclude` currently only works in
factor(as.character(foo), exclude = "a")
and not in
factor(foo, exclude = "a")
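A minimal sketch of the working route, plus an alternative that avoids `exclude` altogether by assigning `NA` and calling `droplevels()`. The second route is an addition for comparison, not part of the original answer. Both should give a factor with levels `b` and `c` on R 3.3.1 as well as on later versions.

```r
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))

# Route 1 (from the answer): hand factor() a character vector,
# so 'exclude' is compared against character values
res1 <- factor(as.character(foo), exclude = "a")

# Route 2 (alternative, not from the answer): blank out the unwanted
# values, then drop the level that is no longer used
res2 <- foo
res2[res2 == "a"] <- NA
res2 <- droplevels(res2)

print(res1)
print(res2)
```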
Note that the documentation in ?factor under R 3.3.1 is not at all satisfactory:
exclude: a vector of values to be excluded when forming the set of
levels. This should be of the same type as ‘x’, and will be
coerced if necessary.
The following gives no warning or error, but it does not do anything either:
## foo is a factor with `typeof` being "integer"
factor(foo, exclude = 1L)
factor(foo, exclude = factor("a", levels = levels(foo)))
#[1] a b a c a a c c
#Levels: a b c
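Why does `exclude = 1L` slip through silently? A sketch, assuming the R 3.3.1 code path quoted further down: `exclude` stays `1L` after coercion to integer (the factor version presumably reduces to the same code `1L`), `x` is turned into character, and `match()` then looks for `"1"` among the levels, which is never there. The vector `lev` is a stand-in for the levels `factor()` computes for `foo`.

```r
# Stand-in for the levels factor() computes for foo
lev <- c("a", "b", "c")
# exclude = 1L survives as.vector(exclude, "integer") unchanged
excl <- 1L
# match() coerces both sides to character, so it searches for "1"
hit <- match(lev, excl)
print(hit)               # [1] NA NA NA
# Every level has no match, so every level is kept
print(lev[is.na(hit)])   # [1] "a" "b" "c"
```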
In fact, the documentation seems quite contradictory, since it also says:
The encoding of the vector happens as follows. First all the
values in ‘exclude’ are removed from ‘levels’.
So it seems the developers really did intend `exclude` to be a "character" vector.
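This is more likely a bug inside `factor`. The problem is fairly obvious: when the input vector `x` has class "factor", the following line in `factor(x, ...)` causes the trouble: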
exclude <- as.vector(exclude, typeof(x))
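In this case `typeof(x)` is "integer". If `exclude` is a character string, coercing it to integer yields `NA` (hence the warning), so `exclude` effectively becomes `NA` and no level is removed.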
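The coercion can be reproduced on its own. This sketch applies the quoted line to `exclude = "a"` and then runs the level filter from R 3.3.1's `factor()` with the result; since `exclude` has become `NA`, nothing is dropped, which matches the unchanged output in the question.

```r
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))
typeof(foo)   # "integer": a factor stores integer codes plus a levels attribute

# The quoted line, applied to exclude = "a" (the coercion warning is silenced here)
excl <- suppressWarnings(as.vector("a", typeof(foo)))
print(excl)   # [1] NA

# The level filter now matches the levels against NA, finds nothing,
# and keeps every level
lev <- levels(foo)
kept <- lev[is.na(match(lev, excl))]
print(kept)   # [1] "a" "b" "c"
```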
I really don't know why `factor` contains this line. Without it, the two lines that follow would do exactly the right thing:
x <- as.character(x)
levels <- levels[is.na(match(levels, exclude))]
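Run by hand with `exclude` left as `"a"`, those two lines, followed by the `match(x, levels)` step that comes next in `factor()`, give the intended encoding. Here `lev` is taken from `levels(foo)`, assumed to equal the sorted unique values `factor()` computes when `levels` is missing (true for this `foo`).

```r
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))
exclude <- "a"   # left as character: no as.vector(exclude, typeof(x)) step

x <- as.character(foo)
lev <- levels(foo)
lev <- lev[is.na(match(lev, exclude))]   # drops "a" from the levels
codes <- match(x, lev)                   # every "a" now has no level, so it is NA

print(lev)    # [1] "b" "c"
print(codes)
```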
So the remedy is simply to drop that line:
my_factor <- function (x = character(), levels, labels = levels, exclude = NA,
                       ordered = is.ordered(x), nmax = NA)
{
    if (is.null(x))
        x <- character()
    nx <- names(x)
    if (missing(levels)) {
        y <- unique(x, nmax = nmax)
        ind <- sort.list(y)
        y <- as.character(y)
        levels <- unique(y[ind])
    }
    force(ordered)
    # removed: for a factor x this coerces a character 'exclude' to NA
    #exclude <- as.vector(exclude, typeof(x))
    x <- as.character(x)
    levels <- levels[is.na(match(levels, exclude))]
    f <- match(x, levels)
    if (!is.null(nx))
        names(f) <- nx
    nl <- length(labels)
    nL <- length(levels)
    if (!any(nl == c(1L, nL)))
        stop(gettextf("invalid 'labels'; length %d should be 1 or %d",
                      nl, nL), domain = NA)
    levels(f) <- if (nl == nL)
        as.character(labels)
    else paste0(labels, seq_along(levels))
    class(f) <- c(if (ordered) "ordered", "factor")
    f
}
Now let's test it:
my_factor(foo, exclude = "a")
#[1] <NA> b <NA> c <NA> <NA> c c
#Levels: b c
my_factor(as.character(foo), exclude = "a")
#[1] <NA> b <NA> c <NA> <NA> c c
#Levels: b c
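If copying the whole body of `factor()` feels heavy, a smaller workaround is to convert a factor to character before calling `factor()`, passing the old levels along so their order is kept. `factor_exclude` is a hypothetical helper name, not part of base R, and this sketch assumes the caller does not need `labels` or `nmax`.

```r
# Hypothetical helper (not in base R): route factors through as.character()
# so that 'exclude' is never coerced to the integer codes of x
factor_exclude <- function(x, exclude = NA, ordered = is.ordered(x)) {
  if (is.factor(x)) {
    # keep the original level order; factor() then drops the excluded levels
    factor(as.character(x), levels = levels(x), exclude = exclude,
           ordered = ordered)
  } else {
    factor(x, exclude = exclude, ordered = ordered)
  }
}

foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))
factor_exclude(foo, exclude = "a")
```

Under R 3.4.0 and later, plain `factor(foo, exclude = "a")` already does this, so the helper only matters on older versions.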
Original Stack Overflow question (r - "exclude" argument in `factor()` not working): https://stackoverflow.com/questions/39817076/