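I ran into this strange situation while trying to modify the levels of a factor variable by combining two levels into one. Basically, my new level is created, but all the remaining levels seem to have shifted to the next level. Here are my sample data, the code I used, and the output.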
library(tidyverse)
data <- structure(list(factor1 = structure(c(1L, 1L, 2L, 3L, 1L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 1L, 1L, 1L, 4L), .Label = c("0", "1", "2", "3",
"4", "5", "6", "7"), class = "factor")), row.names = c(NA, -30L
), class = c("tbl_df", "tbl", "data.frame"), .Names = "factor1")
data_out <- data %>% mutate(factor1 = ifelse(factor1 %in% c('0', '1'),
factor1, '>1'))
structure(list(factor1 = c("1", "1", "2", ">1", "1", "2", "1",
"1", "2", "2", "2", "2", "2", "1", "2", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", ">1", "1", "1", "1", ">1")), .Names = "factor1",
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -30L))
Is this the intended behaviour? It certainly is not what I want in my case. How can it be explained and corrected?
Best answer
I guess the problem has to do with how factors are built. It's still not clear to me how a factor goes from levels {"0", "1"} to levels {"1", "2", ">1"} through mutate.
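R factors are really 1-based integer vectors with their levels stored as an attribute. So your "0" level is really the integer 1 underneath, and your "1" level is really the integer 2. At first glance it looks as though the mutate function sees fit to create a new factor with an additional level that prints as ">1", while also reassigning the "0" level to a new "1" level and bumping the "1" level up to "2". To me that looks like dangerous behaviour on mutate's part. I think it should either give you a new factor with levels "0", "1", ">1" or throw an error.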
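To make the "integer codes plus levels" point concrete, here is a minimal sketch. The small factor `f` is a made-up example that only mimics the first few levels of the question's data.

```r
# Hypothetical small factor with the same leading levels as the question's data
f <- factor(c("0", "0", "1", "2"), levels = c("0", "1", "2"))

levels(f)        # the labels: "0" "1" "2"
as.integer(f)    # the stored 1-based codes: 1 1 2 3
as.character(f)  # the label of each element: "0" "0" "1" "2"
```

The label "0" is stored as code 1 and the label "1" as code 2, which is exactly the shift seen in the question's output.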
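The error actually comes from ifelse rather than from mutate: ifelse discards the factor attributes of its yes argument, so the underlying integer codes come through, and combining them with '>1' turns the whole result into a character vector. mutate simply stores that result as the new column. If you coerce data to a plain data frame and run the same ifelse, you see this: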
data$factor2 <- ifelse( data$factor1 %in% c('0', '1'),
data$factor1, '>1')
data
#-------- same issue except
factor1 factor2
1 0 1
2 0 1
3 1 2
4 2 >1
.... delete the other 26 rows
> str(data)
'data.frame': 30 obs. of 2 variables:
$ factor1: Factor w/ 8 levels "0","1","2","3",..: 1 1 2 3 1 2 1 1 2 2 ...
$ factor2: chr "1" "1" "2" ">1" ...
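One way to avoid the leaked codes in base R is to convert the factor to character before it reaches ifelse, then rebuild the factor with the levels you want. This is a sketch on a made-up factor `f`, not the question's full data; the names `bad`, `good_chr` and `good` are illustrative only.

```r
# Hypothetical factor with the same eight levels as in the question
f <- factor(c("0", "0", "1", "2", "3"), levels = as.character(0:7))

# ifelse() drops the factor attributes of `yes`, so the integer codes leak through
bad <- ifelse(f %in% c("0", "1"), f, ">1")
bad  # "1" "1" "2" ">1" ">1"

# Converting to character first keeps the labels instead of the codes
good_chr <- ifelse(f %in% c("0", "1"), as.character(f), ">1")

# Rebuild a factor with an explicit level order
good <- factor(good_chr, levels = c("0", "1", ">1"))
good  # values 0 0 1 >1 >1 with levels 0 1 >1
```

The same `as.character(factor1)` change should work inside the question's mutate call, since mutate only stores whatever ifelse returns.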
The following keeps you within the dplyr package:
recode_factor(data$factor1, `0` = "0", `1` = "1", .default=">1")
[1] 0 0 1 >1 0 1 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 >1 0 0 0 >1
Levels: 0 1 >1
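For comparison, base R can collapse levels without any package by assigning the same label to several levels, because `levels<-` merges duplicated labels. This is a sketch on a made-up factor, and `g` is an illustrative name.

```r
# Hypothetical factor with the same eight levels as in the question
f <- factor(c("0", "0", "1", "2", "3"), levels = as.character(0:7))

g <- f
# Rename every level other than "0" and "1" to ">1"; duplicated labels are merged
levels(g)[!(levels(g) %in% c("0", "1"))] <- ">1"

levels(g)        # "0" "1" ">1"
as.character(g)  # "0" "0" "1" ">1" ">1"
```

This works on the levels rather than on the elements, so the integer codes are never exposed.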
Regarding "r - Modifying levels of a factor variable using ifelse", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49285401/