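I have a categorical variable with 88 levels (counties), and I want to aggregate them into five larger geographic regions. Is there a more elegant way to do this than a long chain of ifelse statements like the one below?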
survey.responses$admin <- ifelse(survey.responses$CNTY == "Lake", "Northeast",
  ifelse(survey.responses$CNTY == "Traverse", "Northwest",
    ifelse(survey.responses$CNTY == "Ramsey", "Central",
      ifelse(survey.responses$CNTY == "Cottonwood", "South", "out of state"))))
Except imagine that CNTY has 88 levels! Any ideas?
Best answer
Two quick methods. For larger sets, I recommend the merge method.
Data
dat <- data.frame(cnty = c("Lake", "Traverse", "Ramsey", "Cottonwood"),
stringsAsFactors = FALSE)
Merge/join. I prefer this approach for a few reasons, the most important being that it is very easy to maintain a CSV of the matches and read.csv it into the ref lookup table. I deliberately omit "Lake" here to show what happens when there is no match.
ref <- data.frame(cnty = c("Cottonwood", "Ramsey", "Traverse", "SomeOther"),
                  admin = c("South", "Central", "Northwest", "NeverNeverLand"),
                  stringsAsFactors = FALSE)
out <- merge(dat, ref, by = "cnty", all.x = TRUE)
out
#         cnty     admin
# 1 Cottonwood     South
# 2       Lake      <NA>
# 3     Ramsey   Central
# 4   Traverse Northwest
A default value can then be assigned like this:
out$admin[is.na(out$admin)] <- "out of state"
out
#         cnty        admin
# 1 Cottonwood        South
# 2       Lake out of state
# 3     Ramsey      Central
# 4   Traverse    Northwest
If you are using other components of the tidyverse, the same can be done with:
library(dplyr)
left_join(dat, ref, by = "cnty") %>%
  mutate(admin = if_else(is.na(admin), "out of state", admin))
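The CSV-backed lookup mentioned above might look roughly like the following. This is only a sketch: the file is written to a temporary path so the example is self-contained, and the name csv_path and the file contents are assumptions for illustration, not part of the original answer. In practice the CSV would be a file you maintain by hand with all 88 counties.

```r
# Hypothetical lookup file; stands in for a hand-maintained CSV.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("cnty,admin",
             "Cottonwood,South",
             "Ramsey,Central",
             "Traverse,Northwest"),
           csv_path)

dat <- data.frame(cnty = c("Lake", "Traverse", "Ramsey", "Cottonwood"),
                  stringsAsFactors = FALSE)

# Read the lookup table from disk instead of typing it inline.
ref <- read.csv(csv_path, stringsAsFactors = FALSE)

# Left join, then give counties missing from the lookup a default region.
out <- merge(dat, ref, by = "cnty", all.x = TRUE)
out$admin[is.na(out$admin)] <- "out of state"
out
```

Note that merge sorts the result by the key column by default, so the rows of out are not in the same order as dat.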
Lookup. This works well for small things, but perhaps not for your case. (Again, I have commented out "Lake" to show what a non-match looks like.)
c(Cottonwood="South", # Lake="Northeast",
  Ramsey="Central", Traverse="Northwest")[dat$cnty]
#        <NA>    Traverse      Ramsey  Cottonwood
#          NA "Northwest"   "Central"     "South"
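To get the same "out of state" default with the named-vector lookup, one option is to fill the NA entries afterwards. The sketch below assumes cnty is a character column; the names lookup and admin are introduced here for illustration only.

```r
dat <- data.frame(cnty = c("Lake", "Traverse", "Ramsey", "Cottonwood"),
                  stringsAsFactors = FALSE)

# Named character vector: names are counties, values are regions.
# "Lake" is left out on purpose to exercise the default.
lookup <- c(Cottonwood = "South", Ramsey = "Central", Traverse = "Northwest")

# Index by county name; names not present in the lookup yield NA.
admin <- unname(lookup[dat$cnty])

# Replace the non-matches with the default region.
admin[is.na(admin)] <- "out of state"
dat$admin <- admin
dat
```

If cnty were a factor, indexing with it would use the underlying integer codes rather than the labels, so wrapping it in as.character() first would be the safer choice.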
Regarding "r - Is there a more elegant way to collapse an 88-level variable into a 5-level variable?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58740548/