r - Is there a more elegant way to collapse an 88-level variable into a 5-level variable?

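I have a categorical variable with 88 levels (counties), and I want to aggregate them into five larger geographic regions. Is there a more elegant way to do this than a long chain of ifelse statements like the one below?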

survey.responses$admin <- ifelse(survey.responses$CNTY == "Lake", "Northeast",
                          ifelse(survey.responses$CNTY == "Traverse", "Northwest",
                          ifelse(survey.responses$CNTY == "Ramsey", "Central",
                          ifelse(survey.responses$CNTY == "Cottonwood", "South", "out of state"))))

Except imagine that CNTY has 88 levels! Any ideas?

Best answer

Two quick methods. For larger sets, I recommend the merge approach.

Data

dat <- data.frame(cnty = c("Lake", "Traverse", "Ramsey", "Cottonwood"),
                  stringsAsFactors = FALSE)

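Merge/join. I prefer this approach for a few reasons, most notably that it is easy to maintain a CSV of the matches and load it with read.csv into the ref lookup table. I'll intentionally omit "Lake" to show what happens when there is no match.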

ref <- data.frame(cnty = c("Cottonwood", "Ramsey", "Traverse", "SomeOther"),
                  admin = c("South", "Central", "Northwest", "NeverNeverLand"),
                  stringsAsFactors = FALSE)
out <- merge(dat, ref, by = "cnty", all.x = TRUE)
out
#         cnty     admin
# 1 Cottonwood     South
# 2       Lake      <NA>
# 3     Ramsey   Central
# 4   Traverse Northwest
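
As a rough sketch of the CSV-backed lookup table mentioned above: the file name and its contents here are hypothetical (a temporary file stands in for a hand-maintained CSV), and the object names csv_path, ref_csv, dat_csv and out_csv are made up for illustration.

```r
# Hypothetical lookup file; in practice this CSV would be maintained by hand.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("cnty,admin",
             "Cottonwood,South",
             "Ramsey,Central",
             "Traverse,Northwest"),
           csv_path)

# Read the lookup table, keeping strings as character rather than factor.
ref_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

dat_csv <- data.frame(cnty = c("Lake", "Traverse", "Ramsey", "Cottonwood"),
                      stringsAsFactors = FALSE)

# Left join: every row of dat_csv is kept; unmatched counties get NA in admin.
out_csv <- merge(dat_csv, ref_csv, by = "cnty", all.x = TRUE)
out_csv
```

With 88 counties, the CSV would simply have 88 rows; the R code would not need to change.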

The default value can then be assigned like this:

out$admin[is.na(out$admin)] <- "out of state"
out
#         cnty        admin
# 1 Cottonwood        South
# 2       Lake out of state
# 3     Ramsey      Central
# 4   Traverse    Northwest
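
Note that merge() sorts the result by the key column by default, so the rows above are no longer in the original order of dat. If row order matters, one possible alternative (not part of the answer itself) is to index the lookup table with base R's match(). This is a minimal sketch; dat2 and ref2 are illustrative copies of the data above.

```r
dat2 <- data.frame(cnty = c("Lake", "Traverse", "Ramsey", "Cottonwood"),
                   stringsAsFactors = FALSE)
ref2 <- data.frame(cnty = c("Cottonwood", "Ramsey", "Traverse", "SomeOther"),
                   admin = c("South", "Central", "Northwest", "NeverNeverLand"),
                   stringsAsFactors = FALSE)

# match() returns, for each county in dat2, its row position in ref2 (NA if absent).
idx <- match(dat2$cnty, ref2$cnty)

# Index the lookup column by position; the original row order of dat2 is preserved.
dat2$admin <- ref2$admin[idx]

# Counties that were not found fall back to the default.
dat2$admin[is.na(dat2$admin)] <- "out of state"
dat2
#         cnty        admin
# 1       Lake out of state
# 2   Traverse    Northwest
# 3     Ramsey      Central
# 4 Cottonwood        South
```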

If you are already using other parts of the tidyverse, the same thing can be done with:

library(dplyr)
left_join(dat, ref, by = "cnty") %>%
  mutate(admin = if_else(is.na(admin), "out of state", admin))

Lookup. This works well for small cases, but it may not suit yours. (Again, I've commented out "Lake" to show what a non-match looks like.)

c(Cottonwood="South", # Lake="Northeast",
  Ramsey="Central", Traverse="Northwest")[dat$cnty]
#        <NA>    Traverse      Ramsey  Cottonwood 
#          NA "Northwest"   "Central"     "South"
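
The named-vector lookup can take the same default as the merge approach. The following is only a sketch; the names lookup, cnty and admin are illustrative, and unname() is used so that the result is a plain character vector suitable for a data frame column.

```r
# Named vector: names are counties, values are regions ("Lake" left out on purpose).
lookup <- c(Cottonwood = "South", Ramsey = "Central", Traverse = "Northwest")

cnty <- c("Lake", "Traverse", "Ramsey", "Cottonwood")

# Index by name; counties missing from the lookup come back as NA.
admin <- unname(lookup[cnty])

# Replace the non-matches with the default.
admin[is.na(admin)] <- "out of state"
admin
```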

Regarding "r - Is there a more elegant way to collapse an 88-level variable into a 5-level variable?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58740548/
