r - How to replace cell values using a variable + value lookup in another data frame?

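I'm working with a large survey data frame in which every answer is stored as a number. For numeric survey questions such as age, the number is simply the value. For multiple-choice questions, however, the number is a code that corresponds to text held in a separate lookup data frame.

How can I replace all the numbers in each variable with the corresponding labels from the lookup data frame?

Example data: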

df_numeric <- 
  tibble::tribble(
    ~gender, ~age, ~city, ~yearly_income, ~fav_colour,  ~over_100_more_vars,
          1,   22,     1,          55000,           1,                "...",
          2,   31,     2,         122000,           2,                "...",
          1,   41,     1,         101000,           2,                "...",
          2,   19,     5,          76000,           1,                "...",
          1,   64,     7,          32000,           6,                "...")
    
df_lookup <- 
  tibble::tribble(
           ~variable, ~number,        ~label,
            "gender",       1,        "Male",
            "gender",       2,      "Female",
              "city",       1,    "New York", 
              "city",       2,      "Sydney",
              "city",       5,      "London",
              "city",       7,       "Paris",
        "fav_colour",       1,         "Red",
        "fav_colour",       2,        "Blue",
        "fav_colour",       6,      "Purple",
   "one_of_100_more",       1,       "Label",
   "one_of_100_more",       2,       "Label",
   "two_of_100_more",       1,       "Label",
               "etc",       1,         "etc")

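Ideally, I'd like to take each variable name in df_numeric, find that variable in df_lookup, replace every "number" for that particular variable with its corresponding "label", then move on to the next variable and do the same, and so on. The result should look like this: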

df_output <- 
  tibble::tribble(
    ~gender, ~age,      ~city, ~yearly_income, ~fav_colour,  ~over_100_more_vars,
    "Male",   22,  "New York",          55000,       "Red",                "...",
  "Female",   31,    "Sydney",         122000,      "Blue",                "...",
    "Male",   41,  "New York",         101000,      "Blue",                "...",
  "Female",   19,    "London",          76000,       "Red",                "...",
    "Male",   64,     "Paris",          32000,    "Purple",                "...")

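Important caveats:

There are hundreds of variables, so writing out each variable's name in the code (as in, for example, this answer) is not feasible.
Only the coded categorical variables such as gender and city need replacing. Numeric variables such as age and income should be left alone, because they are already in the correct format. These numeric variables do not appear in df_lookup.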

Best answer

New version: I'd suggest this tidyverse solution (the current version also handles age):

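library(tidyverse)
df_numeric %>%
  # convert every column except yearly_income to character (this includes age)
  mutate(across(-yearly_income, as.character)) %>%
  # stack the remaining columns into name/value pairs
  pivot_longer(-c("yearly_income", "age")) %>%
  # attach the label for each variable + code pair
  left_join(mutate(df_lookup, number = as.character(number)),
            by = c("name" = "variable", "value" = "number")) %>%
  select(-value) %>%
  # spread back to one column per variable, now holding the labels
  pivot_wider(id_cols = c("yearly_income", "age"), values_from = label, names_from = name)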

# A tibble: 5 x 6
  yearly_income age   gender city     fav_colour over_100_more_vars
          <dbl> <chr> <chr>  <chr>    <chr>      <chr>             
1         55000 22    Male   New York Red        <NA>              
2        122000 31    Female Sydney   Blue       <NA>              
3        101000 41    Male   New York Blue       <NA>              
4         76000 19    Female London   Red        <NA>              
5         32000 64    Male   Paris    Purple     <NA>
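
The pipeline above names yearly_income and age explicitly, and it turns age into a character column. As a rough alternative sketch (not part of the original answer), the same replacement can be done with a plain loop and match(), touching only the columns that actually appear in df_lookup. The example uses base R data frames and a cut-down copy of the sample data so that it runs on its own; the names coded_vars, df_labelled and lk are made up for illustration.

```r
# Cut-down copy of the sample data from the question, as base R data frames
df_numeric <- data.frame(
  gender        = c(1, 2, 1, 2, 1),
  age           = c(22, 31, 41, 19, 64),
  city          = c(1, 2, 1, 5, 7),
  yearly_income = c(55000, 122000, 101000, 76000, 32000),
  fav_colour    = c(1, 2, 2, 1, 6),
  stringsAsFactors = FALSE
)

df_lookup <- data.frame(
  variable = c("gender", "gender", "city", "city", "city", "city",
               "fav_colour", "fav_colour", "fav_colour"),
  number   = c(1, 2, 1, 2, 5, 7, 1, 2, 6),
  label    = c("Male", "Female", "New York", "Sydney", "London", "Paris",
               "Red", "Blue", "Purple"),
  stringsAsFactors = FALSE
)

# Only columns that have entries in the lookup table are recoded;
# age and yearly_income are not listed there, so they stay numeric.
coded_vars <- intersect(names(df_numeric), unique(df_lookup$variable))

df_labelled <- df_numeric
for (v in coded_vars) {
  # lookup rows for this one variable
  lk <- df_lookup[df_lookup$variable == v, ]
  # match() gives the position of each code in lk$number;
  # codes without a lookup entry become NA
  df_labelled[[v]] <- lk$label[match(df_labelled[[v]], lk$number)]
}

print(df_labelled)
```

Because coded_vars is derived from df_lookup, no variable names have to be written out by hand, which seems to fit the "hundreds of variables" constraint. Any code that has no entry in the lookup table ends up as NA, so it may be worth checking for unexpected NA values afterwards.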

Regarding "r - How to replace cell values using a variable + value lookup in another data frame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/77214876/
