This is what I want my data frame to look like:
record  color  size   height  weight
1       blue   large          heavy
1       red
2       green  small  tall    thin
However, the data (df) currently looks like this:
record  vars
1       color = "blue", size = "large"
2       color = "green", size = "small"
2       height = "tall", weight = "thin"
1       color = "red", weight = "heavy"
Code for df:
df <- structure(list(record = c(1L, 2L, 2L, 1L), vars = structure(c(1L,
2L, 4L, 3L), .Label = c("color = \"blue\", size = \"large\"",
"color = \"green\", size = \"small\"",
"color = \"red\", weight = \"heavy\"",
"height = \"tall\", weight = \"thin\""), class = "factor")),
class = "data.frame", row.names = c(NA, -4L))
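For each record, I want to split the vars column on the "," delimiter and create a new column for each variable name given in the string. If a variable has more than one value for a record, the record should be repeated.
I know that to do this with the tidyverse I need dplyr::group_by and tidyr::separate, but I am not clear on how to get the new variable names into the "into" argument of separate. Do I need some kind of regular expression that treats any text before the equals sign "=" as a new variable name in "into"? Any suggestions are very welcome!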
df %>%
  group_by(record) %>%
  separate(col = vars, into = c(regex expression?? / character vector?), sep = ",")
Best answer
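Since each entry in the column is already written almost exactly like the R code that defines a list, you can parse and evaluate the entries and then call unnest_wider.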
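To see what the parse-and-evaluate step does to a single entry, here is a minimal sketch. It assumes the rlang package is installed; the names entry and parsed are introduced here only for illustration.

```r
library(rlang)

# One entry from the vars column, as a plain string.
entry <- 'color = "blue", size = "large"'

# Wrap the entry in list( ... ) so it becomes a complete R expression,
# then parse the string and evaluate the resulting call.
parsed <- eval(parse_expr(paste("list(", entry, ")")))

# parsed is a named list with elements color and size.
str(parsed)
```

Because eval runs whatever code the string contains, this approach is only appropriate when the contents of vars are trusted.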
library(tidyverse)
df %>%
  # parse_expr comes from rlang, which library(tidyverse) does not attach
  mutate(vars = map(vars, ~ eval(rlang::parse_expr(paste('list(', .x, ')'))))) %>%
  unnest_wider(vars)
#   record color size  height weight
#    <int> <chr> <chr> <chr>  <chr>
# 1      1 blue  large NA     NA
# 2      2 green small NA     NA
# 3      2 NA    NA    tall   thin
# 4      1 red   NA    NA     heavy
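If evaluating strings as code is a concern, the same information can probably be recovered with plain string splitting. The sketch below is an alternative and not part of the answer above. It assumes every entry is a comma-separated list of name = "value" pairs with no commas or equals signs inside the values. The intermediate object long and the helper column copy are hypothetical names introduced here. It also numbers repeated values of a variable within a record, so the result follows the layout requested in the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with vars stored as plain character strings.
df <- data.frame(
  record = c(1L, 2L, 2L, 1L),
  vars = c('color = "blue", size = "large"',
           'color = "green", size = "small"',
           'height = "tall", weight = "thin"',
           'color = "red", weight = "heavy"'),
  stringsAsFactors = FALSE
)

# Long form: one (record, name, value) row per name = "value" pair.
long <- df %>%
  separate_rows(vars, sep = ",\\s*") %>%
  separate(vars, into = c("name", "value"), sep = "\\s*=\\s*") %>%
  mutate(value = gsub('"', "", value, fixed = TRUE))

# Number repeated values of the same variable within a record, so that a
# second color for record 1 starts a second output row for that record.
result <- long %>%
  group_by(record, name) %>%
  mutate(copy = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value) %>%
  arrange(record, copy) %>%
  select(-copy)

result
# record 1: color = blue,  size = large, weight = heavy
# record 1: color = red
# record 2: color = green, size = small, height = tall, weight = thin
```

Dropping the group_by, mutate(copy = ...), and arrange steps and numbering the original rows instead would give one output row per input row, as in the answer above.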
Regarding "r - Separate a column into multiple variables with unique column names in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59506519/