I have a data frame in which one column holds several pieces of information separated by ";", like this:
DF = data.frame(a = c(1,1,1,2,2), b = c('aaa','aaa','aba','abc','ccc'),
extra_info = c(
'animal=horse;color=orange;shape=circle',
'animal=monkey;shape=square;value=532',
'animal=horse;color=blue;shape=square;value=321',
'animal=dog;color=green;value=678',
'color=pink;shape=triangle'
))
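I can't use read.table, because I already read the data with a different function (and the contents of the extra_info column differ from row to row, so the columns would get mixed up). What I want is to split all of this information into separate columns, each with the appropriate name, for example: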
a b animal color shape value
1 aaa horse orange circle NA
1 aaa monkey NA square 532
1 aba horse blue square 321
2 abc dog green NA 678
2 ccc NA pink triangle NA
So far I have tried:
new_cols = DF %>% separate(extra_info, c(LETTERS[1:4]), sep = ";")
new_cols %>% separate(A, c("key","value"), sep = '=') %>%
separate(B, c("key","value"), sep = '=') %>%
separate(C, c("key","value"), sep = '=') %>%
separate(D, c("key","value"), sep = '=') %>%
pivot_wider(names_from = c("key"), values_from = c("value"))
But it does not work as expected.
Best answer
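Here is one approach: I rewrite the key-value pairs into valid JSON syntax and parse them with jsonlite::fromJSON. Every word boundary gets a double quote, "=" becomes ":", ";" becomes ",", and the whole string is wrapped in braces. Because every value ends up quoted, the new columns (including value) come back as character: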
library(purrr)
library(dplyr)
library(stringr)
library(jsonlite)
DF %>%
mutate(
json = str_replace_all(extra_info, pattern = "\\b", replacement = '"'),
json = str_replace_all(json, pattern = fixed("="), replacement = ":"),
json = str_replace_all(json, pattern = fixed(";"), replacement = ","),
json = paste("{", json, "}"),
) %>%
pull(json) %>%
map(jsonlite::fromJSON) %>%
map(as.data.frame) %>%
bind_rows %>%
cbind(DF, .)
# a b extra_info animal color shape value
# 1 1 aaa animal=horse;color=orange;shape=circle horse orange circle <NA>
# 2 1 aaa animal=monkey;shape=square;value=532 monkey <NA> square 532
# 3 1 aba animal=horse;color=blue;shape=square;value=321 horse blue square 321
# 4 2 abc animal=dog;color=green;value=678 dog green <NA> 678
# 5 2 ccc color=pink;shape=triangle <NA> pink triangle <NA>
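For comparison, the same reshaping can probably be done with tidyr alone, which is closer to what the question was attempting. This is only a sketch, not part of the answer above: it assumes every pair has the form key=value, and it adds a helper column row_id (a hypothetical name of my own) so that the two rows that share a = 1, b = 'aaa' are not merged when pivoting.

```r
library(dplyr)
library(tidyr)

# Same data as in the question.
DF <- data.frame(
  a = c(1, 1, 1, 2, 2),
  b = c("aaa", "aaa", "aba", "abc", "ccc"),
  extra_info = c(
    "animal=horse;color=orange;shape=circle",
    "animal=monkey;shape=square;value=532",
    "animal=horse;color=blue;shape=square;value=321",
    "animal=dog;color=green;value=678",
    "color=pink;shape=triangle"
  )
)

result <- DF %>%
  # row_id is a hypothetical helper: it keeps rows with identical (a, b) apart.
  mutate(row_id = row_number()) %>%
  # One row per "key=value" pair.
  separate_rows(extra_info, sep = ";") %>%
  # Split each pair into a key column and a value column.
  separate(extra_info, into = c("key", "value"), sep = "=") %>%
  # Spread the keys into columns; keys missing for a row become NA.
  pivot_wider(names_from = key, values_from = value) %>%
  select(-row_id)

print(result)
```

The new columns are character here as well, so value would need as.numeric() if numbers are wanted. In recent tidyr versions separate_rows() and separate() are marked as superseded but are still available.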
Regarding "r - Create new columns with information extracted from another column in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/67288451/