I have a data frame with two columns. The second column contains a file name.
df <- data.frame(paragraph = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.",
filename = "./data/RevCon_2015_C1_Austria_05_06.txt", stringsAsFactors = FALSE)
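How can I extract certain strings from the second column (using stringr) and add them (using dplyr::mutate) as additional variables (conference, year, country, and so on), so that I get the following result: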
df2 <- data.frame(paragraph = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.",
filename = "./data/RevCon_2015_C1_Austria_05_06.txt", conference = "RevCon", year = "2015", country= "Austria", date = "06.05.2015", stringsAsFactors = FALSE)
Best answer
We can use tidyr::separate to do the following:
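library(tidyverse)

df %>%
    # Strip the "./data/" prefix and the ".txt" extension
    mutate(tmp = gsub("(\\./data/|\\.txt)", "", filename)) %>%
    # Split on runs of non-alphanumeric characters (the default), here the underscores
    separate(
        tmp,
        into = c("conference", "year", "ignored", "country", "month", "day")) %>%
    # Assemble the date as day/month/year
    mutate(date = paste(day, month, year, sep = "/")) %>%
    select(-ignored, -month, -day)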
# paragraph filename conference year
#1 Lorem ipsum [...] ./data/RevCon_2015_C1_Austria_05_06.txt RevCon 2015
# country date
#1 Austria 06/05/2015
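The output above separates the date parts with slashes, whereas the question asked for 06.05.2015. A possible variant, offered as a sketch rather than as the answer's own method, passes NA in `into` to drop the unwanted piece directly and uses `sep = "."` when pasting the date. Using `basename()` and `tools::file_path_sans_ext()` to strip the directory and extension is my own substitution for the `gsub()` call.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
    paragraph = "Lorem ipsum [...]",
    filename = "./data/RevCon_2015_C1_Austria_05_06.txt",
    stringsAsFactors = FALSE)

df2 <- df %>%
    # Keep only the file name without directory and extension
    mutate(tmp = tools::file_path_sans_ext(basename(filename))) %>%
    # NA in `into` omits that piece (the "C1" part) from the output
    separate(
        tmp,
        into = c("conference", "year", NA, "country", "month", "day"),
        sep = "_") %>%
    # Build the date as day.month.year, the format requested in the question
    mutate(date = paste(day, month, year, sep = ".")) %>%
    select(-month, -day)

df2
```

With an explicit `sep = "_"`, a hyphen or dot inside a field would no longer be treated as a separator, which may or may not be what you want for your file names.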
Note that this assumes the file names follow this pattern: ./data/{conference}_{year}_{ignored}_{country}_{month}_{day}.txt
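Since the question asked specifically for stringr, the same pattern can also be expressed as a regular expression with one capture group per field. The following is a minimal sketch under that assumption; the regular expression and the variable names `pattern` and `parts` are my own and not part of the original answer.

```r
library(dplyr)
library(stringr)

df <- data.frame(
    paragraph = "Lorem ipsum [...]",
    filename = "./data/RevCon_2015_C1_Austria_05_06.txt",
    stringsAsFactors = FALSE)

# Groups: conference, year, ignored, country, month, day
pattern <- "([^_/]+)_(\\d{4})_([^_]+)_([^_]+)_(\\d{2})_(\\d{2})\\.txt$"

# str_match() returns a character matrix: column 1 holds the full match,
# columns 2 to 7 hold the six capture groups
parts <- str_match(df$filename, pattern)

df2 <- df %>%
    mutate(
        conference = parts[, 2],
        year = parts[, 3],
        country = parts[, 5],
        # day.month.year, as in the desired output of the question
        date = str_c(parts[, 7], parts[, 6], parts[, 3], sep = "."))

df2
```

File names that do not match the pattern yield NA in every new column, which makes non-conforming rows easy to spot.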
Sample data
df <- data.frame(
    paragraph = "Lorem ipsum [...]",
    filename = "./data/RevCon_2015_C1_Austria_05_06.txt",
    stringsAsFactors = FALSE)
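Because the approach depends on every file name following the assumed pattern, it may be worth checking that before splitting. A small sketch with stringr::str_detect follows; the second file name and the regular expression are hypothetical and only serve as an illustration.

```r
library(stringr)

filenames <- c(
    "./data/RevCon_2015_C1_Austria_05_06.txt",
    "./data/notes.txt")  # hypothetical file that does not follow the pattern

# Anchored version of the assumed pattern
# ./data/{conference}_{year}_{ignored}_{country}_{month}_{day}.txt
pattern <- "^\\./data/[^_/]+_\\d{4}_[^_]+_[^_]+_\\d{2}_\\d{2}\\.txt$"

# TRUE for names that follow the pattern, FALSE otherwise
ok <- str_detect(filenames, pattern)

# Names that would need separate handling
filenames[!ok]
```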
For "r - Extracting strings from file names and creating new columns with mutate", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50005523/