I have a table with the following column structure:
Name                                              Type
Urgent Care (Revenue Code: 0456)                  Per Case
IV Therapy (Revenue Codes 0260, 0269)             Per Visit
Oncology Treatment (Revenue Codes: 0280, 0289)    Per Visit
I want to extract the numeric revenue codes from the Name column, so that the table looks like this:
Name                  Rev Code      Type
Urgent Care           0456          Per Case
IV Therapy            0260, 0269    Per Visit
Oncology Treatment    0280, 0289    Per Visit
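The raw data in the Name column is inconsistent: the word "Code" may be followed by a ";", a space, a "-", and so on. So rather than matching that punctuation, I tried to find the first digit with a regular expression and split the column there, using separate() from the tidyr package: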
library(tidyr)
separate(mydata, Name, into = c("Name", "Rev Code"), sep = "[[:digit:]]")
This splits the column in the right place, but the "Rev Code" column ends up blank. I'm fairly new to R and would really appreciate any help!
Data:
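One possible explanation, sketched here with base R's strsplit() rather than separate() itself: a splitting regex is treated as the delimiter, so every digit it matches is thrown away instead of being kept. Assuming separate() handles a regex `sep` the same way, only the text between consecutive digits survives, and in this data that text is empty.

```r
# Illustration only: base R's strsplit() stands in for the splitting step.
x <- "Urgent Care (Revenue Code: 0456)"

# Each digit matched by the pattern is consumed as a delimiter.
pieces <- strsplit(x, "[[:digit:]]")[[1]]
print(pieces)

# The second piece is the text between "0" and "4", which is empty.
second_piece <- pieces[2]
print(nchar(second_piece))
```

If this is what happens, the fix is to extract the digits rather than split on them.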
structure(list(
Name = c("Urgent Care (Revenue Code: 0456)", "IV Therapy (Revenue Codes 0260, 0269)",
"Oncology Treatment (Revenue Codes: 0280, 0289)"),
Type = c("Per Case", "Per Visit", "Per Visit")),
.Names = c("Name", "Type"), row.names = 1:3, class = "data.frame")
Best answer
read.table(header=TRUE, stringsAsFactors=FALSE, sep=",", text='Name,Type
"Urgent Care (Revenue Code: 0456)", "Per Case"
"IV Therapy (Revenue Codes 0260, 0269)","Per Visit"
"Oncology Treatment (Revenue Codes: 0280, 0289)", "Per Visit"') -> df
library(stringi)
library(dplyr)
library(purrr)
extract_codes <- function(x) {
  stri_match_all_regex(x, "[[:digit:]]+") %>%       # extract every run of digits
    map_chr(~paste0(as.vector(.), collapse=", "))   # join them into one string per row
}
mutate(df, `Rev Code`=extract_codes(Name))
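The answer above adds the `Rev Code` column but leaves the parenthetical in `Name`. As a rough sketch of how both columns might be produced without extra packages, the following uses base R only. The variable names (`digit_runs`, `rev_code`, `clean_name`, `result`) are made up for illustration, and the name cleanup assumes the revenue codes always sit in a trailing parenthetical that begins at the first "(".

```r
# Base R sketch; variable names are hypothetical, not from the original answer.
df <- data.frame(
  Name = c("Urgent Care (Revenue Code: 0456)",
           "IV Therapy (Revenue Codes 0260, 0269)",
           "Oncology Treatment (Revenue Codes: 0280, 0289)"),
  Type = c("Per Case", "Per Visit", "Per Visit"),
  stringsAsFactors = FALSE
)

# Pull every run of digits out of each string, then join the runs with ", ".
digit_runs <- regmatches(df$Name, gregexpr("[[:digit:]]+", df$Name))
rev_code <- vapply(digit_runs, paste, character(1), collapse = ", ")

# Drop the trailing parenthetical (assumption: it starts at the first "(").
clean_name <- sub("[[:space:]]*\\(.*$", "", df$Name)

# check.names = FALSE keeps the space in the "Rev Code" column name.
result <- data.frame(Name = clean_name, `Rev Code` = rev_code, Type = df$Type,
                     check.names = FALSE, stringsAsFactors = FALSE)
print(result)
```

Keeping the codes as character strings preserves the leading zeros, which would be lost if they were converted to numbers.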
Source: "r - Extracting numeric values from strings in a column with R", a similar question on Stack Overflow: https://stackoverflow.com/questions/40046678/