vecA <- c("Population 1222",
"Population 90over",
"population under78",
"population 99101",
"Population 1254",
"Population 78 92")
Question
I want to get the following vecB from vecA:
vecB <- c("Population 12 - 22",
"Population 90 over",
"population under 78",
"population 99 - 101",
"Population 12 - 54",
"Population 78 - 92")
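Key characteristics
vecB has the following characteristics:
- after the first two digits, insert a space, a dash and a space ( - );
- if a space is already present, insert only the dash (-);
- for combinations such as underDigitDigit, insert only a space: under DigitDigit (likewise 90over becomes 90 over).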
Attempt
I was thinking of using groups in gsub, along the lines of:
gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)
but this doesn't work for all cases:
> t(t(gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)))
[,1]
[1,] "12"
[2,] "90"
[3,] "population under78"
[4,] "99"
[5,] "12"
[6,] "78"
t() is used for presentation purposes only; regex101 link.
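To see why a single pattern of this shape is not enough, here is a minimal sketch that keeps the attempt's pattern but, as an illustrative assumption, changes the replacement from "\\2" to "\\1\\2 - \\3" so the label and the rest of the string are kept. The variable name `attempt` is hypothetical.

```r
vecA <- c("Population 1222",
          "Population 90over",
          "population under78",
          "population 99101",
          "Population 1254",
          "Population 78 92")
vecB <- c("Population 12 - 22",
          "Population 90 over",
          "population under 78",
          "population 99 - 101",
          "Population 12 - 54",
          "Population 78 - 92")

# Same pattern as the attempt; only the replacement differs (hypothetical),
# keeping the label (\\1) and the remainder (\\3) around the inserted dash.
attempt <- gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\1\\2 - \\3", vecA)
print(attempt)

# Positions that still differ from the target vector
print(which(attempt != vecB))
```

The remaining mismatches are elements 2, 3 and 6: "90over" gets a dash instead of a space, "under78" never matches because the pattern requires two digits directly after the first blank, and "78 92" keeps its original space, which leaves a double space after the dash.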
Best answer
Here is my suggestion, done in two steps: 1) first add the hyphen between the digits, then 2) add a space between the words "over"/"under" and the digits:
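vecA <- c("Population 1222",
"Population 90over",
"population under78",
"population 99101",
"Population 1254",
"Population 78 92")
# Step 1: after the label and the first two digits, insert " - " before the next digit
# (\\s* consumes a space that may already separate the two digit groups)
v <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]{2})\\s*([[:digit:]])", "\\1\\2 - \\3", vecA)
# Step 2: separate "over"/"under" from the adjacent digits with a space;
# the branch reset group (?|...) is PCRE-only, hence perl = TRUE
gsub("^([[:alpha:]]+[[:blank:]]+)(?|(over|under)(\\d+)|(\\d+)(over|under))", "\\1\\2 \\3", v, perl = TRUE)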
Output of the code demo:
[1] "Population 12 - 22" "Population 90 over" "population under 78"
[4] "population 99 - 101" "Population 12 - 54" "Population 78 - 92"
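A quick sketch for checking the two steps against the target vector, and for seeing what the first step produces on its own; it reuses the answer's patterns unchanged, and the name `res` is hypothetical.

```r
vecA <- c("Population 1222",
          "Population 90over",
          "population under78",
          "population 99101",
          "Population 1254",
          "Population 78 92")
vecB <- c("Population 12 - 22",
          "Population 90 over",
          "population under 78",
          "population 99 - 101",
          "Population 12 - 54",
          "Population 78 - 92")

# Step 1 (default TRE engine): dash between the two digit groups
v <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]{2})\\s*([[:digit:]])",
          "\\1\\2 - \\3", vecA)
print(v)  # only "90over" and "under78" are still untouched at this point

# Step 2 (PCRE, because of the branch reset group): word/number spacing
res <- gsub("^([[:alpha:]]+[[:blank:]]+)(?|(over|under)(\\d+)|(\\d+)(over|under))",
            "\\1\\2 \\3", v, perl = TRUE)
print(identical(res, vecB))
```

Splitting the work this way keeps each pattern simple: step 1 never touches strings containing "over"/"under", and step 2 never touches strings that already contain a dash.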
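The second regex contains a branch reset group (?|...|...), which gives the alternative subpatterns the same group numbers: \\2 always holds the first token and \\3 the second, whether the word comes before or after the number. Branch reset is only supported by the PCRE engine, hence perl=T.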
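The effect of the branch reset group can be isolated on a toy input. This is a minimal sketch with a hypothetical vector `x`; it drops the answer's label prefix so that the group numbers restart at 1.

```r
x <- c("over90", "90over", "under78")

# Inside (?|...|...), each alternative numbers its groups from the same
# starting point, so \\1 is always the first token and \\2 the second,
# regardless of whether the word or the number comes first.
y <- gsub("(?|(over|under)(\\d+)|(\\d+)(over|under))", "\\1 \\2", x, perl = TRUE)
print(y)
```

Without the branch reset, the second alternative's groups would be numbered 3 and 4, so a single replacement string such as "\\1 \\2" could not cover both orders.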
A similar question, "regex - Inserting a hyphen or dash in a string vector depending on the position of specific elements", can be found on Stack Overflow: https://stackoverflow.com/questions/35178857/