I have peptide sequences with amino acid modifications, for example:
example <- c('_(Acetyl (Protein N-term))DDDIAAM(Oxidation (M))CK_')
I want to split a sequence like this into something like the following:
example2 <- c('_','(Acetyl (Protein N-term))','D','D','D','I','A','A','M','(Oxidation (M))','C','K','_')
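But I don't know how to split a string like this while keeping the content inside the parentheses intact. Is there a function or some code that can help me do this?
Thanks, LiLi
Best answer
Update
Borrowing @benson23's idea of inserting a special character such as @, we can use strsplit. Try the code below, which relies on nested (g)sub calls: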
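unlist(
  lapply(
    unlist(
      strsplit(
        # 3) put "@" after the last ")" in the string
        sub(
          "(.*)\\)", "\\1)@",
          # 2) put "@" before the first "("
          sub(
            "\\(", "@(",
            # 1) put "@" on both sides of any text between ")" and "("
            gsub("(\\))([^()]+)(\\()", "\\1@\\2@\\3", example)
          )
        ), "@"
      )
    ),
    # keep parenthesised pieces whole; split everything else into characters
    function(s) {
      if (startsWith(s, "(")) {
        s
      } else {
        strsplit(s, "")
      }
    }
  )
)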
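As a possible alternative to inserting a marker character, a recursive PCRE pattern can match either a balanced parenthesised group or, failing that, a single character. This is only a sketch and not part of the original answer: the names peptide, pattern and tokens are placeholders, and it assumes the parentheses are balanced.

```r
# Sketch only: `peptide`, `pattern` and `tokens` are illustrative names,
# not taken from the original answer.
peptide <- "_(Acetyl (Protein N-term))DDDIAAM(Oxidation (M))CK_"

# Group 1 matches "(" ... ")" whose body is made of runs of non-parenthesis
# characters or, through the recursive call (?1), another balanced group.
# The alternative "." picks up any single character outside a group.
pattern <- "(\\((?:[^()]++|(?1))*\\))|."

# perl = TRUE is needed: recursion and possessive quantifiers are PCRE features.
tokens <- regmatches(peptide, gregexpr(pattern, peptide, perl = TRUE))[[1]]
print(tokens)
```

Because the whole job is done by one pattern, there is no need to pick a marker character that is guaranteed not to occur in the input.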
Below is a bulkier implementation that locates the paired parentheses and splits on them:
# split string by characters
v <- unlist(strsplit(example, ""))
# positions of "(" and ")"
a <- which(v == "(")
b <- which(v == ")")
# split as per the position of ")"
lst1 <- split(v, cumsum(replace(rep(0, length(v)), 1 + by(b, findInterval(b, a), max), 1)))
# split as per the position of "("
lst2 <- unlist(lapply(lst1, function(x) split(x, cumsum(x == "(") > 0)), recursive = FALSE)
# output
res <- unlist(
  lapply(
    lst2,
    function(s) {
      if (s[1] == "(") {
        paste0(s, collapse = "")
      } else {
        s
      }
    }
  ),
  use.names = FALSE
)
Test
Let's try a slightly trickier example:
example <- c("_(Acetyl (Protein (N-term)) XXX) DDDIAAM(Oxidation (M))CK_")
For this input, res is:
 [1] "_"                               "(Acetyl (Protein (N-term)) XXX)"
 [3] " "                               "D"
 [5] "D"                               "D"
 [7] "I"                               "A"
 [9] "A"                               "M"
[11] "(Oxidation (M))"                 "C"
[13] "K"                               "_"
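The paired-parenthesis idea can also be sketched with a running depth counter, which may be easier to follow than the split/findInterval version. The helper name split_keep_parens and the variable tricky are hypothetical, not from the original answer, and the sketch assumes balanced parentheses.

```r
# Sketch only: `split_keep_parens` is a hypothetical helper, not part of the
# original answer. It assumes the parentheses in `x` are balanced.
split_keep_parens <- function(x) {
  chars <- strsplit(x, "")[[1]]
  # Nesting depth after each character: "(" adds 1, ")" subtracts 1.
  depth <- cumsum((chars == "(") - (chars == ")"))
  # Inside a group: depth is positive, or this is the ")" that closes a group.
  inside <- depth > 0 | chars == ")"
  # A new token starts at every outside character and at each top-level "(".
  starts <- !inside | (chars == "(" & depth == 1)
  # Glue the characters of each token back together.
  unname(vapply(split(chars, cumsum(starts)), paste0, character(1), collapse = ""))
}

tricky <- "_(Acetyl (Protein (N-term)) XXX) DDDIAAM(Oxidation (M))CK_"
print(split_keep_parens(tricky))
```

Applied to the string from the question, the same helper is expected to reproduce example2. Adjacent groups such as ")(" are kept as separate tokens because every top-level "(" starts a new one.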
Source: the Stack Overflow question "r - How to split characters in R while keeping the content inside parentheses?": https://stackoverflow.com/questions/73231526/