Suppose I have the following text:
txt <- as.character("this is just a test! i'm not sure if this is O.K. or if it will work? who knows. regex is sorta new to me.. There are certain cases that I may not figure out?? sad! ^_^")
I want to capitalize the first alphabetic character of each sentence.
I came up with this regular expression to match: ^|[[:alnum:]]+[[:alnum:]]+[.!?]+[[:space:]]*[[:space:]]+[[:alnum:]]
Calling gregexpr returns:
> gregexpr("^|[[:alnum:]]+[[:alnum:]]+[.!?]+[[:space:]]*[[:space:]]+[[:alnum:]]", txt)
[[1]]
[1] 1 16 65 75 104 156
attr(,"match.length")
[1] 0 7 7 8 7 8
attr(,"useBytes")
[1] TRUE
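These are the correct substring indices for the matches.
But how do I use them to actually capitalize the characters I need? I assume I have to strsplit and then...?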
Best Answer
Your regex does not seem to work for your example, so I stole one from this question.
txt <- as.character("this is just a test! i'm not sure if this is O.K. or if it will work? who knows. regex is sorta new to me.. There are certain cases that I may not figure out?? sad! ^_^")
print(txt)
gsub("([^.!?\\s])([^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?)(?=\\s|$)", "\\U\\1\\E\\2", txt, perl = TRUE, useBytes = FALSE)
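If you would rather work from gregexpr matches, as the question originally tried, something like the following might do. This is only a rough sketch and not the regex above: capitalize_sentences is a hypothetical helper name, and the pattern is a deliberately simplified assumption (a lowercase ASCII letter at the start of the string, or after ., ! or ? followed by whitespace). It relies on the replacement form of regmatches, so no strsplit is needed.

```r
# Hypothetical helper, not part of the original answer.
# Assumption: a "sentence start" is a lowercase ASCII letter at the very
# beginning of the string, or one preceded by ., ! or ? plus whitespace.
capitalize_sentences <- function(x) {
  pattern <- "(^|[.!?]\\s+)[a-z]"
  m <- gregexpr(pattern, x, perl = TRUE)
  # regmatches<- swaps every matched substring for its replacement.
  # toupper() leaves the punctuation and whitespace in the match unchanged,
  # so only the trailing letter is actually altered.
  regmatches(x, m) <- lapply(regmatches(x, m), toupper)
  x
}

capitalize_sentences("this is a test! is it ok? who knows.")
# "This is a test! Is it ok? Who knows."
```

Because the pattern is so simple, an abbreviation such as "O.K." followed by a lowercase word is treated as the end of a sentence, so the next word gets capitalized too. The same simplified pattern can also be written as a single call, gsub("(^|[.!?]\\s+)([a-z])", "\\1\\U\\2", x, perl = TRUE), where \\U upper-cases the captured letter.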
Regarding r - Capitalize the first word of a sentence (regex, gsub, gregexpr), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22976472/