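I have the following regular expression, which splits on any whitespace or punctuation character. How can I exclude one or more punctuation characters from [:punct:]? Say I want to exclude the apostrophe and the comma. I know I could explicitly use [all punctuation marks in here] instead of [[:punct:]], but I'm hoping there is a way to exclude characters instead.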
X <- "I'm not that good at regex yet, but am getting better!"
strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)
[1] "I" "'" "m" "not" "that" "good" "at" "regex" "yet"
[10] "," "" "but" "am" "getting" "better" "!"
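Best answer
It's not clear to me what result you want, but you can use a negated character class, as in this answer. The class [^,'[:^punct:]] matches any character that is not a comma, not an apostrophe, and not a non-punctuation character, which leaves every punctuation character except the comma and the apostrophe.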
R> strsplit(X, "[[:space:]]|(?=[^,'[:^punct:]])", perl=TRUE)[[1]]
[1] "I'm" "not" "that" "good" "at" "regex" "yet,"
[8] "but" "am" "getting" "better" "!"
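To see what the negated class actually matches, here is a small sketch that pulls the matches out with regmatches() instead of splitting. The test string and the variable names are my own, not from the question. The second pattern, a negative lookahead placed in front of [[:punct:]], is an alternative spelling of the same idea. Both need perl=TRUE.

```r
# Hypothetical test string (not from the question) with several punctuation marks.
s <- "Hi, it's me!? (ok)"

# Double negative: reject ',', "'", and every non-punctuation character.
keep_neg_class <- "[^,'[:^punct:]]"
# Same idea, written as a negative lookahead in front of [[:punct:]].
keep_lookahead <- "(?![,'])[[:punct:]]"

m1 <- regmatches(s, gregexpr(keep_neg_class, s, perl = TRUE))[[1]]
m2 <- regmatches(s, gregexpr(keep_lookahead, s, perl = TRUE))[[1]]
print(m1)                 # "!" "?" "(" ")"
print(identical(m1, m2))  # TRUE

# Stripping rather than splitting: the comma and the apostrophe survive.
cleaned <- gsub(keep_neg_class, "", s, perl = TRUE)
print(cleaned)            # "Hi, it's me ok"
```

The lookahead form may be easier to read when the list of excluded characters grows, since the excluded characters sit in their own bracket.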
On regex and removing all punctuation except certain characters, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13372438/