I want to split a string on [,.!?;~], but keep each [,.!?;~] character in its place. For example:
This is the example, but it is not enough
becomes
[This is the example,, but it is not enough] // length=2
[0]=This is the example,
[1]=but it is not enough
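As you can see, the comma stays where it was. I did this with the regex (?<=([,.!?;~])+). However, when one of [,.!?;~] is followed by certain special words (for example: but), I do not want the string to be split at that point. For example: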
I want this sentence to be split into this form, but how to do. So if anyone can help, that will be great
becomes
[0]=I want this sentence to be split into this form, but how to do.
[1]=So if anyone can help,
[2]=that will be great
As you can see, the part "form, but" is not split and stays inside the first sentence.
Best answer
I used:
- a positive lookbehind, (?<=a)b, to keep the delimiter;
- a negative lookahead, a(?!b), to exclude the stop words.
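To see the two constructs in isolation, here is a minimal sketch. The class name LookaroundDemo, its helper methods, and the sample strings are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

public class LookaroundDemo {
    // Positive lookbehind: split at the empty position right after each comma,
    // so the comma stays attached to the token before it.
    static String[] splitKeepingComma(String text) {
        return text.split("(?<=,)");
    }

    // Negative lookahead: replace "a" only where it is not followed by "b".
    static String replaceAUnlessBeforeB(String text) {
        return text.replaceAll("a(?!b)", "X");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingComma("a,b,c"))); // [a,, b,, c]
        System.out.println(replaceAUnlessBeforeB("ab ac ad"));           // ab Xc Xd
    }
}
```

Both lookarounds are zero-width: they test the surrounding text without consuming it, which is why the delimiter is never removed by the split.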
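Note how I appended the regex (?!\\s*(but|and|if)) to the regex you provided. You can put every stop word you want to exclude (for example: but, and, if) inside the parentheses, separated by the pipe symbol (|).
Also note that the delimiters still stay in their original place.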
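If the stop-word list is long or changes often, the alternation can be assembled from a list instead of typed by hand. The sketch below is one possible way to do that; StopWordSplitter and buildSplitRegex are hypothetical names, and the trailing \b (so that "but" does not also block a split before "butter") is an addition that the original regex does not have.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StopWordSplitter {
    // Hypothetical helper: builds the split regex from a non-empty list of stop words.
    static String buildSplitRegex(List<String> stopWords) {
        // Quote each word so regex metacharacters in it are taken literally,
        // then join the words with the pipe symbol.
        String alternation = stopWords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Split after a delimiter unless optional whitespace and a whole stop word follow.
        return "(?<=[,.!?;~])(?!\\s*(?:" + alternation + ")\\b)";
    }

    public static void main(String[] args) {
        String regex = buildSplitRegex(Arrays.asList("but", "and", "if"));
        // "butter" only starts with a stop word, so the split before it still happens;
        // the split before "and" is suppressed.
        for (String token : "Bread, butter, and jam".split(regex)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

With this input the loop prints two tokens, "Bread," and " butter, and jam"; the second keeps its leading space because the split position is zero-width.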
Output
Count of tokens = 3
I want this sentence to be split into this form, but how to do.
So if anyone can help,
that will be great
Code
public class HelloWorld {
    public static void main(String[] args) {
        String str = "I want this sentence to be split into this form, but how to do. So if anyone can help, that will be great";

        // (?<=[,.!?;~])         split right after a delimiter, keeping it in the token
        // (?!\\s*(but|and|if))  ...unless optional whitespace and a stop word follow
        // A one-character lookbehind matches the same positions as the
        // question's (?<=([,.!?;~])+) without needing an unbounded lookbehind.
        String[] tokensVal = str.split("(?<=[,.!?;~])(?!\\s*(but|and|if))");

        // Print the number of tokens, then each token on its own line.
        System.out.println("Count of tokens = " + tokensVal.length);
        for (String token : tokensVal) {
            System.out.println(token);
        }
    }
}
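One detail of the code above: because the split position is zero-width, the second and third tokens keep the space that followed the delimiter. If that is not wanted, a possible variant appends \s* so the split consumes that whitespace. This is a sketch under that assumption, not part of the original answer; TrimmedSplit and splitSentences are hypothetical names, and the \b after the stop words is an extra safeguard so that only whole words suppress a split.

```java
public class TrimmedSplit {
    // Splits after a delimiter (keeping it) unless a stop word follows,
    // and drops the whitespace between the resulting tokens.
    static String[] splitSentences(String text) {
        // The negative lookahead is checked before \\s* consumes the whitespace,
        // so a stop word after the spaces still suppresses the split.
        return text.split("(?<=[,.!?;~])(?!\\s*(?:but|and|if)\\b)\\s*");
    }

    public static void main(String[] args) {
        String str = "I want this sentence to be split into this form, but how to do. "
                + "So if anyone can help, that will be great";
        for (String token : splitSentences(str)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The brackets in the printout make it easy to confirm that no token starts with a space.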
Regarding "java - How to write a regular expression to split a string in this format?", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/39030707/