I was looking at a piece of code on TutorialsPoint, and something has been bothering me ever since. Take a look at this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        while (m.find()) {
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        }
    }
}
This code successfully prints:
Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?
But given the regular expression "(.*)(\\d+)(.*)", why doesn't it return the other possible results, such as:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
or
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
If this code is not suited to that, how can I write code that finds all possible matches?
Best answer
This happens because of the greediness of * followed by backtracking.
String:
This order was placed for QT3000! OK?
Regular expression:
(.*)(\\d+)(.*)
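As we all know, .* is greedy and matches as many characters as possible. So the first .* initially matches every character up to and including the final ?. It then backtracks so that the rest of the pattern can match. The next part of the regex is \d+, so the engine backtracks one character at a time until it reaches a digit. As soon as it finds one, \d+ matches that digit, because the condition is already satisfied (\d+ matches one or more digits). The first (.*) therefore captures This order was placed for QT300, and the following (\d+) captures the digit 0 just before the ! symbol.
The last (.*) then captures all the remaining characters, !<space>OK?. m.group(1) returns the characters captured by group 1, m.group(2) returns those captured by group 2, and so on.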
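To see where the backtracking stops, it can help to print the start and end offsets of each group. The following is only a small sketch for illustration; the class name GroupBoundaries and the helper describe are made up for this example, while Matcher.start(int), Matcher.end(int) and Matcher.groupCount() are standard java.util.regex methods. It also counts how many times find() succeeds, which shows that the loop in the question runs only once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBoundaries {
    // Hypothetical helper: lists every capturing group of every match as
    // "group N [start,end) "text"", followed by the number of matches found.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        int matches = 0;
        while (m.find()) {
            matches++;
            for (int g = 1; g <= m.groupCount(); g++) {
                sb.append("group ").append(g)
                  .append(" [").append(m.start(g)).append(',').append(m.end(g)).append(") \"")
                  .append(m.group(g)).append("\"\n");
            }
        }
        sb.append("matches found: ").append(matches).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("(.*)(\\d+)(.*)", "This order was placed for QT3000! OK?"));
    }
}
```

Group 2 covers a single character, the last 0, because the greedy first group gave back only as much as \d+ needed. The first match consumes the whole string, so a second call to find() has nothing left to match.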
To get the outputs you want, start with a pattern that requires exactly two digits:
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{2})(.*)";

// Create a Pattern object.
Pattern r = Pattern.compile(pattern);

// Now create a Matcher object.
Matcher m = r.matcher(line);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3));
}
Output:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
With (.*)(\\d{2}), the greedy (.*) backtracks until exactly two digits are left for \d{2} to match.
Change your pattern to this:
String pattern = "(.*?)(\\d+)(.*)";
to get output like this:
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
The ? after * makes the * non-greedy (lazy), so (.*?) matches as few characters as possible.
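The difference between the greedy and lazy forms can be checked side by side. This is a minimal sketch, not part of the original answer; the class name GreedyVsLazy and the helper groups are hypothetical. The third pattern, with a lazy \d+? as well, is added only to show that laziness applies to each quantifier separately.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns the capturing groups of the first match,
    // or an empty array if the pattern does not match at all.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            out[g - 1] = m.group(g);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy first group: leaves only the last digit for \d+.
        System.out.println(Arrays.toString(groups("(.*)(\\d+)(.*)", line)));
        // Lazy first group: stops at the first digit, so \d+ takes all four.
        System.out.println(Arrays.toString(groups("(.*?)(\\d+)(.*)", line)));
        // Lazy first group and lazy \d+?: the digit group takes only "3".
        System.out.println(Arrays.toString(groups("(.*?)(\\d+?)(.*)", line)));
    }
}
```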
Use extra capturing groups to get both outputs from a single program:
String line = "This order was placed for QT3000! OK?";
String pattern = "((.*?)(\\d{2}))(?:(\\d{2})(.*))";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
    // First result: prefix plus the first two digits, then the last two digits.
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(4));
    System.out.println("Found value: " + m.group(5));
    // Second result: prefix alone, then all four digits.
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3) + m.group(4));
    System.out.println("Found value: " + m.group(5));
}
Output:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
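The patterns above are tailored to this particular string. For the general question of listing every way (.*)(\d+)(.*) could split the input, Matcher will not help directly, because it reports only one match per starting position. One possible approach, sketched here under the assumption that the input is a single line and that the pattern shape is fixed as prefix, digits, suffix, is to enumerate the substrings yourself. The class name AllSplits and the method allSplits are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AllSplits {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns every {prefix, digits, suffix} triple with
    // prefix + digits + suffix equal to the input and digits matching \d+.
    // Assumes the input contains no line terminators, which . would not match.
    static List<String[]> allSplits(String input) {
        List<String[]> result = new ArrayList<>();
        for (int start = 0; start < input.length(); start++) {
            for (int end = start + 1; end <= input.length(); end++) {
                String middle = input.substring(start, end);
                if (!DIGITS.matcher(middle).matches()) {
                    // A longer substring from the same start cannot be all digits either.
                    break;
                }
                result.add(new String[] {input.substring(0, start), middle, input.substring(end)});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] split : allSplits("This order was placed for QT3000! OK?")) {
            System.out.println("Found value: " + split[0]);
            System.out.println("Found value: " + split[1]);
            System.out.println("Found value: " + split[2]);
            System.out.println();
        }
    }
}
```

For the string in the question this yields ten splits, including the three discussed above; the digit group ranges over every contiguous run inside 3000.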
Source: a similar question on Stack Overflow about the Java regular expression Matcher not finding all possible matches: https://stackoverflow.com/questions/28038364/