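I am trying to capture multiple groups repeatedly ("recursively") in a string while also using a backreference to a group in the regex. Even though I use a Pattern and Matcher with a "while(matcher.find())" loop, it still captures only the last instance rather than all of them. In my case, the only possible tags are the ones handled in the switch statement in my code below (pof, pos, poif, po, poi, pol, poil, sm). For each occurrence I need to capture: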
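- Any text outside the tags, so that I can format it as "normal" text. I do this by capturing any text before a tag in one group and the tag itself in another group; as I iterate over the occurrences, I remove everything that was captured from the original string, and if any text is left over at the end, I format it as "normal" text as well.
- The "name" of the tag, so that I know how to format the text inside the tag.
- The text content of the tag, which will be formatted according to the tag name and its associated rules.
Here is my sample code: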
String currentText = "the man said:<pof>“This one, at last, is bone of my bones</pof><poi>and flesh of my flesh;</poi><po>This one shall be called ‘woman,’</po><poil>for out of man this one has been taken.”</poil>";
String remainingText = currentText;
// first check if our string even has any kind of xml tag, because if not we will just format the whole string as "normal" text
if(currentText.matches("(?su).*<[/]{0,1}(?:sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1}>.*"))
{
    // an opening or closing tag has been found, so let us start our pattern captures
    // I am using a backreference \\2 to make sure the closing tag is the same as the opening tag
    Pattern pattern1 = Pattern.compile("(.*)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>",Pattern.UNICODE_CHARACTER_CLASS);
    Matcher matcher1 = pattern1.matcher(currentText);
    int iteration = 0;
    while(matcher1.find()){
        System.out.print("Iteration ");
        System.out.println(++iteration);
        System.out.println("group1:"+matcher1.group(1));
        System.out.println("group2:"+matcher1.group(2));
        System.out.println("group3:"+matcher1.group(3));
        System.out.println("group4:"+matcher1.group(4));
        // text before the tag: insert it as "normal" text and drop it from the remainder
        if(matcher1.group(1) != null && !matcher1.group(1).isEmpty())
        {
            m_xText.insertString(xTextRange, matcher1.group(1), false);
            remainingText = remainingText.replaceFirst(matcher1.group(1), "");
        }
        // text inside the tag: format it according to the tag name, then drop the whole element from the remainder
        if(matcher1.group(4) != null && !matcher1.group(4).isEmpty())
        {
            switch (matcher1.group(2)) {
                case "pof": [...]
                case "pos": [...]
                case "poif": [...]
                case "po": [...]
                case "poi": [...]
                case "pol": [...]
                case "poil": [...]
                case "sm": [...]
            }
            remainingText = remainingText.replaceFirst("<"+matcher1.group(2)+">"+matcher1.group(4)+"</"+matcher1.group(2)+">", "");
        }
    }
}
The System.out.println calls produce output only once in my console, with the following result:
Iteration 1:
group1:the man said:<pof>“This one, at last, is bone of my bones</pof><poi>and flesh of my flesh;</poi><po>This one shall be called ‘woman,’</po>
group2:poil
group3:po
group4:for out of man this one has been taken.”
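Group 3 can be ignored; the only useful groups are 1, 2, and 4 (group 3 is part of group 2). Why does this capture only the last tag instance, "poil", and not the preceding "pof", "poi", and "po" tags?
The output I would like to see is this: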
Iteration 1:
group1:the man said:
group2:pof
group3:po
group4:“This one, at last, is bone of my bones
Iteration 2:
group1:
group2:poi
group3:po
group4:and flesh of my flesh;
Iteration 3:
group1:
group2:po
group3:po
group4:This one shall be called ‘woman,’
Iteration 4:
group1:
group2:poil
group3:po
group4:for out of man this one has been taken.”
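Best answer
I just found the answer to this question: the first capture group simply needs a non-greedy (reluctant) quantifier, as I already had in the fourth capture group. With the greedy (.*), the first group swallows as much text as it can and backtracks only far enough to leave the last tag for the rest of the pattern, so a single match covers the whole string. This works exactly as needed: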
Pattern pattern1 = Pattern.compile("(.*?)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>",Pattern.UNICODE_CHARACTER_CLASS);
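To make the effect of the reluctant first group easier to check, here is a minimal, self-contained sketch rather than my actual code. The class name TagSegmentDemo, the Segment holder, and the parse method are hypothetical names made up for illustration, and the curly quotes in the sample text are replaced with ASCII so the file compiles under any source encoding. Instead of removing captured text with replaceFirst (which interprets its first argument as a regex), the sketch remembers matcher.end() and treats whatever follows the last match as "normal" text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagSegmentDemo {

    // Same pattern as above: group 1 is reluctant, \\2 ties the closing tag to the opening tag.
    private static final Pattern TAGGED = Pattern.compile(
            "(.*?)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>",
            Pattern.UNICODE_CHARACTER_CLASS);

    /** Hypothetical holder: a piece of text and its tag name (null means "normal" text). */
    static final class Segment {
        final String tag;
        final String text;

        Segment(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        @Override
        public String toString() {
            return (tag == null ? "normal" : tag) + ": " + text;
        }
    }

    /** Splits the input into plain and tagged segments, in document order. */
    static List<Segment> parse(String input) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = TAGGED.matcher(input);
        int consumed = 0; // index just past the end of the last match
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                segments.add(new Segment(null, m.group(1))); // text before the tag
            }
            segments.add(new Segment(m.group(2), m.group(4))); // tag name and its content
            consumed = m.end();
        }
        if (consumed < input.length()) {
            segments.add(new Segment(null, input.substring(consumed))); // trailing plain text
        }
        return segments;
    }

    public static void main(String[] args) {
        // ASCII version of the sample text from the question.
        String text = "the man said:<pof>This one, at last, is bone of my bones</pof>"
                + "<poi>and flesh of my flesh;</poi>"
                + "<po>This one shall be called 'woman,'</po>"
                + "<poil>for out of man this one has been taken.</poil>";
        for (Segment s : parse(text)) {
            System.out.println(s);
        }
    }
}
```

With the sample text this should yield one "normal" segment followed by the pof, poi, po, and poil segments. A string without any tags comes back as a single "normal" segment, so under these assumptions the separate matches() pre-check is not strictly required. Note also that inside a character class such as [f|l|s|i|3] the | is a literal character rather than alternation, so [flsi3] would probably express the intent more precisely.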
Regarding Java - recursive group capture with backreferences in a Java regex, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32042005/