I want to split the following string:
String line ="DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\"";
into the following tokens:
DOB
1234567890
11
07/05/12
first,last
100
is,a,good,boy
I tried the following regular expression:
import java.util.*;
import java.lang.*;
import java.util.regex.*;
import org.apache.commons.lang.StringUtils;

class SplitString {
    public static final String quotes = "\".[[((a-z)|(A-Z))]+( ((a-z)|(A-Z)).,)*.((a-z)|(A-Z))].\"" ;
    public static final String ISSUE_UPLOAD_FILE_PATTERN = "((a-z)|(A-Z))+ [(((a-z)|(A-Z)).,)* + ("+quotes+".,) ].((a-z)|(A-Z)) + ("+quotes+")";

    public static void main(String[] args) {
        String line ="DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\"";
        String delimiter = ",";
        Pattern p = Pattern.compile(ISSUE_UPLOAD_FILE_PATTERN);
        Pattern pattern = Pattern.compile(ISSUE_UPLOAD_FILE_PATTERN);
        String[] output = pattern.split(line);
        System.out.println(" pattern: "+pattern);
        for (String a : output) {
            System.out.println(" output: "+a);
        }
    }
}
Am I missing something in the regular expression?
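Best answer
Here is an updated version of the code that produces the expected output. Pattern.split treats the regex as the delimiter between tokens, so instead of splitting, this version describes a single field and walks over the matches with Matcher.find():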
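import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitString {
    // Matches one field: either unquoted text (group 3) or the text inside double quotes (group 4).
    public static final String ISSUE_UPLOAD_FILE_PATTERN = "(?<=(^|,))(([^\",]+)|\"([^\"]*)\")(?=($|,))";

    public static void main(String[] args) {
        String line = "DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\"";
        Matcher matcher = Pattern.compile(ISSUE_UPLOAD_FILE_PATTERN).matcher(line);
        while (matcher.find()) {
            if (matcher.group(3) != null) {
                // Unquoted field
                System.out.println(matcher.group(3));
            } else {
                // Quoted field, printed without the surrounding quotes
                System.out.println(matcher.group(4));
            }
        }
    }
}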
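The regular expression works as follows:
(?<=(^|,))
: checks that the match is preceded by either the start of the string or a ,
(([^\",]+)|\"([^\"]*)\")
: matches either one or more characters that are neither " nor , (group 3)
or a pair of double quotes enclosing any number of characters other than " (group 4 captures the text between the quotes)
(?=($|,))
: checks that the match is followed by either the end of the string or a ,
The result is in group 3 or group 4, depending on which alternative matched.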
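To make the group logic easier to follow, here is a minimal sketch that wraps the same pattern in a helper returning a list. The class name CsvTokens, the method tokenize, and the group names plain and quoted are hypothetical names chosen for illustration; they are not part of the original answer. Named groups replace the numeric groups 3 and 4, and the remaining groups are made non-capturing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvTokens {
    // Same structure as the answer's pattern. "plain" and "quoted" are
    // illustrative names standing in for groups 3 and 4 of the original.
    private static final Pattern FIELD = Pattern.compile(
        "(?<=^|,)(?:(?<plain>[^\",]+)|\"(?<quoted>[^\"]*)\")(?=$|,)");

    // Returns the fields of one line; quoted fields lose their surrounding quotes.
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String plain = m.group("plain");
            // Exactly one of the two alternatives participates in each match.
            tokens.add(plain != null ? plain : m.group("quoted"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "DOB,1234567890,11,07/05/12,\"first,last\",100,\"is,a,good,boy\"";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

Two limitations follow from the pattern itself: an empty unquoted field (as in a,,b) produces no match and is skipped, because [^\",]+ requires at least one character, and a doubled quote inside a quoted field is not treated as an escape. If the input can contain either, a dedicated CSV parser is probably a safer choice than this regex.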
The original question about this Java regex split pattern is on Stack Overflow: https://stackoverflow.com/questions/11409287/