I want to parse a line of a CSV (comma-separated) file that looks like this:
Bosh,Mark,mark@gmail.com,"3, Institute","83, 1, 2",1,21
I need to process the file so that the commas between quotation marks are replaced with ';', like this:
Bosh,Mark,mark@gmail.com,"3; Institute","83; 1; 2",1,21
I used the following Java code, but it does not handle the line correctly:
Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
    String replacedMatch = matcher.group();
    String gr1 = matcher.group(1);
    gr1.trim();
    replacedMatch = replacedMatch.replace(",", ";");
    line = line.replace(matcher.group(), replacedMatch);
}
The output is:
Bosh,Mark,mark@gmail.com,"3; Institute";"83; 1; 2",1,21
Does anyone know how to fix this?
Best answer
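Here is my solution for replacing , with ; inside quoted strings. It assumes that if a " appears inside a quoted string, it is escaped by another ". That property guarantees the following: counting from the start of the line up to the current character, if the number of " characters seen so far is odd, the character is inside a quoted string.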
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Test string, with the tricky case """", which resolves to
// a length-1 string containing a single double quote "
String line = "Bosh,\"\"\"\",mark@gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);
int start = 0;
StringBuilder output = new StringBuilder();
while (matcher.find()) {
    // System.out.println(matcher.group() + "\n " + matcher.start() + " " + matcher.end());
    output
        .append(line.substring(start, matcher.start())) // Append the text before this match unchanged
        .append(matcher.group().replace(',', ';'));     // Append the quoted string with commas replaced
    start = matcher.end();
}
output.append(line.substring(start)); // Append the remaining text after the last match
// System.out.println(output);
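The odd-quote-count property described above can also be applied directly, without a regex. The following is a minimal sketch under that same assumption (a quote inside a field is escaped by doubling it). The names `QuotedCommaReplacer` and `replaceQuotedCommas` are made up for illustration and are not part of the original solution.

```java
final class QuotedCommaReplacer {

    /** Replaces every comma that lies inside a double-quoted field with a semicolon. */
    static String replaceQuotedCommas(String line) {
        StringBuilder out = new StringBuilder(line.length());
        // True while an odd number of quotes has been seen so far,
        // i.e. while the scan is inside a quoted string.
        boolean insideQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes; // each quote flips the parity
            }
            if (c == ',' && insideQuotes) {
                out.append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "Bosh,Mark,mark@gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
        System.out.println(replaceQuotedCommas(line));
    }
}
```

A doubled quote flips the parity twice, so escaped quotes leave the inside/outside state unchanged. This sketch does not validate the input; a line with an unbalanced quote would treat everything after it as quoted.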
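Although I could not find any case where replacing the matched group the way you do in line = line.replace(matcher.group(), replacedMatch); would fail, I feel it is safer to rebuild the string from scratch.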
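For what it is worth, a contrived input seems to show why rebuilding is the safer choice. `String.replace` substitutes every occurrence of the matched text in the whole line, so a field whose content is a lone comma (`","`) has the same text as the separator between two adjacent quoted fields. The sketch below compares the two approaches. The class and method names are hypothetical, and `replaceInPlace` is a looped adaptation of the question's code rather than the original.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InPlaceReplaceDemo {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    /** In-place variant: replaces each match's text wherever it occurs in the line. */
    static String replaceInPlace(String line) {
        Matcher matcher = QUOTED.matcher(line); // scans the original, unmodified string
        String result = line;
        while (matcher.find()) {
            String match = matcher.group();
            result = result.replace(match, match.replace(',', ';'));
        }
        return result;
    }

    /** Rebuilding variant: copies the text between matches and rewrites only the matches. */
    static String rebuild(String line) {
        Matcher matcher = QUOTED.matcher(line);
        StringBuilder output = new StringBuilder();
        int start = 0;
        while (matcher.find()) {
            output.append(line, start, matcher.start())
                  .append(matcher.group().replace(',', ';'));
            start = matcher.end();
        }
        return output.append(line, start, line.length()).toString();
    }

    public static void main(String[] args) {
        String line = "\"a\",\"b\",\",\""; // three fields: a, b, and a lone comma
        System.out.println(replaceInPlace(line));
        System.out.println(rebuild(line));
    }
}
```

On the input `"a","b",","` the in-place variant produces `"a";"b";,"` because it also rewrites the separators between fields, while the rebuilding variant produces `"a","b",";"`.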
Regarding "java - Regex composition", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11259594/