As described in my thread "Antlr greedy-option", I have some problems with a language that can contain string literals inside a string literal, for example:
START: "img src="test.jpg""
Mr. Bart Kiers mentioned in my thread that it is not possible to create a grammar which could solve my problem. Therefore I decided to change the language to:
START: "img src='test.jpg'"
before starting the lexer (and parser).
File-input could be:
START: "aaa"aaa"
"aaa"aaaaa"
:END_START
START: "aaa"aaa"
"aaa"aa a aa"
:END_START
START: "aaab"bbaaaa"
:END_START
So I have a solution, but it is not correct. I have two questions about my problem (below the code). My code is:
public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("src/file.txt");
        String preparedCode = preparingCode(fis);
        ANTLRStringStream in = new ANTLRStringStream(preparedCode);
        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);
        parser.rule();
    } catch (IOException ex) {
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }
}

static String preparingCode(FileInputStream input) {
    DataInputStream data = new DataInputStream(input);
    StringBuilder oldCode = new StringBuilder();
    StringBuffer newCode = new StringBuffer(oldCode.length());
    Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)");
    String strLine;
    try {
        while ((strLine = data.readLine()) != null)
            oldCode.append(strLine + "\n");
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    Matcher matcher = pattern.matcher(oldCode);
    while (matcher.find()) {
        // eliminate quotes inside a string literal
        String stringLiteral = matcher.group(2).replaceAll("\"", "'");
        String replace = matcher.group(1) + stringLiteral + matcher.group(3);
        matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);
    System.out.println(newCode);
    return newCode.toString();
}
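My questions are:
- Which pattern is correct? It is important that a string literal can be defined over multiple lines, e.g. "aaaa"\n"bbb", but it always ends with the line "\n:END_START". The result I want is: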
START: "aaa'aaa'
'aaa'aaaaa"
:END_START
START: "aaa'aaa'
'aaa'aa a aa"
:END_START
START: "aaab'bbaaaa"
:END_START
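I played around with the pattern flag Pattern.DOTALL:
Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);
But this is not a solution, because the greedy .+ then matches everything from the first START: up to the last :END_START...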
- Once I have the correct pattern, is there a more efficient way to fix this?
Best answer
Fixing the first question
I have to use a non-greedy (reluctant) quantifier together with the pattern flag Pattern.DOTALL:
Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
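To see why both changes are needed, the three pattern variants from this thread can be compared side by side. The following is only a minimal sketch: the class name RegexModes, the helper literals and the sample input are assumptions made for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexModes {
    // Returns the body (group 2) of every match of regex in input.
    static List<String> literals(String regex, int flags, String input) {
        List<String> bodies = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            bodies.add(m.group(2));
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Hypothetical input: one two-line literal and one single-line literal.
        String input =
            "START: \"aaa\"aaa\"\n\"aaa\"aaaaa\"\n:END_START\n"
            + "START: \"aaab\"bbaaaa\"\n:END_START\n";
        String greedy = "(START:\\s\")(.+)(\"\\n:END_START)";
        String reluctant = "(START:\\s\")(.+?)(\"\\n:END_START)";

        // No DOTALL: '.' stops at line breaks, so the two-line literal is skipped.
        System.out.println(literals(greedy, 0, input).size());
        // Greedy + DOTALL: a single match that swallows both blocks.
        System.out.println(literals(greedy, Pattern.DOTALL, input).size());
        // Reluctant + DOTALL: one match per START ... :END_START block.
        System.out.println(literals(reluctant, Pattern.DOTALL, input).size());
    }
}
```

Only the last variant yields one body per block, which is what the quote replacement needs.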
Code:
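import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("src/file.txt");
        // Rewrite the nested quotes before the lexer sees the input.
        String preparedCode = preparingCode(fis);
        ANTLRStringStream in = new ANTLRStringStream(preparedCode);
        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);
        parser.rule();
    } catch (IOException ex) {
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(1); // non-zero exit code signals the parse failure
    }
}

static String preparingCode(FileInputStream input) {
    StringBuilder oldCode = new StringBuilder();
    // Reluctant body plus DOTALL: each match may span several lines
    // and stops at the first quote followed by a newline and :END_START.
    Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
    // BufferedReader replaces the deprecated DataInputStream.readLine();
    // the platform default charset is used here.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(input))) {
        String strLine;
        while ((strLine = reader.readLine()) != null) {
            oldCode.append(strLine).append('\n');
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    // Size the output buffer once the input length is known.
    StringBuffer newCode = new StringBuffer(oldCode.length());
    Matcher matcher = pattern.matcher(oldCode);
    while (matcher.find()) {
        System.out.println("++++" + matcher.group(2)); // debug: captured literal body
        // Eliminate quotes inside a string literal.
        String stringLiteral = matcher.group(2).replace('"', '\'');
        String replace = matcher.group(1) + stringLiteral + matcher.group(3);
        matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);
    System.out.println(newCode);
    return newCode.toString();
}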
So, is there another way to solve this problem?
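One possible variation, offered only as a sketch and not as an answer from the original thread: on Java 9 or later, Matcher.replaceAll accepts a function, which removes the manual appendReplacement/appendTail loop. The class name QuoteNormalizer and the method normalizeQuotes are hypothetical names chosen for this example; the pattern is the one from the fix above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteNormalizer {
    // Compiled once and reused; same pattern as in the fix above.
    private static final Pattern BLOCK =
        Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

    // Replaces every double quote inside a literal body with a single quote.
    // Requires Java 9+ for Matcher.replaceAll(Function).
    static String normalizeQuotes(String source) {
        return BLOCK.matcher(source).replaceAll(m ->
            // The function result is treated as a replacement string,
            // so it is quoted to keep '$' and '\' literal.
            Matcher.quoteReplacement(
                m.group(1) + m.group(2).replace('"', '\'') + m.group(3)));
    }

    public static void main(String[] args) {
        String source = "START: \"aaab\"bbaaaa\"\n:END_START\n";
        System.out.print(normalizeQuotes(source));
    }
}
```

The returned string could then be handed to ANTLRStringStream exactly as preparedCode is in the code above. Whether this is measurably faster is not something the thread establishes; the main gain is less bookkeeping.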
Regarding "java - efficiently replacing strings or characters in ANTLRInputStream (ANTLRStringStream) file input", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10013170/