java - Efficiently replacing strings or characters in ANTLRInputStream (ANTLRStringStream) file input

Tags: java pattern-matching antlr3

As I described in "Antlr greedy-option", I have some problems with a language whose string literals can themselves contain string literals, for example:

START: "img src="test.jpg""

Mr. Bart Kiers mentioned in my thread that it is not possible to create a grammar which could solve my problem. Therefore I decided to change the language to:

START: "img src='test.jpg'"

before starting the lexer (and parser).

File-input could be:

START: "aaa"aaa"
 "aaa"aaaaa"
:END_START

START: "aaa"aaa"
 "aaa"aa
 a
 aa"
:END_START

START: "aaab"bbaaaa"
:END_START

I have a solution, but it is not correct. I have two questions about my problem (listed below the code). My code is:

public static void main(String[] args) {

    try{
        FileInputStream fis = new FileInputStream("src/file.txt");
        String preparedCode = preparingCode(fis);

        ANTLRStringStream in = new ANTLRStringStream(preparedCode);

        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);

        parser.rule();
    }catch(IOException ex){
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }
}

static String preparingCode(FileInputStream input){
    DataInputStream data = new DataInputStream(input);
    StringBuilder oldCode = new StringBuilder();
    StringBuffer newCode = new StringBuffer(oldCode.length());

    Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)");
    String strLine;
    try{
      while ((strLine = data.readLine()) != null)   
          oldCode.append(strLine + "\n");
    }
    catch(IOException ex){
      ex.printStackTrace();
    }

    Matcher matcher = pattern.matcher(oldCode);

    while (matcher.find()) {
      //eliminate quotes inside a string literal
      String stringLiteral = matcher.group(2).replaceAll("\"", "'");

      String replace = matcher.group(1) + stringLiteral + matcher.group(3);
      matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);

    System.out.println(newCode);

    return newCode.toString();
}


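My questions are:

  • Which pattern is correct? It is important that the string literal can be defined over multiple lines, e.g. "aaaa"\n"bbb", but it always ends with the line "\n:END_START". The result I want is: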
START: "aaa'aaa'
 'aaa'aaaaa"
:END_START

START: "aaa'aaa'
 'aaa'aa
 a
 aa"
:END_START

START: "aaab'bbaaaa"
:END_START

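I played around with the pattern flag Pattern.DOTALL:

Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);

But that is not a solution, because in this case it matches everything: with DOTALL the greedy .+ also crosses line breaks, so a single match runs from the first START: to the last :END_START.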




  • Once I use the correct pattern, is there another, more efficient way to fix it?



Best answer

Fix for the first question:
I have to use the non-greedy quantifier together with the pattern flag Pattern.DOTALL:

Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
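
To see why the non-greedy quantifier matters, here is a minimal sketch that counts the matches of both patterns on a shortened two-block input. The class name, the constants and the sample text are made up for illustration; only the two regular expressions come from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; not part of the original code.
class GreedyVsReluctant {
    static final String GREEDY = "(START:\\s\")(.+)(\"\\n:END_START)";
    static final String RELUCTANT = "(START:\\s\")(.+?)(\"\\n:END_START)";

    // Two START ... :END_START blocks, shortened from the sample file.
    static final String SAMPLE =
            "START: \"aaa\"aaa\"\n:END_START\n\n"
          + "START: \"aaab\"bbaaaa\"\n:END_START\n";

    // Counts how many separate matches the regex finds with DOTALL enabled.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Greedy: one match spanning from the first START: to the last :END_START.
        System.out.println(countMatches(GREEDY, SAMPLE));    // prints 1
        // Reluctant: one match per block.
        System.out.println(countMatches(RELUCTANT, SAMPLE)); // prints 2
    }
}
```

With the greedy variant both blocks are swallowed by a single match, so the inner replacement would also rewrite the closing quote of the first block and the opening quote of the second.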

Code:

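import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

// The class name is illustrative; TestLexer and TestParser are the ANTLR-generated classes.
public class Main {

    // Non-greedy body plus DOTALL: one match covers exactly one START ... :END_START block.
    private static final Pattern START_BLOCK =
            Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

    public static void main(String[] args) {
        try (FileInputStream fis = new FileInputStream("src/file.txt")) {
            String preparedCode = preparingCode(fis);

            ANTLRStringStream in = new ANTLRStringStream(preparedCode);

            TestLexer lex = new TestLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lex);
            TestParser parser = new TestParser(tokens);

            parser.rule();
        } catch (IOException ex) {
            ex.printStackTrace();
        } catch (RecognitionException e) {
            System.out.println(e.getMessage());
            System.exit(1); // non-zero exit code signals the parse failure
        }
    }

    static String preparingCode(FileInputStream input) throws IOException {
        // BufferedReader replaces the deprecated DataInputStream.readLine();
        // UTF-8 is an assumption about the encoding of the input file.
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8));
        StringBuilder oldCode = new StringBuilder();
        String strLine;
        while ((strLine = reader.readLine()) != null) {
            oldCode.append(strLine).append('\n'); // normalize line endings to \n
        }

        Matcher matcher = START_BLOCK.matcher(oldCode);
        StringBuffer newCode = new StringBuffer(oldCode.length());
        while (matcher.find()) {
            // eliminate quotes inside a string literal
            String stringLiteral = matcher.group(2).replace('"', '\'');

            String replace = matcher.group(1) + stringLiteral + matcher.group(3);
            matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
        }
        matcher.appendTail(newCode);

        System.out.println(newCode);

        return newCode.toString();
    }
}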
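
On Java 9 or later, the find/appendReplacement/appendTail loop could probably be collapsed into a single call to Matcher.replaceAll with a replacer function. This is only a sketch under that version assumption; the class and method names are invented for illustration, and the pattern is the one from the fix above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative class; requires Java 9+ for Matcher.replaceAll(Function).
class CompactQuoteFix {
    private static final Pattern START_BLOCK =
            Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

    // Replaces every inner double quote of each START block with a single quote.
    static String fixQuotes(String code) {
        return START_BLOCK.matcher(code).replaceAll(m ->
                // quoteReplacement keeps '$' and '\' in the literal from being
                // interpreted as group references or escapes.
                Matcher.quoteReplacement(
                        m.group(1) + m.group(2).replace('"', '\'') + m.group(3)));
    }

    public static void main(String[] args) {
        String code = "START: \"aaa\"aaa\"\n \"aaa\"aaaaa\"\n:END_START\n";
        System.out.print(fixQuotes(code));
    }
}
```

The behavior should be the same as the loop above; the gain is readability rather than speed.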

So, is there another way to solve this problem?
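
One possible alternative, sketched here without any claim that it is faster in practice, is to skip the regex and locate the block delimiters with String.indexOf in a single forward pass. The class and method names are hypothetical. Unlike the pattern above, this version expects exactly one space after START: (the regex accepts any whitespace character) and also accepts an empty literal.

```java
// Illustrative class; a regex-free variant of the quote replacement.
class IndexOfQuoteFix {
    private static final String OPEN = "START: \"";
    private static final String CLOSE = "\"\n:END_START";

    static String normalizeQuotes(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int pos = 0;
        while (true) {
            int start = source.indexOf(OPEN, pos);
            if (start < 0) {
                break; // no further block
            }
            int bodyStart = start + OPEN.length();
            int end = source.indexOf(CLOSE, bodyStart);
            if (end < 0) {
                break; // unterminated block: leave the rest untouched
            }
            // Copy everything up to and including the opening quote unchanged.
            out.append(source, pos, bodyStart);
            // Replace only the quotes between the opening and closing quote.
            out.append(source.substring(bodyStart, end).replace('"', '\''));
            out.append(CLOSE);
            pos = end + CLOSE.length();
        }
        out.append(source, pos, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String code = "START: \"aaab\"bbaaaa\"\n:END_START\n";
        System.out.print(normalizeQuotes(code));
    }
}
```

The result can be passed to ANTLRStringStream exactly like the output of preparingCode.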

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/10013170/