parsing - ANTLR parser operator precedence

Tags: parsing antlr grammar antlr4

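Consider the following grammar. I have a problem with operator precedence: for instance, res=2*a+b gets a parse tree similar to that of res=2*(a+b). I know where the problem is, but I cannot think of a "beautiful" solution that avoids mutual left recursion. Can you help me out a little? The grammar is used with a custom visitor.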

grammar Math;

expression
    : expression add=('+'|'-') expression   # expressionAddExpression
    | expression mult='*' expression        # expressionMultExpression
    | '(' expression ')'                    # bracketExpression
    | number                                # numberExpression
    ;

number
    : INT                                   # int
    | '(' number ')'                        # bracketNumber
    | VARIABLE                              # var
    ;

VARIABLE: [A-Za-z][A-Za-z0-9]*;

INT: [0-9]+;

Best answer

From The Definitive ANTLR 4 Reference, section 5.4, Dealing with Precedence, Left Recursion, and Associativity:

expr : expr '*' expr // match subexpressions joined with '*' operator
     | expr '+' expr // match subexpressions joined with '+' operator
     | INT // matches simple integer atom
     ;

The problem is that this rule is ambiguous for some input phrases. ...

This is a question of operator precedence, and conventional grammars simply have no way to specify precedence. Most grammar tools, such as Bison, use extra notation to specify the operator precedence.

Instead, ANTLR resolves ambiguities in favor of the alternative given first, implicitly allowing us to specify operator precedence.

So simply put the multiplication alternative before the addition alternative.

File Question.g4:
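To see why the order of alternatives matters, here is a minimal Java sketch of a precedence-climbing parser. It is only a model of the rule "the alternative listed first binds tighter", not ANTLR's actual implementation. The class `PrecedenceDemo` and the tables `MULT_FIRST` and `ADD_FIRST` are hypothetical names made up for this illustration. The parser returns a fully parenthesised string so the tree shape is visible.

```java
import java.util.Map;

// Hypothetical demo class: none of these names come from ANTLR or from the original post.
final class PrecedenceDemo {
    // Higher number = binds tighter. MULT_FIRST models the answer's grammar
    // ('*' listed first); ADD_FIRST models the question's grammar ('+'/'-' listed first).
    static final Map<Character, Integer> MULT_FIRST = Map.of('*', 2, '+', 1, '-', 1);
    static final Map<Character, Integer> ADD_FIRST = Map.of('+', 2, '-', 2, '*', 1);

    private final String src;
    private final Map<Character, Integer> prec;
    private int pos = 0;

    private PrecedenceDemo(String src, Map<Character, Integer> prec) {
        this.src = src.replace(" ", ""); // whitespace is irrelevant here
        this.prec = prec;
    }

    /** Parses src and returns a fully parenthesised string showing the tree shape. */
    static String parse(String src, Map<Character, Integer> prec) {
        return new PrecedenceDemo(src, prec).expr(0);
    }

    // Parse an atom, then absorb every operator that binds at least as tightly as minPrec.
    private String expr(int minPrec) {
        String left = atom();
        while (pos < src.length()) {
            char op = src.charAt(pos);
            Integer p = prec.get(op);
            if (p == null || p < minPrec) break; // ')' or a weaker operator ends this level
            pos++;
            String right = expr(p + 1); // p + 1 keeps the operator left-associative
            left = "(" + left + op + right + ")";
        }
        return left;
    }

    // atom : '(' expr ')' | identifier or integer. No error handling: a sketch only.
    private String atom() {
        if (src.charAt(pos) == '(') {
            pos++; // consume '('
            String inner = expr(0);
            pos++; // consume ')'
            return inner;
        }
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        return src.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("2*a+b", ADD_FIRST));  // (2*(a+b))  the tree from the question
        System.out.println(parse("2*a+b", MULT_FIRST)); // ((2*a)+b)  the intended tree
    }
}
```

With the addition alternative first, `2*a+b` groups like `2*(a+b)`, which is the symptom described in the question. Swapping the order gives the expected grouping.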

grammar Question;

question
@init {System.out.println("Question last update 1213");}
    :   line+ EOF
    ;

line
    :   expression NL
        {System.out.println("Expression found : " + $expression.text); }
    ;

expression
    :   expression mult='*' expression          # expressionMultExpression
    |   expression add=( '+' | '-' ) expression # expressionAddExpression
    |   VARIABLE '=' expression                 # expressionAssign
    |   '(' expression ')'                      # parenthesisedExpression
    |   atom                                    # atomExpression
    ;

atom
    :   INT                                     #int
    |   VARIABLE                                #var
    ;

VARIABLE : LETTER ( LETTER | DIGIT )*;
INT      : DIGIT+;

NL      : [\r\n] ;
WS      : [ \t] -> channel(HIDDEN) ; // -> skip ;

fragment LETTER : [a-zA-Z] ;
fragment DIGIT  : [0-9] ;

File input.txt:

res = 2 * a + b
res = 2 * ( a + b )

Execution:

$ grun Question question -tokens -diagnostics input.txt 
[@0,0:2='res',<VARIABLE>,1:0]
[@1,3:3=' ',<WS>,channel=1,1:3]
[@2,4:4='=',<'='>,1:4]
[@3,5:5=' ',<WS>,channel=1,1:5]
[@4,6:6='2',<INT>,1:6]
[@5,7:7=' ',<WS>,channel=1,1:7]
[@6,8:8='*',<'*'>,1:8]
[@7,9:9=' ',<WS>,channel=1,1:9]
[@8,10:10='a',<VARIABLE>,1:10]
...
[@32,36:35='<EOF>',<EOF>,3:0]
Question last update 1213
Expression found : res = 2 * a + b
Expression found : res = 2 * ( a + b )

$ grun Question question -gui input.txt

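As a cross-check on the two lines of input.txt, the following Java sketch evaluates them with a hand-written recursive-descent evaluator. It uses the classic layered approach (one method per precedence level) rather than ANTLR's left-recursive rule, so it needs no left recursion at all. The class `LayeredEval` and the values a=3 and b=4 are assumptions chosen for illustration. It also assumes tokens are separated by spaces, as in input.txt.

```java
import java.util.Map;

// Hypothetical helper: not generated by ANTLR and not part of the original answer.
final class LayeredEval {
    private final String[] toks;
    private final Map<String, Integer> vars;
    private int pos = 0;

    private LayeredEval(String line, Map<String, Integer> vars) {
        this.toks = line.trim().split("\\s+"); // tokens are space-separated in input.txt
        this.vars = vars;
    }

    /** Evaluates the right-hand side of a line such as "res = 2 * a + b". */
    static int eval(String line, Map<String, Integer> vars) {
        LayeredEval p = new LayeredEval(line, vars);
        p.pos = 2; // skip the "res" and "=" tokens
        return p.sum();
    }

    // sum : product (('+' | '-') product)*     lowest precedence
    private int sum() {
        int value = product();
        while (pos < toks.length && (toks[pos].equals("+") || toks[pos].equals("-"))) {
            boolean plus = toks[pos++].equals("+");
            int rhs = product();
            value = plus ? value + rhs : value - rhs;
        }
        return value;
    }

    // product : atom ('*' atom)*               binds tighter than sum
    private int product() {
        int value = atom();
        while (pos < toks.length && toks[pos].equals("*")) {
            pos++; // consume "*"
            value *= atom();
        }
        return value;
    }

    // atom : INT | VARIABLE | '(' sum ')'      unknown variables are not handled
    private int atom() {
        String t = toks[pos++];
        if (t.equals("(")) {
            int value = sum();
            pos++; // consume ")"
            return value;
        }
        return Character.isDigit(t.charAt(0)) ? Integer.parseInt(t) : vars.get(t);
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = Map.of("a", 3, "b", 4); // assumed sample values
        System.out.println(eval("res = 2 * a + b", vars));     // 10
        System.out.println(eval("res = 2 * ( a + b )", vars)); // 14
    }
}
```

The two lines give different results only when multiplication binds tighter than addition, which is what a correct parse tree for the first line should reflect.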

Regarding parsing - ANTLR parser operator precedence, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47031635/
