parsing - Semicolon insertion ala Google Go with flex

I'm interested in adding semicolon insertion, as done in Google Go, to my flex file.

From the Go documentation:

Semicolons

Like C, Go's formal grammar uses semicolons to terminate statements; unlike C, those semicolons do not appear in the source. Instead the lexer uses a simple rule to insert semicolons automatically as it scans, so the input text is mostly free of them.

The rule is this. If the last token before a newline is an identifier (which includes words like int and float64), a basic literal such as a number or string constant, or one of the tokens
break continue fallthrough return ++ -- ) }
the lexer always inserts a semicolon after the token. This could be summarized as, “if the newline comes after a token that could end a statement, insert a semicolon”.

A semicolon can also be omitted immediately before a closing brace, so a statement such as
go func() { for { dst <- <-src } }()
needs no semicolons. Idiomatic Go programs have semicolons only in places such as for loop clauses, to separate the initializer, condition, and continuation elements. They are also necessary to separate multiple statements on a line, should you write code that way.

One caveat. You should never put the opening brace of a control structure (if, for, switch, or select) on the next line. If you do, a semicolon will be inserted before the brace, which could cause unwanted effects. Write them like this
if i < f() {
    g()
}
not like this
if i < f()  // wrong! 
{           // wrong!
    g()     // wrong!
}           // wrong!

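How would I go about doing this (how to insert tokens into the stream, how to look at the last token matched to see whether insertion is a good idea, and so on)?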

I am using bison as well, but Go seems to use only its lexer for semicolon insertion.

Best Answer

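You could pass the lexer's result tokens through a function that inserts semicolons where necessary. When the function detects that a semicolon must be inserted, the token that triggered the insertion can be put back into the input stream, so that it is simply lexed again on the next turn.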

Below is an example that inserts a SEMICOLON before a newline when the newline follows a WORD. The bison file "insert.y" is this:

%{
#include <stdio.h>
#include <stdlib.h>   /* free() */

int yylex(void);      /* provided by the flex-generated scanner */

void yyerror(const char *str) {
  printf("ERROR: %s\n", str);
}

int main() {
  yyparse();
  return 0;
}
%} 
%union {
  char *string;
}
%token <string> WORD
%token SEMICOLON NEWLINE
%%
input: 
     | input WORD          {printf("WORD: %s\n", $2); free($2);}
     | input SEMICOLON     {printf("SEMICOLON\n");}
     ;
%%

The lexer is generated by flex:

%{
#include <string.h>
#include "insert.tab.h"
int f(int token);
%}
%option noyywrap
%%
[ \t]          ;
[^ \t\n;]+     {yylval.string = strdup(yytext); return f(WORD);}
;              {return f(SEMICOLON);}
\n             {int token = f(NEWLINE); if (token != NEWLINE) return token;}
%%
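int insert = 0;  /* nonzero if the previous token was a WORD */

/* Every token passes through f() on its way to the parser. */
int f(int token) {
  if (insert && token == NEWLINE) {
    unput('\n');   /* put the newline back so it is lexed again next turn */
    insert = 0;
    return SEMICOLON;
  } else {
    insert = token == WORD;
    return token;
  }
}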

For the input

abc def
ghi
jkl;

it prints

WORD: abc
WORD: def
SEMICOLON
WORD: ghi
SEMICOLON
WORD: jkl
SEMICOLON
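
The flag in f() is set only after a WORD. For the rule quoted in the question, the same flag could be driven by a predicate over the token kinds listed there. The following is only a sketch: the token names (IDENT, LITERAL, INC and so on) are hypothetical and are not part of insert.y, where they would have to be declared with %token.

```c
#include <stdbool.h>

/* Hypothetical token codes for illustration only; in a real build they
   would come from the bison-generated header (insert.tab.h). */
enum token {
  IDENT = 258, LITERAL,
  BREAK, CONTINUE, FALLTHROUGH, RETURN,
  INC, DEC, RPAREN, RBRACE,
  LBRACE, PLUS, SEMICOLON, NEWLINE
};

/* True if a newline directly after this token should become a semicolon,
   following the token list quoted in the question. */
static bool ends_statement(int token) {
  switch (token) {
    case IDENT: case LITERAL:
    case BREAK: case CONTINUE: case FALLTHROUGH: case RETURN:
    case INC: case DEC: case RPAREN: case RBRACE:
      return true;
    default:
      return false;
  }
}
```

With such a predicate, the assignment in f() would become insert = ends_statement(token); and the rest of the function would stay as it is.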

Unputting a non-constant token would take a little extra work. I tried to keep the example simple, just to convey the idea.
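
One way to sidestep unput altogether is to hold the token back in a one-slot buffer instead of returning its text to the input. This is a different technique from the one in the example above, sketched here under assumptions: raw_lex() is a stand-in for the flex-generated lexer that replays an array, and struct tok, push_back() and next_token() are hypothetical names. Because the slot stores the token code together with its value, it works the same way for a WORD carrying text as for a newline.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token codes; a real build would take them from insert.tab.h. */
enum { END = 0, WORD = 258, SEMICOLON, NEWLINE };

/* A token together with its semantic value, so it can be held back intact. */
struct tok { int code; const char *text; };

/* Stand-in for the flex-generated lexer (assumption): replays an array
   terminated by an END token, so the sketch does not need flex. */
static const struct tok *raw_pos;
static struct tok raw_lex(void) {
  struct tok t = *raw_pos;
  if (t.code != END) raw_pos++;
  return t;
}

static struct tok pending;         /* one-token pushback slot */
static bool have_pending = false;
static bool insert = false;        /* was the last token a WORD? */

static void push_back(struct tok t) { pending = t; have_pending = true; }

/* Read from the pushback slot first, then from the raw lexer. */
static struct tok read_tok(void) {
  if (have_pending) { have_pending = false; return pending; }
  return raw_lex();
}

/* What the parser would call in place of the raw lexer. */
static struct tok next_token(void) {
  for (;;) {
    struct tok t = read_tok();
    if (insert && t.code == NEWLINE) {
      push_back(t);                /* takes the place of unput('\n') */
      insert = false;
      return (struct tok){ SEMICOLON, NULL };
    }
    insert = (t.code == WORD);
    if (t.code != NEWLINE) return t;  /* plain newlines are skipped */
  }
}
```

Fed the token sequence for the sample input above, next_token() yields WORD, WORD, SEMICOLON, WORD, SEMICOLON, WORD, SEMICOLON and then END, matching the printed output. In a bison setup the wrapper would additionally copy the held-back value into yylval before returning the token code.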

This question, "parsing - Semicolon insertion ala Google Go with flex", comes from Stack Overflow: https://stackoverflow.com/questions/10826744/
