parsing - Semicolon insertion a la Google Go with flex

Tags: parsing go lex flex-lexer lexer

I'm interested in adding semicolon insertion a la Google Go to my flex file.

From the Go documentation:

Semicolons

Like C, Go's formal grammar uses semicolons to terminate statements; unlike C, those semicolons do not appear in the source. Instead the lexer uses a simple rule to insert semicolons automatically as it scans, so the input text is mostly free of them.

The rule is this. If the last token before a newline is an identifier (which includes words like int and float64), a basic literal such as a number or string constant, or one of the tokens

break continue fallthrough return ++ -- ) }

the lexer always inserts a semicolon after the token. This could be summarized as, “if the newline comes after a token that could end a statement, insert a semicolon”.

A semicolon can also be omitted immediately before a closing brace, so a statement such as

go func() { for { dst <- <-src } }()

needs no semicolons. Idiomatic Go programs have semicolons only in places such as for loop clauses, to separate the initializer, condition, and continuation elements. They are also necessary to separate multiple statements on a line, should you write code that way.

One caveat. You should never put the opening brace of a control structure (if, for, switch, or select) on the next line. If you do, a semicolon will be inserted before the brace, which could cause unwanted effects. Write them like this

if i < f() {
    g()
}

not like this

if i < f()  // wrong! 
{           // wrong!
    g()     // wrong!
}           // wrong!

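How would I go about doing this (how can I insert tokens into the stream, how can I look at the last token matched to see whether insertion is a good idea, and so on)?

I am using bison as well, but Go seems to do its semicolon insertion in the lexer alone.

Best answer

You could pass the lexer's result tokens through a function that inserts semicolons where necessary. When it detects that an insertion is needed, it can put the next token back into the input stream, so that the token is simply lexed again on the next round.

Below is an example that inserts a SEMICOLON before a newline when the newline follows a WORD. The bison file "insert.y" looks like this: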

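%{
#include <stdio.h>
#include <stdlib.h>   /* free() */

int yylex(void);      /* provided by the flex-generated scanner */
int yyparse(void);    /* declared here because main() calls it in the prologue */

void yyerror(const char *str) {
  printf("ERROR: %s\n", str);
}

int main(void) {
  yyparse();
  return 0;
}
%}
%union {
  char *string;
}
%token <string> WORD
%token SEMICOLON NEWLINE
%%
/* Print each token as it arrives; WORD text was strdup'ed by the lexer. */
input:
     | input WORD          {printf("WORD: %s\n", $2); free($2);}
     | input SEMICOLON     {printf("SEMICOLON\n");}
     ;
%%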

The lexer is generated by flex:

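%{
#include <string.h>
#include "insert.tab.h"
int f(int token);   /* filters every token before the parser sees it */
%}
%option noyywrap
%%
[ \t]          ;
[^ \t\n;]+     {yylval.string = strdup(yytext); return f(WORD);}
;              {return f(SEMICOLON);}
\n             {int token = f(NEWLINE); if (token != NEWLINE) return token;}
%%
/* Nonzero when the last token returned could end a statement (here: a WORD). */
int insert = 0;

int f(int token) {
  if (insert && token == NEWLINE) {
    /* Put the newline back so it is lexed again, and return a SEMICOLON now. */
    unput('\n');
    insert = 0;
    return SEMICOLON;
  } else {
    /* A plain NEWLINE comes back unchanged and is dropped by the \n rule. */
    insert = token == WORD;
    return token;
  }
}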

For the input

abc def
ghi
jkl;

it prints

WORD: abc
WORD: def
SEMICOLON
WORD: ghi
SEMICOLON
WORD: jkl
SEMICOLON
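
To go from this WORD-only example toward the rule quoted in the question, the test that sets the flag could be widened to the whole list of statement-ending tokens. The following is only a sketch: the TK_* token codes and the function name ends_statement are made up for illustration (a real build would use the codes bison generates), and the list follows the documentation excerpt quoted above rather than the full language specification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical token codes for this sketch; with bison they would come
   from the generated header instead. */
enum {
    TK_IDENT = 258, TK_INT_LIT, TK_FLOAT_LIT, TK_STRING_LIT,
    TK_BREAK, TK_CONTINUE, TK_FALLTHROUGH, TK_RETURN,
    TK_INC, TK_DEC, TK_RPAREN, TK_RBRACE,
    /* A few tokens that must NOT trigger insertion. */
    TK_LBRACE, TK_PLUS, TK_COMMA, TK_IF
};

/* Returns true if a newline right after `token` should become a
   semicolon, following the list in the quoted documentation. */
bool ends_statement(int token) {
    assert(token >= 0);  /* assumption: token codes are never negative */
    switch (token) {
    case TK_IDENT:                          /* identifiers, incl. int, float64 */
    case TK_INT_LIT: case TK_FLOAT_LIT:
    case TK_STRING_LIT:                     /* basic literals */
    case TK_BREAK: case TK_CONTINUE:
    case TK_FALLTHROUGH: case TK_RETURN:    /* keywords from the list */
    case TK_INC: case TK_DEC:               /* ++ and -- */
    case TK_RPAREN: case TK_RBRACE:         /* ) and } */
        return true;
    default:
        return false;
    }
}
```

In the lexer above, the line `insert = token == WORD;` would then read `insert = ends_statement(token);`, assuming the scanner has a rule that returns each of these tokens through f().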

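Unputting a non-constant token takes some extra work; I tried to keep the example simple, just to convey the idea.

Regarding "parsing - Semicolon insertion a la Google Go with flex", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10826744/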
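
One way to sidestep character-level unput altogether might be to hold the whole token back in a small wrapper around the scanner, so the token's text never has to be pushed back into the input. The sketch below is not the method from the answer: the Token struct, the T_* codes, raw_lex (a stand-in for the flex scanner that replays an array), lexer_reset and next_token are all hypothetical names, and treating `}` and end of input like a newline is an assumption made here only to give the holding slot something to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a real build would use bison's. */
enum { T_EOF = 0, T_WORD = 258, T_SEMICOLON, T_NEWLINE, T_RBRACE };

typedef struct { int type; const char *text; } Token;

/* Stand-in for the flex scanner: replays a T_EOF-terminated array. */
static const Token *src;
static Token raw_lex(void) {
    Token t = *src;
    if (t.type != T_EOF) src++;      /* keep returning T_EOF at the end */
    return t;
}

static Token held;                   /* token waiting to be delivered */
static int have_held = 0;
static int can_end = 0;              /* could the last token end a statement? */

void lexer_reset(const Token *tokens) {
    assert(tokens != NULL);
    src = tokens;
    have_held = 0;
    can_end = 0;
}

/* What the parser would call instead of the raw scanner. */
Token next_token(void) {
    for (;;) {
        Token t;
        if (have_held) { t = held; have_held = 0; }
        else t = raw_lex();

        /* Assumption of this sketch: '}' and end of input count as
           boundaries too, not just a newline. */
        int boundary = t.type == T_NEWLINE || t.type == T_RBRACE ||
                       t.type == T_EOF;
        if (can_end && boundary) {
            /* Keep the real token (value included) for the next call;
               a newline is simply swallowed. */
            if (t.type != T_NEWLINE) { held = t; have_held = 1; }
            can_end = 0;
            return (Token){ T_SEMICOLON, ";" };
        }
        if (t.type == T_NEWLINE) continue;   /* never reaches the parser */
        can_end = (t.type == T_WORD || t.type == T_RBRACE);
        return t;
    }
}
```

With a generated parser, a function like next_token would play the role of yylex, storing the held token's value in yylval before returning its type. Since the held token is kept as a struct, it makes no difference whether its text is constant.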
