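I am trying to construct an object from a parsed message. I'm using ANTLR4 with C++. My problem is that I need to skip whitespace during lexing/parsing, but I have to get it back when I construct the message object in the listener. Here is my grammar: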
grammar MessageTest;
WS: ('\t' | ' ' | '\r' | '\n' )+ -> skip;
message:
messageInfo
startOfMessage
messageText+
| EOF;
messageInfo:
senderName
filingTime
receiverName
;
senderName: WORD;
filingTime: DIGITS;
receiverName: WORD;
messageText: ( WORD | DIGITS | ALLOWED_SYMBOLS)+;
startOfMessage: START_OF_MESSAGE_SYMBOL ;
START_OF_MESSAGE_SYMBOL:':';
WORD: LETTER+;
DIGITS: DIGIT+;
LPAREN: '(';
RPAREN: ')';
ALLOWED_SYMBOLS: '-'| '.' | ',' | '/' | '+' | '?';
fragment LETTER: [A-Z];
fragment DIGIT: [0-9];
This grammar works well, and my parse tree is correct for the following sample message: JOHN0120JANE:HI HOW ARE YOU?
I get this parse tree:
message (
messageInfo (
senderName (
"JOHN"
)
filingTime (
"0120"
)
receiverName (
"JANE"
)
)
startOfMessage (
":"
)
messageText (
"HI"
"HOW"
"ARE"
"YOU"
"?"
)
)
The problem shows up when I try to retrieve the whole messageText:
HI HOW ARE YOU?
Instead, I get this from MessageTextContext:
HIHOWAREYOU?
What am I doing wrong?
Best answer
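The getText() retrieval functions never take skipped or hidden tokens into account. However, by using the indexes stored in the generated tokens, it is easy to get the original text of the input (even just the range that corresponds to a specific parse rule). A parse rule context holds a start token and a stop token, so you can go from a context back to the original input like this: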
std::string MySQLRecognizerCommon::sourceTextForContext(ParserRuleContext *ctx, bool keepQuotes) {
return sourceTextForRange(ctx->start, ctx->stop, keepQuotes);
}
//----------------------------------------------------------------------------------------------------------------------
std::string MySQLRecognizerCommon::sourceTextForRange(tree::ParseTree *start, tree::ParseTree *stop, bool keepQuotes) {
Token *startToken = antlrcpp::is<tree::TerminalNode *>(start) ? dynamic_cast<tree::TerminalNode *>(start)->getSymbol()
: dynamic_cast<ParserRuleContext *>(start)->start;
Token *stopToken = antlrcpp::is<tree::TerminalNode *>(stop) ? dynamic_cast<tree::TerminalNode *>(stop)->getSymbol()
: dynamic_cast<ParserRuleContext *>(stop)->stop;
return sourceTextForRange(startToken, stopToken, keepQuotes);
}
//----------------------------------------------------------------------------------------------------------------------
std::string MySQLRecognizerCommon::sourceTextForRange(Token *start, Token *stop, bool keepQuotes) {
CharStream *cs = start->getTokenSource()->getInputStream();
size_t stopIndex = stop != nullptr ? stop->getStopIndex() : std::numeric_limits<size_t>::max();
std::string result = cs->getText(misc::Interval(start->getStartIndex(), stopIndex));
if (keepQuotes || result.size() < 2)
return result;
char quoteChar = result[0];
if ((quoteChar == '"' || quoteChar == '`' || quoteChar == '\'') && quoteChar == result.back()) {
if (quoteChar == '"' || quoteChar == '\'') {
// Replace any double occurrence of the quote char by a single one.
replaceStringInplace(result, std::string(2, quoteChar), std::string(1, quoteChar));
}
return result.substr(1, result.size() - 2);
}
return result;
}
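This code is tailored for use with MySQL (for example with regard to quote characters), but it is easy to adapt to any other use case. The key part is to take the tokens (for example from a parse rule context) and use their indexes to read the original input from the character input stream.
The code is taken from the MySQL Workbench code base.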
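To see the idea in isolation, here is a minimal sketch that does not depend on the ANTLR runtime. SimpleToken, tokenize, concatenatedText and sourceText are hypothetical stand-ins invented for this illustration, not ANTLR types or functions. The tiny lexer only approximates the grammar from the question: it drops whitespace and records each token's start and stop index in the input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an ANTLR token: only the fields needed here.
struct SimpleToken {
    std::string text;   // token text, as a per-token getText() would return it
    std::size_t start;  // index of the token's first character in the input
    std::size_t stop;   // index of the token's last character (inclusive)
};

// Rough approximation of the lexer: letter runs, digit runs, single symbols.
// Whitespace is dropped, similar to what `-> skip` does in the grammar.
std::vector<SimpleToken> tokenize(const std::string &input) {
    std::vector<SimpleToken> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;  // skipped: no token is produced for whitespace
            continue;
        }
        std::size_t begin = i;
        if (std::isalpha(c)) {
            while (i < input.size() && std::isalpha(static_cast<unsigned char>(input[i])))
                ++i;
        } else if (std::isdigit(c)) {
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i])))
                ++i;
        } else {
            ++i;  // single-character symbol such as ':' or '?'
        }
        tokens.push_back({input.substr(begin, i - begin), begin, i - 1});
    }
    return tokens;
}

// Mimics getText() on a rule context: token texts are simply concatenated,
// so anything the lexer skipped is lost.
std::string concatenatedText(const std::vector<SimpleToken> &tokens,
                             std::size_t first, std::size_t last) {
    std::string result;
    for (std::size_t i = first; i <= last && i < tokens.size(); ++i)
        result += tokens[i].text;
    return result;
}

// Mimics the approach from the answer: slice the original input between the
// start index of the first token and the stop index of the last token.
std::string sourceText(const std::string &input, const SimpleToken &first,
                       const SimpleToken &last) {
    assert(first.start <= last.stop && last.stop < input.size());
    return input.substr(first.start, last.stop - first.start + 1);
}
```

For the sample message, the tokens that make up messageText are the ones at positions 4 to 8 of the token list. Concatenating them reproduces the HIHOWAREYOU? result from the question, while slicing the input from the first token's start to the last token's stop keeps the spaces. In a real listener the equivalent should be something along the lines of ctx->start->getInputStream()->getText(misc::Interval(ctx->start->getStartIndex(), ctx->stop->getStopIndex())), with a null check on ctx->stop as in the code above.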
This is based on the Stack Overflow question "c++ - Retrieving skipped whitespace in an ANTLR4 parser from a listener": https://stackoverflow.com/questions/57044724/