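I am trying to write a very simple parser in Rust. I have a while loop that checks whether the end of the input has been reached simply by comparing the current position in the source with the length of the source. But instead of stopping when the end of the input is reached (more specifically, when end_of_input is set to true), it keeps going and reaches the part where I read the character at the current position. Since that position equals the length of the source, I get a panic, and I cannot see where my logic is wrong (when I add a break, the last token is not added to the token vector).
Here is a minimal reproducible example. I am using regex 1.3.9.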
use regex::Regex;

#[derive(Debug)]
enum TokenType {
    CommandOperator,
    Keyword(String),
    Str(String),
    Const(String)
}

#[derive(Debug)]
struct Token {
    token_type: TokenType,
    start_position: usize,
    end_position: usize,
}

impl Token {
    fn new(token_type: TokenType, start_position: usize, end_position: usize) -> Token {
        Token { token_type, start_position, end_position }
    }
}

fn keyword(string: &str) -> String {
    String::from(string)
}

fn has_previous_char(string: &str, position: usize, character: char) -> bool {
    match string.chars().nth(position - 1) {
        Some(c) => c == character,
        None => false
    }
}

fn char_is_whitespace(character: char) -> bool {
    character == ' ' || character == '\n' || character == '\t'
}

fn parse(input: String) -> Vec<Token> {
    let mut tokens: Vec<Token> = Vec::new();
    let mut position: usize = 0;
    let mut end_of_input = false;
    let keywords = [
        keyword("import"),
        keyword("as"),
    ];
    while !end_of_input {
        if position >= input.chars().count() {
            end_of_input = true;
        }
        let character = match input.chars().nth(position) {
            Some(c) => c,
            _ => panic!("Fatal error: No character available at the position {}", position)
        };
        if character == '@' && (position == 0 || has_previous_char(&input, position, '\n')) {
            tokens.push(Token::new(TokenType::CommandOperator, position, position));
            position += 1;
            continue;
        }
        if character == ' ' || character == '\n' || character == '\t' {
            position += 1;
            continue;
        }
        if character == '"' {
            let mut str = String::from("");
            position += 1;
            for string_character in input[position..].chars() {
                if string_character == '"' {
                    if input.chars().nth(position - 1) == Some('\\') {
                        str.push_str("\"");
                        position += 2;
                    } else {
                        tokens.push(Token::new(TokenType::Str(str.clone()), position - str.chars().count() - 1, position));
                        position += 1;
                        break;
                    }
                } else {
                    str.push_str(&string_character.to_string());
                    position += 1;
                }
            }
        }
        let alphanumeric_re = Regex::new(r"[[:alpha:]]").unwrap();
        let constant_re = Regex::new(r"[A-Za-z0-9_]").unwrap();
        if alphanumeric_re.is_match(&character.to_string()) {
            for word in keywords.iter() {
                if &input[position..position + word.chars().count()] == word {
                    tokens.push(Token::new(TokenType::Keyword(word.clone()), position, position + word.chars().count() - 1));
                    position += word.chars().count();
                    break;
                }
            }
        }
        if constant_re.is_match(&character.to_string()) {
            let mut constant = String::from("");
            for constant_character in input[position..].chars() {
                let constant_character_as_string = &constant_character.to_string();
                if char_is_whitespace(character) || !constant_re.is_match(constant_character_as_string) {
                    if constant.chars().count() > 0 {
                        tokens.push(Token::new(TokenType::Const(constant.clone()), position - constant.chars().count(), position - 1));
                    }
                    break;
                } else if constant_re.is_match(constant_character_as_string) {
                    constant.push_str(constant_character_as_string);
                    position += 1;
                }
            }
        }
        println!("{:#?}", tokens);
    }
    tokens
}

fn main() {
    let input = String::from("@import \"intro.txt\" as intro1\n@import \"goals.txt\" as goals_section");
    println!("{:#?}", parse(input));
}
I tried putting a println! statement inside the very first check in the while loop, and the condition there is true.
Best answer
while !end_of_input {
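is evaluated only when execution returns to the top of the loop; it is not re-checked in the middle of an iteration. Your snippet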
if position >= input.chars().count() {
end_of_input = true;
}
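will assign end_of_input, but it will not stop input.chars().nth(position) from being executed with a position that points at the end of the string, because the rest of the current iteration still runs. It seems you may want to change that snippet to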
if position >= input.chars().count() {
break;
}
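The question also mentions that with a break the last token goes missing. Reading the posted code, a Const token is only pushed once a non-matching character is seen, so a constant that runs up to the very end of the input is never pushed. Below is a minimal sketch of both points: leaving the loop before reading, and flushing a pending token after the loop. The function collect_words is a hypothetical, heavily simplified stand-in for parse (it is not the asker's tokenizer and does not use regex); it only collects runs of [A-Za-z0-9_].

```rust
// Hypothetical, simplified stand-in for the question's `parse`: it only
// collects runs of [A-Za-z0-9_] and skips every other character.
fn collect_words(input: &str) -> Vec<String> {
    // Collect the characters once so that the position is always a
    // character index and never mixed up with a byte offset.
    let chars: Vec<char> = input.chars().collect();
    let mut words = Vec::new();
    let mut current = String::new();
    let mut position = 0;

    loop {
        // Leave the loop *before* reading. Setting a flag here instead
        // would still let the rest of this iteration run.
        if position >= chars.len() {
            break;
        }
        let character = chars[position];
        if character.is_ascii_alphanumeric() || character == '_' {
            current.push(character);
        } else if !current.is_empty() {
            // A non-word character ends the pending word.
            words.push(std::mem::take(&mut current));
        }
        position += 1;
    }

    // The input may end in the middle of a word, so flush what is pending.
    if !current.is_empty() {
        words.push(current);
    }
    words
}

fn main() {
    let words = collect_words("@import \"intro.txt\" as intro1");
    println!("{:?}", words);
}
```

With the early break the loop never indexes past the end, and the flush after the loop keeps the trailing intro1. The same idea should carry over to the original parse by pushing the pending Const once its inner for loop finishes without hitting a terminating character.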
On the topic of while-loop - panic in a while loop instead of stopping when the condition is met, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62414740/