java - Why does my text parser go into an infinite loop even though the loop is explicitly broken out of?

Tags: java parsing

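I have been working on a utility that parses text files in the format Paradox Interactive uses in its grand strategy games, for use with a visual modding tool I am also developing. I have written a rough, mostly implemented early version of the parser, and it works largely as intended. This is my second attempt at writing a text parser (the first, which ended up working fine, parsed a subset of XML).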

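I wrote the parser quickly on the 9th and spent the whole weekend trying to debug it, but everything I tried failed. I have traced the problem to the third line of nextChar(). It threw an ArrayIndexOutOfBoundsException with a very small index (around -2 million). After I added a bounds check, the program just... continues. It reads all the information it needs; it simply never exits the parse loop.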

The format is basically this:

car = {
    model_year = 1966
    model_name = "Chevy"
    components = {
        "engine", "frame", "muffler"
    }
}

I have not yet added support for nested lists as I plan to, so my test string is:

car = {
    model_year = 1966
    model_name = "Chevy"
}

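For my own understanding, and for anyone else who reads my code, I have tried to comment it liberally wherever I thought it might be necessary, but I am happy to clarify anything that is unclear.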

My code:

/**
 * Parses text files in the format used by Paradox Interactive in their computer games EUIV, CK2, and Stellaris.
 * 
 * @author DJMethaneMan 
 * @date 12/9/2016
 */
public class Parser
{
    private int pos, line, len, depth;
    public String text;
    private char[] script; //TODO: Initialize in the parse method

    public Parser()
    {
        pos = 0;
        line = 1;
        len = 0;
        depth = 0;
        text = "car = {\n" +
               "    model_year = 1966 \n" +
               "    model_name = \"Chevy\"\n" +
               "}\u0003";
        //text = "Hello World";
        //Car c = new Car();
        //parse(text, c);
    }

    public static void main()
    {
        Car c = new Car();
        Parser p = new Parser();
        p.parse(p.text, c);
        System.out.println("The model name is " + c.model_name);
        System.out.println("The model year is " + c.model_year);
    }

    //TODO: Work
    public void parse(String text, Parseable parsed)
    {
        char[] script = text.toCharArray();
        this.script = script;
        boolean next_char = false;
        PARSE_LOOP:while(true)
        {
            char c;
            if(next_char)
            {
                c = nextChar();
            }
            else
            {
                c = script[0];
                next_char = true;
            }

            switch(c)
            {
                case 'A':
                case 'a':
                case 'B':
                case 'b':
                case 'C':
                case 'c':
                case 'D':
                case 'd':
                case 'E':
                case 'e':
                case 'F':
                case 'f':
                case 'G':
                case 'g':
                case 'H':
                case 'h':
                case 'I':
                case 'i':
                case 'J':
                case 'j':
                case 'K':
                case 'k':
                case 'L':
                case 'l':
                case 'M':
                case 'm':
                case 'N':
                case 'n':
                case 'O':
                case 'o':
                case 'P':
                case 'p':
                case 'Q':
                case 'q':
                case 'R':
                case 'r':
                case 'S':
                case 's':
                case 'T':
                case 't':
                case 'U':
                case 'u':
                case 'V':
                case 'v':
                case 'W':
                case 'w':
                case 'X':
                case 'x':
                case 'Y':
                case 'y':
                case 'Z':
                case 'z':
                case '_'://TODO: HERE
                    if(depth > 0) //
                    {
                        parsed.parseRead(buildWordToken(true), this);//Let the class decide how to handle this information. Best solution since I do not know how to implement automatic deserialization.
                    }
                    continueUntilChar('=', false); //A value must be assigned because it is basically a key value pair with {} or a string or number as the value
                    skipWhitespace();//Skip any trailing whitespace straight to the next token.
                    break;
                case '{':
                    depth++;
                    break;
                case '}':
                    depth--;
                    break;
                case '\n':
                    line++;
                    break;
                case ' ':
                case '\t':
                    skipWhitespace();
                    break;
                case '\u0003': //End of Text Character... Not sure if it will work in a file...
                    break PARSE_LOOP;
            }
        }
    }

    //Returns a string from the next valid token
    public String parseString()
    {
        String retval = "";
        continueUntilChar('=', false);
        continueUntilChar('"', false);
        retval = buildWordToken(false);
        continueUntilChar('"', false); //Don't rewind because we want to skip over the quotation and not append it.
        return retval;
    }

    //Returns a double from the next valid token
    public double parseNumber()
    {
        double retval = 0;
        continueUntilChar('=', false); //False because we don't want to include the = in any parsing...
        skipWhitespace(); //In case we encounter whitespace.
        try
        {
            retval = Double.parseDouble(buildNumberToken(false));
        }
        catch(Exception e)
        {
            System.out.println("A token at line " + line + " is not a valid number but is being passed as such.");
        }
        return retval;
    }


    /**********************************Utility Methods for Parsing****************************************/

    protected void continueUntilChar(char target, boolean rewind)
    {
        while(true)
        {
            char c = nextChar();
            if(c == target)
            {
                break;
            }
        }
        if(rewind)
        {
            pos--;
        }
    }

    protected void skipWhitespace()
    {
        while(true)
        {
            char c = nextChar();
            if(!Character.isWhitespace(c))
            {
                break;
            }
        }
        pos--;//Rewind because by default parse increments pos by 1 one when fetching nextChar each iteration.
    }

    protected String buildNumberToken(boolean rewind)
    {
        StringBuilder token = new StringBuilder();
        String retval = "INVALID_NUMBER";
        char token_start = script[pos];
        System.out.println(token_start + " is a valid char for a word token."); //Print it.
        token.append(token_start);
        while(true)
        {
            char c = nextChar();
            if(Character.isDigit(c) || (c == '.' && (Character.isDigit(peek(1)) || Character.isDigit(rewind(1))))) //Makes sure things like 1... and ...1234 don't get parsed as numbers.
            {
                token.append(c);
                System.out.println(c + " is a valid char for a word token."); //Print it for debugging
            }
            else
            {
                break;
            }
        }
        return retval;
    }

    protected String buildWordToken(boolean rewind)
    {
        StringBuilder token = new StringBuilder(); //Used to build the token
        char token_start = script[pos]; //The char the parser first found would make this a valid token
        token.append(token_start); //Add said char since it is part of the token
        System.out.println(token_start + " is a valid char for a word token."); //Print it.
        while(true)
        {
            char c = nextChar();
            if(Character.isAlphabetic(c) || Character.isDigit(c) || c == '_')//Make sure it is a valid token for a word
            {
                System.out.println(c + " is a valid char for a word token."); //Print it for debugging
                token.append(c); //Add it to the token since its valid
            }
            else
            {
                if(rewind)//If leaving the method will make this skip over a valid token set this to true.
                {
                    //Rewind by 1 because the main loop in parse() will still check pos++ and we want to check the pos of the next char after the end of the token.
                    pos--;
                    break; //Leave the loop and return the token.
                }
                else //Otherwise
                {
                    break; //Just leave the loop and return the token.
                }
            }
        }
        return token.toString(); //Get the string value of the token and return it.
    }

    //Returns the next char in the script by amount but does not increment pos.
    protected char peek(int amount)
    {
        int lookahead = pos + amount; //pos + 1;
        char retval = '\u0003'; //End of text character
        if(lookahead < script.length)//Make sure lookahead is in bounds.
        {
            retval = script[lookahead]; //Return the char at the lookahead.
        }
        return retval; //Return it.
    }

    //Returns the previous char in the script by amount but does not decrement pos.
    //Basically see peek only this is the exact opposite.
    protected char rewind(int amount)
    {
        int lookbehind = pos - amount; //pos + 1;
        char retval = '\u0003';
        if(lookbehind > 0)
        {
            retval = script[lookbehind];
        }
        return retval;
    }

    //Returns the next character in the script.
    protected char nextChar()
    {
        char retval = '\u0003';
        pos++;
        if(pos < script.length && !(pos < 0))
        {
            retval = script[pos]; //It says this is causing an ArrayIndexOutOfBoundsException with the following message. Shows a very large (small?) negative number.
        }
        return retval;
    }
}

//TODO: Extend
interface Parseable
{
    public void parseRead(String token, Parser p);
    public void parseWrite(ParseWriter writer);
}


//TODO: Work on
class ParseWriter
{

}

class Car implements Parseable
{
    public String model_name;
    public int model_year;

    @Override
    public void parseRead(String token, Parser p)
    {
        if(token.equals("model_year"))
        {
            model_year = (int)p.parseNumber();
        }
        else if(token.equals("model_name"))
        {
            model_name = p.parseString();
        }
    }

    @Override
    public void parseWrite(ParseWriter writer)
    {
        //TODO: Implement along with the ParseWriter
    }
}

Best answer

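Using a labeled break such as break PARSE_LOOP; is often considered poor style, because it behaves like a restricted "goto". It is not what keeps the program running, though: in Java a labeled break terminates the labeled statement, so break PARSE_LOOP; leaves the while(true) loop rather than jumping back to the top of it. The more likely cause is continueUntilChar. It calls nextChar() until it sees its target, and once pos passes the end of script, nextChar() returns '\u0003' on every call, so a search for '=' that starts after the last assignment never ends. Meanwhile pos keeps growing and eventually wraps around to a negative value, which is consistent with the large negative index in the exception.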
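For reference, this is how a labeled break behaves in Java when the label sits on a while(true) loop that wraps a switch, the same shape as parse. It is only a minimal sketch: the class name LabeledBreakDemo, the label SCAN_LOOP, and the method countUntilEnd are invented for illustration and are not part of the original parser.

```java
public class LabeledBreakDemo {
    // Counts loop iterations until the end-of-text marker is read, using the
    // same loop shape as parse(): a labeled while(true) wrapping a switch.
    public static int countUntilEnd(String text) {
        char[] script = text.toCharArray();
        int pos = 0;
        int iterations = 0;
        SCAN_LOOP:
        while (true) {
            // Past the end of the input, fall back to the sentinel character.
            char c = pos < script.length ? script[pos] : '\u0003';
            pos++;
            iterations++;
            switch (c) {
                case '\u0003':
                    // Leaves the labeled while loop entirely.
                    break SCAN_LOOP;
                default:
                    // A plain break only leaves the switch; the loop continues.
                    break;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(countUntilEnd("ab\u0003")); // prints 3
    }
}
```

The method returns after the third character, so the labeled break ends the loop rather than restarting it.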

To avoid the label, change the code to use a boolean flag:

 public void parse(String text, Parseable parsed)
        {
            char[] script = text.toCharArray();
            this.script = script;
            boolean next_char = false;
            boolean parsing = true;

            while(parsing)
            {
                char c;
                if(next_char)
                {
                    c = nextChar();
                }
                else
                {
                    c = script[0];
                    next_char = true;
                }

                switch(c)
                {
                    case 'A':
                    case 'a':
                    case 'B':
                    case 'b':
                    case 'C':
                    case 'c':
                    case 'D':
                    case 'd':
                    case 'E':
                    case 'e':
                    case 'F':
                    case 'f':
                    case 'G':
                    case 'g':
                    case 'H':
                    case 'h':
                    case 'I':
                    case 'i':
                    case 'J':
                    case 'j':
                    case 'K':
                    case 'k':
                    case 'L':
                    case 'l':
                    case 'M':
                    case 'm':
                    case 'N':
                    case 'n':
                    case 'O':
                    case 'o':
                    case 'P':
                    case 'p':
                    case 'Q':
                    case 'q':
                    case 'R':
                    case 'r':
                    case 'S':
                    case 's':
                    case 'T':
                    case 't':
                    case 'U':
                    case 'u':
                    case 'V':
                    case 'v':
                    case 'W':
                    case 'w':
                    case 'X':
                    case 'x':
                    case 'Y':
                    case 'y':
                    case 'Z':
                    case 'z':
                    case '_'://TODO: HERE
                        if(depth > 0) //
                        {
                            parsed.parseRead(buildWordToken(true), this);//Let the class decide how to handle this information. Best solution since I do not know how to implement automatic deserialization.
                        }
                        continueUntilChar('=', false); //A value must be assigned because it is basically a key value pair with {} or a string or number as the value
                        skipWhitespace();//Skip any trailing whitespace straight to the next token.
                        break;
                    case '{':
                        depth++;
                        break;
                    case '}':
                        depth--;
                        break;
                    case '\n':
                        line++;
                        break;
                    case ' ':
                    case '\t':
                        skipWhitespace();
                        break;
                    case '\u0003': //End of Text Character... Not sure if it will work in a file...
                        parsing = false;
                        break;
                }
            }
        }
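However the main loop is exited, the helper loops also need a way out when the input ends. In the question's code, continueUntilChar keeps calling nextChar() until it sees its target, and nextChar() returns '\u0003' indefinitely once pos passes the end of script. Below is a minimal sketch of a bounded variant. The class name BoundedScanner, the END_OF_TEXT constant, the position() accessor, and the boolean return value are assumptions made for illustration, not part of the original parser.

```java
public class BoundedScanner {
    // Same sentinel character the question uses to mark end of text.
    public static final char END_OF_TEXT = '\u0003';

    private final char[] script;
    private int pos = -1; // assumed convention: index of the last char returned

    public BoundedScanner(String text) {
        this.script = text.toCharArray();
    }

    // Returns the next char, or END_OF_TEXT once the input is exhausted.
    // pos stops advancing at script.length, so it can never overflow.
    public char nextChar() {
        if (pos < script.length) {
            pos++;
        }
        return pos < script.length ? script[pos] : END_OF_TEXT;
    }

    // Advances until target is found. Returns false instead of looping
    // forever when the input ends before the target appears.
    public boolean continueUntilChar(char target) {
        while (true) {
            char c = nextChar();
            if (c == target) {
                return true;
            }
            if (c == END_OF_TEXT) {
                return false;
            }
        }
    }

    public int position() {
        return pos;
    }

    public static void main(String[] args) {
        BoundedScanner s = new BoundedScanner("a = 1\n}");
        System.out.println(s.continueUntilChar('=')); // prints true
        System.out.println(s.continueUntilChar('=')); // prints false
    }
}
```

A caller such as parse could then treat a false return as "no further assignment" and stop, instead of relying on the sentinel ever reaching the switch.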

Source: Stack Overflow, https://stackoverflow.com/questions/41107323/
