java - Speed-optimizing a tree data parser

I am working on an assignment whose input has the following format, and I have to parse it as fast as possible:

5 (
 5 (
  3 (
  )
 )
 3 (
  3 (
  )
  3 (
  )
 )
 5 (
  2 (
  )
  4 (
  )
 )
)

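It is a tree structure of "Employees"; the number on each node (a language index) is needed for a later task.

Each employee can have any number of subordinates and exactly one superior (the root node is the "boss").

Here is my parser (originally I used a Scanner, which was short and simple but about twice as slow):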

// Invocation
// Employee boss = collectEmployee(null, 0, reader);

private Employee collectEmployee(final Employee parent, int indent, final Reader r) throws IOException
{
    final StringBuilder sb = new StringBuilder();
    boolean nums = false;
    while (true) {
        char c = (char) r.read();
        if (c == 10 || c == 13) continue; // newline
        if (c == ' ') {
            if (nums) break;
        } else {
            nums = true;
            sb.append(c);
        }
    }
    final int lang = Integer.parseInt(sb.toString());
    final Employee self = new Employee(lang, parent);

    r.skip(1); // opening paren
    int spaces = 0;
    while (true) {
        r.mark(1);
        int i = r.read();
        char c = (char) i;
        if (c == 10 || c == 13) continue; // newline
        if (c == ' ') {
            spaces++;
        } else {
            if (spaces == indent) {
                break; // End of this employee
            } else {
                spaces = 0; // new line.
                r.reset();
                self.add(collectEmployee(self, indent + 1, r));
            }
        }
    }
    return self; // the root employee for this subtree
}

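I need to shave off a few more cycles so that it passes the strict requirements. I have profiled it, and this part really is what slows the application down. The input file can be up to 30 MiB, so any small improvement makes a big difference.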

Any ideas are appreciated. Thanks.

(For completeness, here is the Scanner implementation. It gives you an idea of how I parse the input.)

private Employee collectEmployee(final Employee parent, final Scanner sc)
{
    final int lang = Integer.parseInt(sc.next());
    sc.nextLine(); // trash the opening parenthesis

    final Employee self = new Employee(lang, parent);

    while (sc.hasNextInt()) {
        Employee sub = collectEmployee(self, sc);
        self.add(sub);
    }

    sc.nextLine(); // trash the closing parenthesis

    return self;
}

Best answer

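You are pushing a lot of data through the StringBuilder. It may help to keep an int value that you update whenever you encounter a decimal character ('0'-'9') (num = num * 10 + (c - '0')) and store/reset when you encounter a non-decimal character. That way you can also get rid of Integer.parseInt.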
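You seem to be using/checking indentation to recover the hierarchy, but your input format contains parentheses, which makes it an S-expression-based syntax. Your parser is therefore doing much more work than it needs to: you can ignore whitespace and handle the parentheses with a stack of Employees.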
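I would consider writing a JMH benchmark and running it with the perfasm profiler (if available) to see where your code spends its time. Really, it is an invaluable tool.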
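
The first two suggestions can be combined into one iterative parser. What follows is only a minimal sketch, not measured against the original code. The Employee class is a hypothetical stand-in (the question does not show it; only the constructor and add signatures are taken from the post), and the StackTreeParser name and the 64 Ki char buffer size are arbitrary choices for illustration. It assumes well-formed input with non-negative numbers that fit in an int.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the question's Employee class (not shown in the post).
class Employee {
    final int lang;
    final Employee parent;
    final List<Employee> subordinates = new ArrayList<>();

    Employee(int lang, Employee parent) {
        this.lang = lang;
        this.parent = parent;
    }

    void add(Employee e) {
        subordinates.add(e);
    }
}

class StackTreeParser {
    // Parses "N ( children... )" without recursion, mark/reset, or StringBuilder.
    // Indentation, spaces, and newlines are ignored; only digits and parentheses matter.
    static Employee parse(Reader r) throws IOException {
        Deque<Employee> stack = new ArrayDeque<>(); // currently open employees, innermost on top
        Employee root = null;
        int num = 0;                                // number being accumulated digit by digit
        char[] buf = new char[1 << 16];             // read in bulk instead of char by char
        int n;
        while ((n = r.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (c >= '0' && c <= '9') {
                    num = num * 10 + (c - '0');     // replaces StringBuilder + Integer.parseInt
                } else if (c == '(') {
                    // The number just read belongs to the employee opened here.
                    Employee parent = stack.peek(); // null for the root
                    Employee e = new Employee(num, parent);
                    if (parent == null) {
                        root = e;
                    } else {
                        parent.add(e);
                    }
                    stack.push(e);
                    num = 0;
                } else if (c == ')') {
                    stack.pop();                    // throws NoSuchElementException if unbalanced
                }
                // Any other character (space, '\n', '\r') is skipped.
            }
        }
        return root;
    }

    // Convenience overload for small in-memory inputs.
    static Employee parse(String s) {
        try {
            return parse(new StringReader(s));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as StackTreeParser.parse(reader) would take the place of collectEmployee(null, 0, reader). Because the sketch does its own chunked reads, the Reader passed in does not need to support mark/reset. Whether this is actually faster on the 30 MiB input is something a benchmark would have to confirm.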

Regarding java - Speed-optimizing a tree data parser, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28812244/
