java.lang.OutOfMemoryError when transforming XML in a huge directory

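I want to use XSLT 2.0 to transform XML files in a huge directory tree with many levels. There are more than 1 million files, each 4 to 10 kB. After a while I always get java.lang.OutOfMemoryError: Java heap space.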

My command is: java -Xmx3072M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=512M ...

Adding more memory to -Xmx is not a good solution.

Here is my code:

for (File file : dir.listFiles()) {
    if (file.isDirectory()) {
        pushDocuments(file);
    } else {
        indexFiles.index(file);
    }
}

public void index(File file) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    try {
        xslTransformer.xslTransform(outputStream, file);
        outputStream.flush();
        outputStream.close();
    } catch (IOException e) {
        System.err.println(e.toString());
    }
}

The XSLT transformation via net.sf.saxon.s9api:

public void xslTransform(ByteArrayOutputStream outputStream, File xmlFile) {
    try {
        XdmNode source = proc.newDocumentBuilder().build(new StreamSource(xmlFile));
        Serializer out = proc.newSerializer();
        out.setOutputStream(outputStream);
        transformer.setInitialContextNode(source);
        transformer.setDestination(out);
        transformer.transform();

        out.close();
    } catch (SaxonApiException e) {
        System.err.println(e.toString());
    }
}

Best answer

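My usual recommendation with the Saxon s9api interface is to reuse the XsltExecutable object, but to create a new XsltTransformer for each transformation. The XsltTransformer caches documents you have read in case they are needed again, which is not what you want in this case.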
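
A minimal sketch of that compile-once, load-per-file pattern. So that it runs without Saxon on the classpath, it uses the JDK's built-in JAXP API (XSLT 1.0 only) rather than s9api: Templates stands in for XsltExecutable and Transformer for XsltTransformer. With s9api the corresponding calls would be XsltCompiler.compile(...) once and XsltExecutable.load() for each file. The names BatchTransform, compile, transformOne and transformAll are made up for illustration and are not from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.stream.Stream;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BatchTransform {

    // Compile the stylesheet once. The compiled Templates object is reused
    // for every file (s9api analogue: XsltExecutable).
    static Templates compile(Source stylesheet) throws TransformerException {
        return TransformerFactory.newInstance().newTemplates(stylesheet);
    }

    // Transform a single file with a fresh Transformer, so nothing the
    // transformer holds on to survives past this call
    // (s9api analogue: XsltExecutable.load() per file).
    static byte[] transformOne(Templates templates, Path xmlFile) throws TransformerException {
        Transformer transformer = templates.newTransformer();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(xmlFile.toFile()), new StreamResult(out));
        return out.toByteArray();
    }

    // Visit every regular file below root and transform it.
    // Returns the number of files transformed without error.
    static int transformAll(Templates templates, Path root) throws IOException {
        int transformed = 0;
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths.filter(Files::isRegularFile).iterator();
            while (it.hasNext()) {
                Path xmlFile = it.next();
                try {
                    transformOne(templates, xmlFile);
                    transformed++;
                } catch (TransformerException e) {
                    // Report the failing file and carry on with the rest.
                    System.err.println(xmlFile + ": " + e);
                }
            }
        }
        return transformed;
    }

    // Usage: java BatchTransform <stylesheet.xsl> <root-directory>
    public static void main(String[] args) throws Exception {
        Templates templates = compile(new StreamSource(new File(args[0])));
        int n = transformAll(templates, Paths.get(args[1]));
        System.out.println("Transformed " + n + " files");
    }
}
```

The design point is only where the two objects are created: the expensive compiled stylesheet lives outside the loop, while the cheap per-run transformer lives inside it and becomes garbage after each file.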

As an alternative, you can call xsltTransformer.getUnderlyingController().clearDocumentPool() after each transformation.

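(Note that you can ask Saxon questions at saxonica.plan.io, which gives us [Saxonica] a good chance of noticing and answering them. You can also ask them here and tag them "saxon", which means we will probably answer the question at some point, though not always immediately. If you ask on StackOverflow without a product-specific tag, it is entirely hit-and-miss whether anyone will notice the question.)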

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/19764275/