I am batch-processing very large files. I call the following method on every URI in every line:
public String shortenUri(String uri) {
    uri = uri
            .replace("http://www.lemon-model.net/lemon#", "lemon:")
            .replace("http://babelnet.org/rdf/", "bn:")
            .replace("http://purl.org/dc/", "dc:")
            .replace("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:");
    return uri;
}
Strangely, this causes the following error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.regex.Pattern$BnM.optimize(Pattern.java:5411)
at java.util.regex.Pattern.compile(Pattern.java:1711)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1054)
at java.lang.String.replace(String.java:2239)
at XYZ.shortenUri(XYZ.java:217)
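I did increase -Xms and -Xmx, but it didn't help. Also strange: while monitoring the process, I didn't see memory usage go up. Any suggestions for improving performance and memory consumption?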
Best answer
Quoting Oracle:
Excessive GC Time and OutOfMemoryError
The parallel collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.
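You can check whether a run is actually approaching the threshold in the quote, and whether the -Xms/-Xmx values were picked up at all. The sketch below prints the effective maximum heap and the time each collector has spent so far. The class GcReport is hypothetical and uses only the standard java.lang.management API. getCollectionTime() returns an approximate accumulated elapsed time (or -1 if the collector does not support it), so the ratio is a rough indicator, not the JVM's exact overhead calculation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports heap limit and GC time using java.lang.management
public class GcReport {

    // Approximate fraction of JVM uptime spent in GC, summed over all collectors
    public static double gcTimeFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means "not supported by this collector"
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime > 0 ? (double) gcMillis / uptime : 0.0;
    }

    public static void main(String[] args) {
        // Effective -Xmx as seen by the running JVM
        Runtime rt = Runtime.getRuntime();
        System.out.println("Max heap (MB): " + rt.maxMemory() / (1024 * 1024));

        // Per-collector counters (names depend on the GC in use)
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.printf("GC time / uptime: %.2f%%%n", 100 * gcTimeFraction());
    }
}
```

You can run the same checks periodically inside the batch loop, or watch the same MXBeans in a tool such as JConsole. Either shows whether GC time keeps climbing while heap usage stays flat.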
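The first thing you can try is to increase the heap size further, by a few GB, e.g. with -Xmx4G. Another option is to avoid creating so many objects by not using the replace method. As your stack trace shows, on this JDK every String.replace call compiles a new Pattern internally. Instead, you can compile the Pattern objects once and create Matcher objects as needed (see below). The third option I see is to disable this feature completely with -XX:-UseGCOverheadLimit.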
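import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compiled once and reused; Pattern.quote makes "." match a literal dot
private static final Pattern PURL_PATTERN = Pattern.compile(Pattern.quote("http://purl.org/dc/"));
// other patterns

public static String shortenUri(String uri) {
    // other matchers
    Matcher matcher = PURL_PATTERN.matcher(uri);
    return matcher.replaceAll("dc:");
}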
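The answer's snippet leaves the other three namespaces as comments. A minimal sketch that handles all four prefixes from the question with one precompiled pattern could look like this. The class name UriShortener and the helper buildPattern are hypothetical. The sketch assumes Java 8 and uses appendReplacement/appendTail with a StringBuffer, which also work on newer JDKs. Each namespace is wrapped in Pattern.quote so that "." and "#" are matched literally, mirroring what String.replace does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: shortens the four namespaces from the question in one pass
public class UriShortener {

    // Namespace -> short prefix, taken from the question's shortenUri method
    private static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("http://www.lemon-model.net/lemon#", "lemon:");
        PREFIXES.put("http://babelnet.org/rdf/", "bn:");
        PREFIXES.put("http://purl.org/dc/", "dc:");
        PREFIXES.put("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf:");
    }

    // One alternation of all quoted namespaces, compiled once at class load
    private static final Pattern NAMESPACES = buildPattern();

    private static Pattern buildPattern() {
        StringBuilder sb = new StringBuilder();
        for (String ns : PREFIXES.keySet()) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(Pattern.quote(ns));
        }
        return Pattern.compile(sb.toString());
    }

    public static String shortenUri(String uri) {
        Matcher m = NAMESPACES.matcher(uri);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Look up which namespace matched and substitute its short prefix
            String prefix = PREFIXES.get(m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(prefix));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(shortenUri("http://purl.org/dc/terms/license")); // prints dc:terms/license
        System.out.println(shortenUri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")); // prints rdf:type
    }
}
```

None of the four namespaces is a prefix of another, so the order of the alternatives does not change the result. Each call still allocates a Matcher and a StringBuffer. However, it no longer compiles four regexes per URI, as the chained replace calls do on JDK 8.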
Source: "java - Repeated replace calls cause java.lang.OutOfMemoryError", a similar question found on Stack Overflow: https://stackoverflow.com/questions/58570828/