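This post is basically a challenge. I spent today trying to optimize an HTML escaping function, with some success. But I know there are some serious Java hackers out there who can probably do much better than I can, and I would love to learn.
I have been profiling my Java web application and found that a major hotspot is our string escaping function. We currently use Apache Commons Lang for this, calling StringEscapeUtils.escapeHtml(). I assumed that because it is so widely used it would be reasonably fast, but even my most naive implementation is significantly faster.
Here is the benchmark code I used, together with my naive implementation. It tests strings of different lengths, some containing only plain text and some containing HTML that needs escaping.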
public class HTMLEscapeBenchmark {

    // Naive implementation: copy the input one character at a time,
    // substituting an entity for each of the five special characters.
    public static String escapeHtml(String text) {
        if (text == null) return null;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '&') {
                sb.append("&amp;");
            } else if (c == '\'') {
                sb.append("&#39;");
            } else if (c == '"') {
                sb.append("&quot;");
            } else if (c == '<') {
                sb.append("&lt;");
            } else if (c == '>') {
                sb.append("&gt;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
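
    /*
    // Commons Lang variant (assumes Commons Lang 2.x on the classpath and
    // import org.apache.commons.lang.StringEscapeUtils; at the top of the file).
    public static String escapeHtml(String text) {
        if (text == null) return null;
        return StringEscapeUtils.escapeHtml(text);
    }
    */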

    public static void main(String[] args) {
        final int RUNS = 5;
        final int ITERATIONS = 1000000;

        // Standard lorem ipsum text.
        String loremIpsum = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut " +
            "labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut " +
            "aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum " +
            "dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia " +
            "deserunt mollit anim id est laborum. ";
        while (loremIpsum.length() < 1000) loremIpsum += loremIpsum;
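
        // Add some characters that need HTML escaping. Intent: bold every 2 and 3 letter word, quote every 5 letter word.
        // As written, the first pattern ends in a literal ']' and so matches nothing in this text, and the
        // second pattern quotes every run of 5 consecutive letters, including runs inside longer words.
        String loremIpsumHtml = loremIpsum.replaceAll("[A-Za-z]{2}]", "<b>$0</b>").replaceAll("[A-Za-z]{5}", "\"$0\"");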

        System.out.print("\nNormal-10");
        String text = loremIpsum.substring(0, 10);
        for (int run = 1; run <= RUNS; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < ITERATIONS; i++) {
                escapeHtml(text);
            }
            System.out.printf("\t%.3f", (System.nanoTime() - start) / 1e9);
        }

        System.out.print("\nNormal-100");
        text = loremIpsum.substring(0, 100);
        for (int run = 1; run <= RUNS; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < ITERATIONS; i++) {
                escapeHtml(text);
            }
            System.out.printf("\t%.3f", (System.nanoTime() - start) / 1e9);
        }

        System.out.print("\nNormal-1000");
        text = loremIpsum.substring(0, 1000);
        for (int run = 1; run <= RUNS; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < ITERATIONS; i++) {
                escapeHtml(text);
            }
            System.out.printf("\t%.3f", (System.nanoTime() - start) / 1e9);
        }

        System.out.print("\nHtml-10");
        text = loremIpsumHtml.substring(0, 10);
        for (int run = 1; run <= RUNS; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < ITERATIONS; i++) {
                escapeHtml(text);
            }
            System.out.printf("\t%.3f", (System.nanoTime() - start) / 1e9);
        }

        System.out.print("\nHtml-100");
        text = loremIpsumHtml.substring(0, 100);
        for (int run = 1; run <= RUNS; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < ITERATIONS; i++) {
                escapeHtml(text);
            }
            System.out.printf("\t%.3f", (System.nanoTime() - start) / 1e9);
        }

        System.out.print("\nHtml-1000");
        text = loremIpsumHtml.substring(0, 1000);
        for (int run = 1; run <= RUNS; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < ITERATIONS; i++) {
                escapeHtml(text);
            }
            System.out.printf("\t%.3f", (System.nanoTime() - start) / 1e9);
        }
    }
}
On my two-year-old MacBook Pro, I get the following results (seconds per 1,000,000 calls, five runs per row).
Commons Lang StringEscapeUtils.escapeHtml
Normal-10 0.439 0.357 0.351 0.343 0.342
Normal-100 2.244 0.934 0.930 0.932 0.931
Normal-1000 8.993 9.020 9.007 9.043 9.052
Html-10 0.270 0.259 0.258 0.258 0.257
Html-100 1.769 1.753 1.765 1.754 1.759
Html-1000 17.313 17.479 17.347 17.266 17.246
Naive implementation
Normal-10 0.111 0.091 0.086 0.084 0.088
Normal-100 0.636 0.627 0.626 0.626 0.627
Normal-1000 5.740 5.755 5.721 5.728 5.720
Html-10 0.145 0.138 0.138 0.138 0.138
Html-100 0.899 0.901 0.896 0.901 0.900
Html-1000 8.249 8.288 8.272 8.262 8.284
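The six timing loops in the benchmark differ only in their label and input string, and each one discards the return value of escapeHtml. A minimal sketch of how they could be folded into one helper follows. The class name EscapeBench, the methods time and report, and the sink field are hypothetical names introduced here, not part of the original benchmark, and the lambda syntax assumes Java 8 or later. Accumulating the result lengths into a field is only meant to make it less likely that the JIT treats the calls as dead code; it is not a substitute for a proper benchmarking harness.

```java
import java.util.function.UnaryOperator;

public class EscapeBench {

    // Hypothetical sink: results are folded into this field so the timed calls have a visible effect.
    static long sink;

    // Times `iterations` calls of fn on text and returns the elapsed time in seconds.
    static double time(UnaryOperator<String> fn, String text, int iterations) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            total += fn.apply(text).length();
        }
        long elapsed = System.nanoTime() - start;
        sink += total;
        return elapsed / 1e9;
    }

    // Prints one row in the same layout as the original benchmark: label, then one timing per run.
    static void report(String label, UnaryOperator<String> fn, String text, int runs, int iterations) {
        System.out.print("\n" + label);
        for (int run = 1; run <= runs; run++) {
            System.out.printf("\t%.3f", time(fn, text, iterations));
        }
    }

    public static void main(String[] args) {
        // Stand-in for the function under test; replace with the escapeHtml being measured.
        UnaryOperator<String> identity = s -> s;
        report("Identity-10", identity, "Lorem ipsu", 5, 1_000_000);
        System.out.println();
    }
}
```

With this shape, each of the six cases becomes a single report(...) call, for example report("Html-100", HTMLEscapeBenchmark::escapeHtml, loremIpsumHtml.substring(0, 100), RUNS, ITERATIONS).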
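I will post my own best attempt at optimizing this as an answer. So my question is: can you do better? What is the fastest way to escape HTML?
Best answer
Here is my best attempt at optimizing it. I optimized for what I hope is the common case of plain-text strings, but I was not able to do much better for strings that contain characters needing HTML entities.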
public static String escapeHtml(String value) {
    if (value == null) return null;
    int length = value.length();
    String encoded;
    // Scan without allocating; if nothing needs escaping, the input is returned unchanged.
    for (int i = 0; i < length; i++) {
        char c = value.charAt(i);
        // All five special characters have code points <= 62 ('>'), so most characters skip the switch.
        if (c <= 62 && (encoded = getHtmlEntity(c)) != null) {
            // We found a character to encode, so we need to start from here and buffer the encoded string.
            StringBuilder sb = new StringBuilder((int) (length * 1.25));
            sb.append(value.substring(0, i));
            sb.append(encoded);
            i++;
            for (; i < length; i++) {
                c = value.charAt(i);
                if (c <= 62 && (encoded = getHtmlEntity(c)) != null) {
                    sb.append(encoded);
                } else {
                    sb.append(c);
                }
            }
            value = sb.toString();
            break;
        }
    }
    return value;
}

// Returns the entity for a special character, or null if the character needs no escaping.
private static String getHtmlEntity(char c) {
    switch (c) {
        case '&': return "&amp;";
        case '\'': return "&#39;";
        case '"': return "&quot;";
        case '<': return "&lt;";
        case '>': return "&gt;";
        default: return null;
    }
}
Normal-10 0.021 0.023 0.011 0.012 0.011
Normal-100 0.074 0.074 0.073 0.074 0.074
Normal-1000 0.671 0.678 0.675 0.675 0.680
Html-10 0.222 0.152 0.153 0.153 0.154
Html-100 0.739 0.715 0.718 0.724 0.706
Html-1000 6.812 6.828 6.802 6.802 6.806
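The Normal-* gains come from the fast path: when no character needs escaping, the method returns the input String itself and allocates nothing. A small sketch that makes this visible follows. The class name EscapeDemo is hypothetical, the method bodies are a copy of the answer's code, and the entity strings (&amp;amp;, &amp;#39;, &amp;quot;, &amp;lt;, &amp;gt;) are assumed.

```java
public class EscapeDemo {

    // Copy of the optimized method from the answer (entity strings are assumed).
    public static String escapeHtml(String value) {
        if (value == null) return null;
        int length = value.length();
        String encoded;
        for (int i = 0; i < length; i++) {
            char c = value.charAt(i);
            if (c <= 62 && (encoded = getHtmlEntity(c)) != null) {
                // First special character found: switch to buffering from this point on.
                StringBuilder sb = new StringBuilder((int) (length * 1.25));
                sb.append(value.substring(0, i));
                sb.append(encoded);
                i++;
                for (; i < length; i++) {
                    c = value.charAt(i);
                    if (c <= 62 && (encoded = getHtmlEntity(c)) != null) {
                        sb.append(encoded);
                    } else {
                        sb.append(c);
                    }
                }
                value = sb.toString();
                break;
            }
        }
        return value;
    }

    private static String getHtmlEntity(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '\'': return "&#39;";
            case '"': return "&quot;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            default: return null;
        }
    }

    public static void main(String[] args) {
        String plain = "Lorem ipsum";
        // Fast path: nothing to escape, so the very same String object comes back.
        System.out.println(escapeHtml(plain) == plain);   // prints "true"
        // Slow path: a new String is built starting at the first special character.
        System.out.println(escapeHtml("a < b & \"c\""));  // prints "a &lt; b &amp; &quot;c&quot;"
    }
}
```

The Html-* timings stay close to the naive version because, once a special character is found, the remaining work is the same character-by-character append.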
Regarding java - the absolute fastest Java HTML escaping function, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/12984140/