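My friend and I wrote the code below (in Java) to prevent code injection in CSV files.
For a large CSV file (for example, 400 columns and 10,000 rows), the code takes about 15 seconds in the worst case (when every column of every row contains a value that must be escaped). Can anyone help me optimize it?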
public static String sanitizeInputForCSV(final String inputCSVRow) {
    String outputCSVRow = inputCSVRow;
    outputCSVRow = escapeMacroTriggersFromCSV(outputCSVRow, "=");
    outputCSVRow = escapeMacroTriggersFromCSV(outputCSVRow, "-");
    outputCSVRow = escapeMacroTriggersFromCSV(outputCSVRow, "+");
    outputCSVRow = escapeMacroTriggersFromCSV(outputCSVRow, "@");
    return outputCSVRow;
}

public static String escapeMacroTriggersFromCSV(String inputString, String characterToEscape) {
    String outputString = inputString;
    // To replace the first occurrence
    if (outputString.startsWith("\"" + characterToEscape)) {
        outputString = "\"" + " " + outputString.substring(1, outputString.length());
    } else if (outputString.startsWith(characterToEscape)) {
        outputString = " " + outputString.substring(0, outputString.length());
    }
    // To replace subsequent occurrences
    outputString = outputString.replace(",\"" + characterToEscape, ",\"" + " " + characterToEscape);
    outputString = outputString.replace("," + characterToEscape, "," + " " + characterToEscape);
    return outputString;
}
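For reference, the rule those two methods implement (put a space in front of `=`, `-`, `+` or `@` when it starts a field, optionally after an opening quote) can also be written as one regular expression, so the row is scanned once instead of going through eight `replace` calls. This is only a sketch: the class name `RegexCsvSanitizer` is made up for illustration, it assumes each row is a single-line string, and it has not been benchmarked against the code above.

```java
import java.util.regex.Pattern;

final class RegexCsvSanitizer {

    // Group 1: start of the row or a comma; group 2: optional opening quote;
    // group 3: one of the trigger characters.
    private static final Pattern FIELD_START_TRIGGER =
            Pattern.compile("(^|,)(\"?)([=\\-+@])");

    // Re-emits the separator and quote, then a space, then the trigger.
    public static String sanitize(String row) {
        return FIELD_START_TRIGGER.matcher(row).replaceAll("$1$2 $3");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("=1+1,\"@x\",a=b"));
    }
}
```

Compiling the pattern once and reusing it matters here, because `sanitize` would be called once per row.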
Update
The final optimized code runs about 4 times faster than the code above.
private static String sanitizeInputForCSVOpti1b(final String inputCSVRow) {
    StringBuilder outputCSVRow = new StringBuilder(inputCSVRow);
    outputCSVRow = escapeMacroTriggersFromCSVOpti1b(outputCSVRow, '=');
    outputCSVRow = escapeMacroTriggersFromCSVOpti1b(outputCSVRow, '-');
    outputCSVRow = escapeMacroTriggersFromCSVOpti1b(outputCSVRow, '+');
    outputCSVRow = escapeMacroTriggersFromCSVOpti1b(outputCSVRow, '@');
    return outputCSVRow.toString();
}

private static StringBuilder escapeMacroTriggersFromCSVOpti1b(StringBuilder inputRow, char characterToEscape) {
    StringBuilder outputRow;
    if (inputRow.length() == 0 || (inputRow.length() == 1 && inputRow.charAt(0) != characterToEscape)) {
        outputRow = inputRow;
    } else if (inputRow.length() == 1 && inputRow.charAt(0) == characterToEscape) {
        outputRow = new StringBuilder().append(' ').append(inputRow);
    } else {
        outputRow = new StringBuilder();
        // To replace the first occurrence
        final char firstCharacter = inputRow.charAt(0);
        final char secondCharacter = inputRow.charAt(1);
        if (firstCharacter == '\"' && secondCharacter == characterToEscape) {
            outputRow.append(firstCharacter).append(' ').append(secondCharacter);
        } else if (firstCharacter == characterToEscape) {
            outputRow.append(' ').append(firstCharacter).append(secondCharacter);
        } else if (firstCharacter == ',' && secondCharacter == characterToEscape) {
            // The row starts with an empty field, so the trigger opens the second field
            outputRow.append(firstCharacter).append(' ').append(secondCharacter);
        } else {
            outputRow.append(firstCharacter).append(secondCharacter);
        }
        // To replace subsequent occurrences
        for (int i = 2; i < inputRow.length(); i++) {
            if (inputRow.charAt(i) != characterToEscape) {
                outputRow.append(inputRow.charAt(i));
            } else if ((inputRow.charAt(i - 1) == '\"' && inputRow.charAt(i - 2) == ',') || inputRow.charAt(i - 1) == ',') {
                outputRow.append(' ').append(inputRow.charAt(i));
            } else {
                // Not at the start of a field: copy the character unchanged instead of dropping it
                outputRow.append(inputRow.charAt(i));
            }
        }
    }
    return outputRow;
}
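A speed-up figure like this is easy to re-check with a rough timing harness that rebuilds the worst case from the question (400 columns, 10,000 rows, every field starting with a trigger). The sketch below is only an illustration: `SanitizerTiming`, `worstCaseRow` and `timeMillis` are hypothetical names, the sanitizer in `main` is a placeholder to be swapped for the method under test, and `System.nanoTime` timings are crude compared with a proper benchmark tool.

```java
import java.util.function.UnaryOperator;

final class SanitizerTiming {

    // Builds one worst-case row: every column starts with a trigger character.
    public static String worstCaseRow(int columns) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < columns; i++) {
            if (i > 0) {
                row.append(',');
            }
            row.append("=cell").append(i);
        }
        return row.toString();
    }

    // Applies the sanitizer to the same row `rows` times and returns the elapsed milliseconds.
    public static long timeMillis(UnaryOperator<String> sanitizer, String row, int rows) {
        long start = System.nanoTime();
        long sink = 0;
        for (int i = 0; i < rows; i++) {
            sink += sanitizer.apply(row).length();
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (sink < 0) {
            // Never true in practice; using the result makes it less likely
            // that the JIT optimizes the loop away.
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String row = worstCaseRow(400);
        // Placeholder sanitizer: substitute the method you want to measure.
        UnaryOperator<String> candidate = s -> s.replace(",=", ", =");
        System.out.println(timeMillis(candidate, row, 10_000) + " ms");
    }
}
```

Running each candidate a few times before taking a reading gives the JIT a chance to warm up, which makes the comparison a little fairer.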
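Best answer
As dpr said, you would be better off using a library dedicated to CSV parsing, since it will most likely be more efficient than my solution. However, if you want to stick to plain Java, I believe the following will suffice: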
public static String sanitizeInputForCSV(final String inputCSVRow) {
    StringBuilder outputCSVRow = new StringBuilder(inputCSVRow);
    escapeMacroTriggersFromCSV(outputCSVRow, '=', '-', '+', '@');
    return outputCSVRow.toString();
}

public static void escapeMacroTriggersFromCSV(StringBuilder inputString, char... charactersToEscape) {
    for (char c : charactersToEscape) {
        for (int i = 0; i < inputString.length(); i++) {
            if (inputString.charAt(i) != c) {
                continue;
            }
            // The character starts a field if it is first in the row or follows a comma...
            boolean afterSeparator = i == 0 || inputString.charAt(i - 1) == ',';
            // ...or if it follows an opening quote that itself starts a field
            boolean afterOpeningQuote = i >= 1 && inputString.charAt(i - 1) == '\"'
                    && (i == 1 || inputString.charAt(i - 2) == ',');
            if (!afterSeparator && !afterOpeningQuote) {
                continue;
            }
            inputString.insert(i, ' ');
            i++; // Skip the trigger character, which the insert shifted one position to the right
        }
    }
}
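Instead of creating a large number of String objects (every call to String.replace returns a new String), my solution works on a single StringBuilder, which saves memory and will probably run faster too!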
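The same idea can be taken one step further: checking all four triggers during a single scan and appending to a fresh StringBuilder avoids both the repeated passes and the shifting that `insert` does on every hit. A minimal sketch under the same escaping rules; `SinglePassCsvSanitizer`, `sanitize` and `startsField` are names invented here, and the variant has not been timed against the code above.

```java
final class SinglePassCsvSanitizer {

    // Characters that a spreadsheet application may treat as the start of a formula.
    private static final String TRIGGERS = "=-+@";

    // Returns the row with a space placed before every trigger that starts a field.
    public static String sanitize(String row) {
        // Reserve some headroom for the added spaces; the builder grows if more is needed.
        StringBuilder out = new StringBuilder(row.length() + 16);
        for (int i = 0; i < row.length(); i++) {
            char ch = row.charAt(i);
            if (TRIGGERS.indexOf(ch) >= 0 && startsField(row, i)) {
                out.append(' ');
            }
            out.append(ch);
        }
        return out.toString();
    }

    // True when position i is the first character of a field,
    // either directly or right after an opening quote.
    private static boolean startsField(String row, int i) {
        if (i == 0 || row.charAt(i - 1) == ',') {
            return true;
        }
        return row.charAt(i - 1) == '"' && (i == 1 || row.charAt(i - 2) == ',');
    }

    public static void main(String[] args) {
        System.out.println(sanitize("=SUM(A1),\"-2\",x+y,@cmd"));
    }
}
```

Because the checks look only at the original row and never at the output being built, an inserted space cannot confuse a later check.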
Regarding java - String.replace and multiple calls to the same method, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45236922/