java - String.replace and multiple calls to the same method

Tags: java string csv security optimization

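A friend and I wrote the code below (in Java) to prevent code injection in CSV files.

For large CSV files (for example, 400 columns and 10,000 rows), the code takes about 15 seconds in the worst case, where every column of every row is bad and has to be escaped. Can anyone help me optimize it?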

public static String sanitizeInputForCSV(final String inputCSVRow) {
    String outputCSVRow = inputCSVRow;
    outputCSVRow = escapeMacroTriggersFromCSV(outputCSVRow, "=");
    outputCSVRow = escapeMacroTriggersFromCSV(outputCSVRow, "-");
    outputCSVRow = escapeMacroTriggersFromCSV(outputCSVRow, "+");
    outputCSVRow = escapeMacroTriggersFromCSV(outputCSVRow, "@");

    return outputCSVRow;
}

public static String escapeMacroTriggersFromCSV(String inputString, String characterToEscape) {

    String outputString = inputString;

    // Escape a trigger at the very start of the row (optionally behind an opening quote)
    if (outputString.startsWith("\"" + characterToEscape)) {
        outputString = "\"" + " " + outputString.substring(1, outputString.length());
    } else if (outputString.startsWith(characterToEscape)) {
        outputString = " " + outputString.substring(0, outputString.length());
    }

    // Escape a trigger at the start of every later field
    outputString = outputString.replace(",\"" + characterToEscape, ",\"" + " " + characterToEscape);
    outputString = outputString.replace("," + characterToEscape, "," + " " + characterToEscape);

    return outputString;

}
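To reproduce the worst case described above, a row in which every field starts with a trigger character can be generated and timed. The following is only a rough sketch: the class name `WorstCaseCsvBenchmark`, the helper names, and the placeholder sanitizer are assumptions made for illustration, and a plain `System.nanoTime()` loop ignores JIT warm-up, so a harness such as JMH would give more trustworthy numbers.

```java
import java.util.function.UnaryOperator;

final class WorstCaseCsvBenchmark {

    // Builds one row in which every field starts with a macro trigger character.
    static String worstCaseRow(int columns) {
        char[] triggers = {'=', '-', '+', '@'};
        StringBuilder row = new StringBuilder();
        for (int col = 0; col < columns; col++) {
            if (col > 0) {
                row.append(',');
            }
            row.append(triggers[col % triggers.length]).append("cell").append(col);
        }
        return row.toString();
    }

    // Applies the sanitizer to the same row `rows` times and returns the elapsed milliseconds.
    static long timeMillis(UnaryOperator<String> sanitizer, String row, int rows) {
        long start = System.nanoTime();
        long totalLength = 0; // consume the results so the work is not trivially dead code
        for (int i = 0; i < rows; i++) {
            totalLength += sanitizer.apply(row).length();
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sanitized " + totalLength + " chars in " + elapsed + " ms");
        return elapsed;
    }

    public static void main(String[] args) {
        String row = worstCaseRow(400);
        // Placeholder sanitizer (an assumption); pass the method under test instead,
        // for example a method reference to sanitizeInputForCSV.
        timeMillis(s -> s.replace(",=", ", ="), row, 10_000);
    }
}
```

Passing each candidate implementation as the `sanitizer` argument keeps the input identical across runs, so the timings are at least comparable with each other.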

Update

The final optimized code runs about 4 times faster than the code above.

private static String sanitizeInputForCSVOpti1b(final String inputCSVRow) {
    StringBuilder outputCSVRow = new StringBuilder(inputCSVRow);
    outputCSVRow = escapeMacroTriggersFromCSVOpti1b(outputCSVRow, '=');
    outputCSVRow = escapeMacroTriggersFromCSVOpti1b(outputCSVRow, '-');
    outputCSVRow = escapeMacroTriggersFromCSVOpti1b(outputCSVRow, '+');
    outputCSVRow = escapeMacroTriggersFromCSVOpti1b(outputCSVRow, '@');
    return outputCSVRow.toString();
}

private static StringBuilder escapeMacroTriggersFromCSVOpti1b(StringBuilder inputRow, char characterToEscape) {
    StringBuilder outputRow;

    if (inputRow.length() == 0 || (inputRow.length() == 1 && inputRow.charAt(0) != characterToEscape)) {
        outputRow = inputRow;
    } else if (inputRow.length() == 1 && inputRow.charAt(0) == characterToEscape) {
        outputRow = new StringBuilder().append(' ').append(inputRow);
    } else {
        outputRow = new StringBuilder();

        // Escape a trigger at the very start of the row (optionally behind a quote or comma)
        final char firstCharacter = inputRow.charAt(0);
        final char secondCharacter = inputRow.charAt(1);
        if (firstCharacter == characterToEscape) {
            outputRow.append(' ').append(firstCharacter).append(secondCharacter);
        } else if ((firstCharacter == '\"' || firstCharacter == ',') && secondCharacter == characterToEscape) {
            outputRow.append(firstCharacter).append(' ').append(secondCharacter);
        } else {
            outputRow.append(firstCharacter).append(secondCharacter);
        }

        // Copy the rest, inserting a space before a trigger that starts a field.
        // Every character is always appended, so triggers in the middle of a field are kept.
        for (int i = 2; i < inputRow.length(); i++) {
            final char current = inputRow.charAt(i);
            final char previous = inputRow.charAt(i - 1);
            if (current == characterToEscape
                    && (previous == ',' || (previous == '\"' && inputRow.charAt(i - 2) == ','))) {
                outputRow.append(' ');
            }
            outputRow.append(current);
        }
    }
    return outputRow;
}
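Since the title asks about repeated `String.replace` calls, it may be worth noting that the four passes can also be expressed as one precompiled regular expression. This is a sketch under assumptions, not a measured improvement: the class name `RegexCsvSanitizer` is made up for illustration, and, like the code above, it treats every comma as a field separator, so a comma inside a quoted field is not handled specially.

```java
import java.util.regex.Pattern;

final class RegexCsvSanitizer {

    // Group 1: start of the row or a comma; group 2: optional opening quote;
    // group 3: one of the trigger characters (the trailing '-' is a literal in the class).
    private static final Pattern FIELD_START_TRIGGER = Pattern.compile("(^|,)(\"?)([=+@-])");

    static String sanitize(String row) {
        // Re-emit the separator and quote, then a space, then the trigger itself.
        return FIELD_START_TRIGGER.matcher(row).replaceAll("$1$2 $3");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("=1+1,\"-2\",a=b,@x"));
    }
}
```

For the sample row in `main`, the `=` at the start, the `-` behind the opening quote, and the `@` after the last comma each get a leading space, while the `+` and the `=` in the middle of a field are left alone. Whether this beats the hand-written loop would need to be measured on real data.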

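Best answer

As dpr said, you would be better off using a library dedicated to CSV parsing, since it will most likely be more efficient than my solution. However, if you want to stick with plain Java, I believe the following is sufficient: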

public static String sanitizeInputForCSV(final String inputCSVRow) {
    StringBuilder outputCSVRow = new StringBuilder(inputCSVRow);

    escapeMacroTriggersFromCSV(outputCSVRow, '=', '-', '+', '@');

    return outputCSVRow.toString();
}

public static void escapeMacroTriggersFromCSV(StringBuilder inputString, char... charactersToEscape) {
    for (char c : charactersToEscape) {
        for (int i = 0; i < inputString.length(); i++) {
            if (inputString.charAt(i) != c) {
                continue;
            }

            // A field starts at index 0 or right after a comma, optionally behind an opening quote
            boolean atFieldStart = i == 0 || inputString.charAt(i - 1) == ',';
            boolean afterOpeningQuote = i >= 1 && inputString.charAt(i - 1) == '"'
                    && (i == 1 || inputString.charAt(i - 2) == ',');
            if (!atFieldStart && !afterOpeningQuote) {
                continue;
            }

            inputString.insert(i, ' ');
            i++; // skip the trigger character, which has just moved one position to the right
        }
    }
}

Instead of creating a large number of String objects, my solution uses a StringBuilder to save memory, and it will probably run faster as well!
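One further variation, offered as a sketch and not as part of the original answer: `StringBuilder.insert` has to shift every character after the insertion point, and the loop still scans the row once per trigger character. Appending to a fresh builder in a single pass avoids both. The class and method names below are hypothetical, and the sketch keeps the same simplification as the code above: every comma counts as a field separator, even inside a quoted field.

```java
final class SinglePassCsvSanitizer {

    // Characters that spreadsheet applications may treat as the start of a formula.
    private static final String TRIGGERS = "=-+@";

    static String sanitize(String row) {
        // Reserve a little extra room for the inserted spaces.
        StringBuilder out = new StringBuilder(row.length() + 16);
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (TRIGGERS.indexOf(c) >= 0 && isFieldStart(row, i)) {
                out.append(' ');
            }
            out.append(c); // every input character is copied exactly once
        }
        return out.toString();
    }

    // True if index i is the first character of a field, optionally behind an opening quote.
    private static boolean isFieldStart(String row, int i) {
        if (i == 0 || row.charAt(i - 1) == ',') {
            return true;
        }
        return row.charAt(i - 1) == '"' && (i == 1 || row.charAt(i - 2) == ',');
    }

    public static void main(String[] args) {
        System.out.println(sanitize("=1+1,\"-2\",a=b,@x"));
    }
}
```

Because each field start holds exactly one character, checking all four triggers in one pass gives the same result as four separate passes, and the empty string is handled without a special case.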

Regarding java - String.replace and multiple calls to the same method, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45236922/
