Java: performance problem creating a 100 MB zipped CSV file

Tags: java supercsv zipoutputstream

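I need to create a 100 MB zip file containing a CSV file in Java in under 5 seconds. I have managed to create test.zip containing the CSV file, but generating the zip takes too long (about 30 seconds). This is the code I have written so far: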

ByteArrayOutputStream baos = new ByteArrayOutputStream();
/* Create instance of ZipOutputStream to create ZIP file. */
ZipOutputStream zipOutputStream = new ZipOutputStream(baos);

/* Create the ZIP entry for the CSV file. The file is never written to
 * disk on its own; the entry name only indicates the name it gets
 * inside the zip.
 */
ZipEntry zipEntry = new ZipEntry("Test.zip");

zipOutputStream.putNextEntry(zipEntry);

/* Create an OutputStreamWriter for the CSV. There is no need to stage
 * the CSV on the filesystem; write bytes directly to the zip stream.
 */
BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(zipOutputStream, "UTF-8"));

CsvListWriter csvListWriter = new CsvListWriter(bufferedWriter, CsvPreference.EXCEL_PREFERENCE);

/* Write the CSV header to the generated CSV file. */
csvListWriter.writeHeader(CSVGeneratorConstant.CSV_HEADERS);

/* Logic to Write the content to CSV */
long startTime = System.currentTimeMillis();

for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
    final List<String> rowContent = new LinkedList<String>();
    for (int colIdx = 0; colIdx < 6; colIdx++) {
        String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
        rowContent.add(str);
    }
    csvListWriter.write(rowContent);
}
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("time==" + elapsedTime / 1000f + "Seconds");

System.out.println("Size=====" + baos.size() / (Math.pow(1024, 2)) + "MB");

csvListWriter.close();
bufferedWriter.close();
zipOutputStream.close();
baos.close();

I am using the Super CSV library, but I also tried creating the zip file in memory without it, with no success. Can you help me?

Best answer

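Your test data is about 1 GB uncompressed, which compresses down to about 100 MB. Depending on your hardware, it may simply not be possible to get this under 5 seconds.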
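
To see where the "about 1 GB" figure comes from, here is a rough sketch that adds up the bytes of the generated rows without writing anything. The class name CsvSizeEstimate and the method estimateBytes are made up for illustration. It assumes single-digit column indices, ASCII-only content, no quoting, a one-character line terminator, and it ignores the header row.

```java
public class CsvSizeEstimate {

    /**
     * Estimates the uncompressed size in bytes of the test CSV.
     * Each field is "R" + row + "C" + col + " FieldContent", i.e.
     * 1 + digits(row) + 1 + 1 + 13 characters (col assumed to be one digit).
     */
    public static long estimateBytes(int rows, int cols) {
        long total = 0;
        for (int row = 0; row < rows; row++) {
            int digits = Integer.toString(row).length();
            long fieldLength = 16 + digits;
            // cols fields, (cols - 1) commas, and one '\n' line terminator
            total += cols * fieldLength + (cols - 1) + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(7_000_000, 6);
        System.out.println(bytes + " bytes, about "
                + bytes / 1_000_000_000.0 + " GB");
    }
}
```

For 7,000,000 rows of 6 columns this comes to 1,001,333,340 bytes, so the compressor has to chew through roughly 1 GB of input to produce the 100 MB zip.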

I put together a quick and dirty benchmark that highlights the performance impact of writing to a zip file.

  • Writing to CSV with String.join(): 9.6 s
  • Writing to CSV with Super CSV: 12.7 s
  • Writing to CSV within a zip with String.join(): 18.6 s
  • Writing to CSV within a zip with Super CSV: 22.5 s

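Using Super CSV seems to add a bit of overhead (runs take roughly 120 to 130% of the String.join() time), but writing into a zip file nearly doubles the time (roughly 180 to 190%) regardless of whether Super CSV is used.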
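
Since most of the extra time goes into compression, one knob worth trying is the deflate level. The sketch below was not part of the benchmark above and its effect is unmeasured here; it only shows how ZipOutputStream.setLevel with Deflater.BEST_SPEED could be wired in, trading a larger zip for less CPU work. The class name FastZipSketch, the method zipCsv, and the entry name data.csv are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class FastZipSketch {

    /** Writes `rows` test rows into a single-entry zip held in memory. */
    public static byte[] zipCsv(int rows, int level) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            // The level applies to entries started after this call.
            zos.setLevel(level);
            zos.putNextEntry(new ZipEntry("data.csv"));

            Writer writer = new BufferedWriter(
                    new OutputStreamWriter(zos, StandardCharsets.UTF_8));
            StringBuilder row = new StringBuilder();
            for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
                row.setLength(0); // reuse one builder instead of a list per row
                for (int colIdx = 0; colIdx < 6; colIdx++) {
                    if (colIdx > 0) {
                        row.append(',');
                    }
                    row.append('R').append(rowIdx)
                       .append('C').append(colIdx)
                       .append(" FieldContent");
                }
                row.append('\n');
                writer.append(row);
            }
            // Push buffered characters into the entry before it is closed.
            writer.flush();
            zos.closeEntry();
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (int level : new int[] {Deflater.DEFAULT_COMPRESSION, Deflater.BEST_SPEED}) {
            long start = System.nanoTime();
            byte[] zip = zipCsv(1_000_000, level);
            double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
            System.out.println("level " + level + ": " + zip.length
                    + " bytes in " + seconds + " s");
        }
    }
}
```

Whether this gets anywhere near the 5 second target depends on the hardware; the resulting file will also be somewhat larger than with the default level.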

Here is the code for the 4 scenarios.

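Unlike the code you provided, I write directly to a file (I did not notice any difference between writing to disk and writing to memory, i.e. a ByteArrayOutputStream). I also skipped the BufferedWriter in the Super CSV examples, because Super CSV already uses one internally, and I used try-with-resources to make things a bit cleaner.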

@Test
public void testWriteToCsvFileWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream csvFile = new FileOutputStream(new File("supercsv.csv"));
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(csvFile, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){
        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithinZipWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream zipFile = new FileOutputStream(new File("supercsv.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(zos, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){

        ZipEntry csvFile = new ZipEntry("supercsvwithinzip.csv");
        zos.putNextEntry(csvFile);

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip file with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream textFile = new FileOutputStream(new File("join.csv"));
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(textFile, "UTF-8"));
    ){

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with String.join() took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithinZipWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream zipFile = new FileOutputStream(new File("join.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(zos, "UTF-8"));
    ){

        ZipEntry csvFile = new ZipEntry("joinwithinzip.csv");
        zos.putNextEntry(csvFile);

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip with String.join() took " + (elapsedTime / 1000f) + " seconds");
}
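
The four tests repeat the same start/stop bookkeeping. As a small optional sketch (the class TimingSketch and the method timeSeconds are hypothetical helpers, not part of the benchmark above), the timing can be pulled into one place and based on System.nanoTime(), which is intended for measuring elapsed time, rather than System.currentTimeMillis().

```java
import java.util.concurrent.Callable;

public class TimingSketch {

    /** Runs the task once and returns the elapsed wall-clock time in seconds. */
    public static double timeSeconds(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000_000.0;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; in the tests above this would be the CSV-writing loop.
        double seconds = timeSeconds(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            return sb.length();
        });
        System.out.println("Took " + seconds + " seconds");
    }
}
```

A single run like this is still only a rough measurement: JIT warm-up and disk caching can easily shift the numbers by a noticeable amount between runs.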

Regarding the performance problem of creating a 100 MB zipped CSV file in Java, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32580036/
