java - Fastest way to read and write large files line by line in Java

Tags: java performance file-io bufferedreader

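I have been searching for the fastest way to read a large file (0.5 - 1 GB) and write it back out again in Java with limited memory (about 64 MB). Each line in the file represents a record, so I need to get them line by line. The file is a plain text file.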

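I tried BufferedReader and BufferedWriter, but they don't seem to be the best option. Reading and writing a 0.5 GB file takes about 35 seconds, just reading and writing with no processing. I think the bottleneck here is writing, because reading alone takes about 10 seconds.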
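For reference, the line-by-line copy described above might look roughly like the following minimal sketch. The class name `LineCopy`, the UTF-8 charset and the 1 MB per-stream buffer size are assumptions for illustration, not details from the question; the buffers stay far below the 64 MB limit because only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LineCopy {
    // Hypothetical buffer size: 1 MB per stream (an assumption, not a tuned value).
    static final int BUFFER_SIZE = 1 << 20;

    /** Copies `in` to `out` one line (record) at a time and returns the number of lines copied. */
    static long copyLines(Path in, Path out) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(in), StandardCharsets.UTF_8), BUFFER_SIZE);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(Files.newOutputStream(out), StandardCharsets.UTF_8), BUFFER_SIZE)) {
            for (String line; (line = reader.readLine()) != null; ) {
                // Per-record processing would go here.
                writer.write(line);
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("in", ".txt");
        Path out = Files.createTempFile("out", ".txt");
        Files.write(in, List.of("alpha", "beta", "gamma"), StandardCharsets.UTF_8);
        long n = copyLines(in, out);
        System.out.println(n + " lines copied");                          // prints "3 lines copied"
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8)); // prints "[alpha, beta, gamma]"
        Files.delete(in);
        Files.delete(out);
    }
}
```

A larger buffer mainly reduces the number of system calls; as the answer below argues, it cannot make the disk itself any faster.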

I tried reading into byte arrays, but then searching for the lines in each array takes even more time.
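A minimal sketch of that byte-array approach, under stated assumptions: the class `ByteLineSplitter` and its `forEachLine` method are hypothetical names, the file is assumed to be UTF-8 with `'\n'` line endings, and a line that straddles two chunks is carried over to the next read. It is shown only to make the technique concrete; it is not claimed to be faster than `BufferedReader.readLine()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ByteLineSplitter {
    /**
     * Reads `file` in raw chunks of `chunkSize` bytes and passes each '\n'-terminated line
     * (decoded as UTF-8, without the '\n') to `sink`. Returns the number of lines found.
     * A '\r' before '\n' (Windows line endings) is NOT stripped here.
     */
    static long forEachLine(Path file, int chunkSize, Consumer<String> sink) throws IOException {
        byte[] buf = new byte[chunkSize];
        // Holds the bytes of a line that started in an earlier chunk and is not finished yet.
        ByteArrayOutputStream partial = new ByteArrayOutputStream();
        long count = 0;
        try (InputStream in = Files.newInputStream(file)) {
            for (int n; (n = in.read(buf)) != -1; ) {
                int start = 0;
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        partial.write(buf, start, i - start);
                        // Safe for UTF-8: the byte 0x0A never occurs inside a multi-byte character.
                        sink.accept(new String(partial.toByteArray(), StandardCharsets.UTF_8));
                        partial.reset();
                        count++;
                        start = i + 1;
                    }
                }
                // Carry the unterminated tail of this chunk over to the next one.
                partial.write(buf, start, n - start);
            }
        }
        if (partial.size() > 0) { // last line without a trailing '\n'
            sink.accept(new String(partial.toByteArray(), StandardCharsets.UTF_8));
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lines", ".txt");
        Files.write(file, "alpha\nbeta\ngamma".getBytes(StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        // A tiny 3-byte chunk forces lines to straddle chunk boundaries.
        long n = forEachLine(file, 3, lines::add);
        System.out.println(n + " " + lines); // prints "3 [alpha, beta, gamma]"
        Files.delete(file);
    }
}
```

The extra copy through `partial` and the per-line `String` decoding are roughly the work `BufferedReader` already does internally, which is one plausible reason the hand-rolled version did not come out ahead.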

Any suggestions? Thanks.

Best answer

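I suspect your real problem is that your hardware is limited, and nothing you do in software will make much difference. If you have plenty of memory and CPU, more advanced tricks can help, but if you are simply waiting on the hard drive because the file is not cached, they won't make much difference.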

BTW: 500 MB in 10 seconds, or 50 MB/s, is a typical read speed for an HDD.
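The same arithmetic can be extended to the question's copy workload as a rough back-of-envelope estimate. It assumes (this is not stated in the original) that writing runs at about the same ~50 MB/s as reading, and that reads and writes share one disk without overlapping:

$$\text{read rate} \approx \frac{500\ \text{MB}}{10\ \text{s}} = 50\ \text{MB/s}$$

$$t_{\text{copy}} \gtrsim \frac{500\ \text{MB}}{50\ \text{MB/s}} + \frac{500\ \text{MB}}{50\ \text{MB/s}} = 10\ \text{s} + 10\ \text{s} = 20\ \text{s}$$

Under those assumptions, no software change would bring the 35 s copy much below about 20 s on that disk.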

Try running the following to see at what file size your system can no longer cache the file efficiently.

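import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;

public class FileSizeTest {
    public static void main(String... args) throws IOException {
        // Grow the file until it no longer fits comfortably in the OS file cache.
        for (int mb : new int[]{50, 100, 250, 500, 1000, 2000})
            testFileSize(mb);
    }

    private static void testFileSize(int mb) throws IOException {
        File file = File.createTempFile("test", ".txt");
        file.deleteOnExit();
        // Each line is 1024 'A' characters plus a line separator, so mb * 1024 lines is about mb MB.
        char[] chars = new char[1024];
        Arrays.fill(chars, 'A');
        String longLine = new String(chars);

        long start1 = System.nanoTime();
        try (PrintWriter pw = new PrintWriter(new FileWriter(file))) {
            for (int i = 0; i < mb * 1024; i++)
                pw.println(longLine);
        } // closing (and flushing) the writer is included in the timing
        long time1 = System.nanoTime() - start1;
        // bytes * 1000 / nanoseconds gives MB/s with 1 MB = 10^6 bytes; the size uses 2^20.
        System.out.printf("Took %.3f seconds to write to a %d MB, file rate: %.1f MB/s%n",
                time1 / 1e9, file.length() >> 20, file.length() * 1000.0 / time1);

        long start2 = System.nanoTime();
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            // Read and discard every line, so only the I/O cost is measured.
            for (String line; (line = br.readLine()) != null; ) {
            }
        }
        long time2 = System.nanoTime() - start2;
        System.out.printf("Took %.3f seconds to read to a %d MB file, rate: %.1f MB/s%n",
                time2 / 1e9, file.length() >> 20, file.length() * 1000.0 / time2);
        file.delete();
    }
}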

On a Linux machine with lots of memory:

Took 0.395 seconds to write to a 50 MB, file rate: 133.0 MB/s
Took 0.375 seconds to read to a 50 MB file, rate: 140.0 MB/s
Took 0.669 seconds to write to a 100 MB, file rate: 156.9 MB/s
Took 0.569 seconds to read to a 100 MB file, rate: 184.6 MB/s
Took 1.585 seconds to write to a 250 MB, file rate: 165.5 MB/s
Took 1.274 seconds to read to a 250 MB file, rate: 206.0 MB/s
Took 2.513 seconds to write to a 500 MB, file rate: 208.8 MB/s
Took 2.332 seconds to read to a 500 MB file, rate: 225.1 MB/s
Took 5.094 seconds to write to a 1000 MB, file rate: 206.0 MB/s
Took 5.041 seconds to read to a 1000 MB file, rate: 208.2 MB/s
Took 11.509 seconds to write to a 2001 MB, file rate: 182.4 MB/s
Took 9.681 seconds to read to a 2001 MB file, rate: 216.8 MB/s

On a Windows machine with lots of memory:

Took 0.376 seconds to write to a 50 MB, file rate: 139.7 MB/s
Took 0.401 seconds to read to a 50 MB file, rate: 131.1 MB/s
Took 0.517 seconds to write to a 100 MB, file rate: 203.1 MB/s
Took 0.520 seconds to read to a 100 MB file, rate: 201.9 MB/s
Took 1.344 seconds to write to a 250 MB, file rate: 195.4 MB/s
Took 1.387 seconds to read to a 250 MB file, rate: 189.4 MB/s
Took 2.368 seconds to write to a 500 MB, file rate: 221.8 MB/s
Took 2.454 seconds to read to a 500 MB file, rate: 214.1 MB/s
Took 4.985 seconds to write to a 1001 MB, file rate: 210.7 MB/s
Took 5.132 seconds to read to a 1001 MB file, rate: 204.7 MB/s
Took 10.276 seconds to write to a 2003 MB, file rate: 204.5 MB/s
Took 9.964 seconds to read to a 2003 MB file, rate: 210.9 MB/s
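The benchmark above times writing and reading separately, while the question reads and writes at the same time. A minimal sketch of a combined line-by-line copy benchmark follows; `CopyBenchmark`, `createTestFile` and `timeCopy` are hypothetical names, and the rate uses the same `bytes * 1000.0 / nanoseconds` formula as the answer. Two caveats: the freshly written source file is probably still in the OS cache, so this likely measures the cached case, and the rate counts only the bytes read, although twice that amount moves through the disk.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CopyBenchmark {
    /** Writes a test file of about `mb` MB made of 1024-character lines, like the answer's benchmark. */
    static Path createTestFile(int mb) throws IOException {
        Path file = Files.createTempFile("src", ".txt");
        char[] chars = new char[1024];
        Arrays.fill(chars, 'A');
        String longLine = new String(chars);
        try (BufferedWriter w = Files.newBufferedWriter(file)) {
            for (int i = 0; i < mb * 1024; i++) {
                w.write(longLine);
                w.newLine();
            }
        }
        return file;
    }

    /** Copies `src` to `dst` line by line and returns the elapsed time in nanoseconds. */
    static long timeCopy(Path src, Path dst) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader r = Files.newBufferedReader(src);
             BufferedWriter w = Files.newBufferedWriter(dst)) {
            for (String line; (line = r.readLine()) != null; ) {
                w.write(line);
                w.newLine();
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String... args) throws IOException {
        for (int mb : new int[]{50, 100, 250, 500}) {
            Path src = createTestFile(mb);
            Path dst = Files.createTempFile("dst", ".txt");
            try {
                long time = timeCopy(src, dst);
                long bytes = Files.size(src);
                System.out.printf("Took %.3f seconds to copy a %d MB file, rate: %.1f MB/s%n",
                        time / 1e9, bytes >> 20, bytes * 1000.0 / time);
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        }
    }
}
```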

The original question, "java - Fastest way to read and write large files line by line in Java", is on Stack Overflow: https://stackoverflow.com/questions/13155700/
