node.js - Copying a file with streams in Node.js is very slow

Tags: node.js performance file-io stream pipe

I'm using Node to copy files on an SSD under VMware, but performance is very poor. This is the benchmark I ran to measure the disk's actual speed:

$ hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   12004 MB in  1.99 seconds = 6025.64 MB/sec
 Timing buffered disk reads: 1370 MB in  3.00 seconds = 456.29 MB/sec

However, the following Node code for copying a file is very slow, and running it again doesn't make it any faster:

var fs = require("fs");
fs.createReadStream("bigfile").pipe(fs.createWriteStream("tempbigfile"));

Running it:

$ seq 1 100000000 > bigfile
$ ll bigfile -h
-rw-rw-r-- 1 mustafa mustafa 848M Jun  3 03:30 bigfile
$ time node test.js 

real    0m4.973s
user    0m2.621s
sys     0m7.236s
$ time node test.js 

real    0m5.370s
user    0m2.496s
sys     0m7.190s
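A quick back-of-the-envelope check of these numbers, as a sketch: the helper below counts the bytes `seq 1 n` writes (each d-digit number is d characters plus a newline) and divides the file size by the `real` times above. For n = 100,000,000 it gives 888,888,898 bytes, about 847.7 MiB, which is the 848M that `ls -h` reports (n = 10,000,000 would give only 78,888,897 bytes). Copying that in 4.973 s and 5.370 s works out to roughly 170 and 158 MiB/s. The helper names are made up for this illustration.

```javascript
// Hypothetical helper: bytes written by `seq 1 n`.
// Every d-digit number takes d characters plus a "\n".
function seqBytes(n) {
  let total = 0;
  for (let d = 1, lo = 1; lo <= n; d++, lo *= 10) {
    const hi = Math.min(n, lo * 10 - 1); // last d-digit number not exceeding n
    total += (hi - lo + 1) * (d + 1);
  }
  return total;
}

const MiB = 1024 * 1024;

// Hypothetical helper: throughput in MiB/s for `bytes` copied in `seconds`.
function throughputMiBs(bytes, seconds) {
  return bytes / MiB / seconds;
}

const size = seqBytes(100000000); // 888888898 bytes
console.log((size / MiB).toFixed(1), "MiB"); // 847.7 MiB
for (const t of [4.973, 5.37]) {
  // "real" times of the two runs above
  console.log(t, "s ->", throughputMiBs(size, t).toFixed(0), "MiB/s");
}
```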

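What is the problem here, and how can I speed it up? I believe I could write this faster in C by tuning the buffer size. What confuses me is that a simple, almost pv-equivalent program I wrote, which just pipes stdin to stdout as shown below, is very fast.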

process.stdin.pipe(process.stdout);

Running it:

$ dd if=/dev/zero bs=8M count=128 | pv | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 5.78077 s, 186 MB/s
   1GB 0:00:05 [ 177MB/s] [ <=> ]
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 5.78131 s, 186 MB/s
$ dd if=/dev/zero bs=8M count=128 |  dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 5.57005 s, 193 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 5.5704 s, 193 MB/s
$ dd if=/dev/zero bs=8M count=128 | node test.js | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 4.61734 s, 233 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 4.62766 s, 232 MB/s
$ dd if=/dev/zero bs=8M count=128 | node test.js | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 4.22107 s, 254 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 4.23231 s, 254 MB/s
$ dd if=/dev/zero bs=8M count=128 | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 5.70124 s, 188 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 5.70144 s, 188 MB/s
$ dd if=/dev/zero bs=8M count=128 | node test.js | dd of=/dev/null
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB) copied, 4.51055 s, 238 MB/s
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 4.52087 s, 238 MB/s

Best answer

I don't know the answer to your question, but maybe this will help you investigate the problem.

The Node.js documentation on stream buffering says:

Both Writable and Readable streams will store data in an internal buffer that can be retrieved using writable.writableBuffer or readable.readableBuffer, respectively.

The amount of data potentially buffered depends on the highWaterMark option passed into the stream's constructor. For normal streams, the highWaterMark option specifies a total number of bytes. For streams operating in object mode, the highWaterMark specifies a total number of objects....

A key goal of the stream API, particularly the stream.pipe() method, is to limit the buffering of data to acceptable levels such that sources and destinations of differing speeds will not overwhelm the available memory.
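To see what highWaterMark does on the writable side, here is a small sketch with a deliberately slow in-memory Writable (its 10 ms delay and 16-byte highWaterMark are arbitrary illustrative choices). `write()` keeps accepting data, but returns `false` once the buffered byte count (`writableLength`) reaches highWaterMark; `pipe()` reacts to that `false` by pausing the source until the destination drains.

```javascript
const { Writable } = require("stream");

// A deliberately slow in-memory sink: each write is acknowledged 10 ms later,
// so chunks written in the meantime pile up in the stream's internal buffer.
const sink = new Writable({
  highWaterMark: 16, // bytes; tiny on purpose so the limit is hit quickly
  write(chunk, encoding, callback) {
    setTimeout(callback, 10);
  },
});

const results = [];
for (let i = 0; i < 4; i++) {
  // write() still accepts the chunk, but returns false once the buffered
  // byte count (writableLength) reaches highWaterMark
  results.push(sink.write(Buffer.alloc(8)));
}

console.log(results, sink.writableLength); // [ true, false, false, false ] 32
```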

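So you can experiment with the buffer size to improve the speed:

var fs = require('fs');
var path = require('path');
var from = path.normalize(process.argv[2]);
var to = path.normalize(process.argv[3]);

// highWaterMark is the per-stream buffering threshold, in bytes.
// Note: 65536 (64 KiB) is already the default for fs.createReadStream,
// so also try larger powers of two, e.g. Math.pow(2,20).
var readOpts = {highWaterMark: Math.pow(2,16)};  // 65536
var writeOpts = {highWaterMark: Math.pow(2,16)}; // 65536

var source = fs.createReadStream(from, readOpts);
var destiny = fs.createWriteStream(to, writeOpts);

// pipe() does not forward errors, so listen on both ends
source.on('error', function (err) { console.error(err); });
destiny.on('error', function (err) { console.error(err); });

source.pipe(destiny);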

Regarding "node.js - Copying a file with streams in Node.js is very slow", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24005496/
