node.js - Reading a file with ascii encoding

File:

聵

Script:

require("fs").readFile ("file", "ascii", function (e, d){
    console.log(d==="聵") //true
})

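How is this possible? 聵 is not an ascii character; it is encoded with 3 bytes, 0xE881B5. I expected to get è\u0081µ, because ascii characters are encoded with a single byte. If I read the file with the "binary" encoding, the comparison below prints true, which is what I expected from the ascii encoding: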

require("fs").readFile ("file", "binary", function (e, d){
    console.log(d === "è\u0081µ") //true
})
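To make the three bytes concrete, here is a minimal sketch of how U+8075 (聵) becomes E8 81 B5 under UTF-8. The helper name utf8Encode3 is hypothetical and only handles code points that need exactly three bytes; it is an illustration of the bit layout, not how Node does it internally.

```javascript
// Hypothetical helper: encode one code point in the range U+0800..U+FFFF
// as UTF-8. Surrogates and other ranges are out of scope for this sketch.
function utf8Encode3(codePoint) {
  if (codePoint < 0x0800 || codePoint > 0xffff) {
    throw new RangeError("only 3-byte code points are handled in this sketch");
  }
  return [
    0xe0 | (codePoint >> 12),         // 1110xxxx: top 4 bits
    0x80 | ((codePoint >> 6) & 0x3f), // 10xxxxxx: middle 6 bits
    0x80 | (codePoint & 0x3f),        // 10xxxxxx: low 6 bits
  ];
}

const bytes = utf8Encode3("聵".codePointAt(0)); // code point U+8075
const hex = bytes.map((b) => b.toString(16)).join(" ");
console.log(hex); // e8 81 b5

// Treating each byte as its own character gives the "binary" string.
const asBinary = String.fromCharCode(...bytes);
console.log(asBinary === "\u00e8\u0081\u00b5"); // true
```

So the "binary" result is just those three bytes, each read as one character.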

Is this result intentional? If the ascii encoding returns the same result as the utf8 encoding, why is "ascii" a possible parameter at all?

Edit:

This is the file content (opened with HxD):

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  E8 81 B5                                         è.µ

And:

require("fs").readFile ("file", function (e, d){
    console.log (d.toString ("ascii") === "聵") //true
    console.log (d.toString ("utf8") === "聵") //true
    console.log (d.toString ("binary") === "è\u0081µ") //true
    console.log (d) //<Buffer e8 81 b5>
})

Issue reported to the developers: https://github.com/joyent/node/issues/4413

Best answer

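The quick answer is that Node does no magic when converting from a Buffer to a string, whether for ascii or utf8. Your utf8 string is completely invalid ascii, so I suppose ideally it would throw an error, but evidently it does not. I would not expect è\u0081µ, because that is invalid ascii too.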
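If you want the "throw an error" behaviour, you can do it yourself. This is a minimal sketch with a hypothetical helper, decodeStrictAscii, which is not part of Node; it rejects any byte above 0x7f instead of letting the decoder guess.

```javascript
// Hypothetical helper, not part of Node: decode a Buffer as ASCII and
// reject any byte outside the 7-bit range.
function decodeStrictAscii(buf) {
  for (let i = 0; i < buf.length; i++) {
    if (buf[i] > 0x7f) {
      throw new RangeError(
        "byte 0x" + buf[i].toString(16) + " at offset " + i + " is not ASCII"
      );
    }
  }
  // Every byte is <= 0x7f here, so each one maps directly to a code point.
  // Spreading is fine for small buffers; chunk the call for very large ones.
  return String.fromCharCode(...buf);
}

console.log(decodeStrictAscii(Buffer.from([0x68, 0x69]))); // hi

let strictError = null;
try {
  decodeStrictAscii(Buffer.from([0xe8, 0x81, 0xb5])); // the bytes of 聵
} catch (err) {
  strictError = err;
}
console.log(strictError.message); // byte 0xe8 at offset 0 is not ASCII
```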

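You can see in the Node source that the code converting a buffer to a string is the ...Slice functions. The ascii and utf8 versions are identical, which leads to the behaviour you are seeing. They do nothing fancy: they take a sequence of bytes and convert it to a JS string, assuming it is valid in that encoding.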

The difference between the two encodings comes from the AsciiWrite and Utf8Write functions in that file, which do handle things differently.

For example:

new Buffer("聵", 'ascii') // <Buffer 75>
new Buffer("聵", 'utf8')  // <Buffer e8 81 b5>
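The 0x75 above is what you get if the ascii write path keeps only the low byte of the character's UTF-16 code unit (0x8075). That is my reading of the result, not something quoted from the source. A minimal sketch for current Node versions, using Buffer.from because new Buffer(string, encoding) is deprecated there:

```javascript
// Low byte of the UTF-16 code unit of 聵 (U+8075).
const codeUnit = "聵".charCodeAt(0);
const lowByte = codeUnit & 0xff;
console.log(lowByte.toString(16)); // 75

// Writing the same string with each encoding.
const asciiBuf = Buffer.from("聵", "ascii");
const utf8Buf = Buffer.from("聵", "utf8");
console.log(asciiBuf); // <Buffer 75>
console.log(utf8Buf);  // <Buffer e8 81 b5>
```

So on the write side ascii is lossy: the single byte 0x75 decodes back to "u", not to 聵.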

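As you can see from your tests, binary is closer to what you want. binary walks over each individual byte in the buffer and returns a string in which each code point has that byte's value.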

(new Buffer([0xe8, 0x81, 0xb5])).toString('binary').charCodeAt(0); // 0xe8
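The byte-by-byte mapping can also be written out by hand. This is a rough sketch for current Node versions; the variable names are mine, and Buffer.from stands in for the deprecated new Buffer.

```javascript
const buf = Buffer.from([0xe8, 0x81, 0xb5]);

// One character per byte, with the code point equal to the byte value.
const manual = Array.from(buf, (byte) => String.fromCharCode(byte)).join("");

console.log(manual === buf.toString("binary")); // true
console.log(manual.length);                     // 3

// Going back from the string to bytes recovers the original buffer.
const roundTrip = Buffer.from(manual, "binary");
console.log(roundTrip.equals(buf)); // true
```

Because every byte value from 0x00 to 0xff has its own code point here, nothing is lost in either direction.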

Source: the question "node.js - Reading a file with ascii encoding" on Stack Overflow: https://stackoverflow.com/questions/13879617/
