java - Why is it said that CharacterStream classes are used to perform the input/output for the 16-bit Unicode characters?

When an I/O stream manages 8-bit bytes of raw binary data, it is called a byte stream. And, when the I/O stream manages 16-bit Unicode characters, it is called a character stream.

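Byte streams are clear to me. They work with 8-bit bytes. So if I write a character that needs 3 bytes, only the last 8 bits get written, and the output is wrong.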
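To make that concern concrete, here is a minimal sketch of what happens when a char is passed straight to a byte stream's write(int). The class and method names (LowByteOnly, writeAsInt) are made up for illustration; ByteArrayOutputStream stands in for any byte sink.

```java
import java.io.ByteArrayOutputStream;

final class LowByteOnly {
    // Hypothetical helper: pushes a char through OutputStream.write(int),
    // which keeps only the low-order 8 bits of its argument.
    static byte[] writeAsInt(char c) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(c); // the char is widened to int, then truncated to one byte
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] written = writeAsInt('\u1EA0'); // Latin capital A with dot below
        // Only one byte arrives, and it is the low byte of 0x1EA0.
        System.out.printf("%d byte(s): 0x%02X%n", written.length, written[0] & 0xFF);
    }
}
```

The high byte 0x1E is silently dropped, which is the "wrong output" the question describes.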

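That is why we use character streams. Suppose I want to write the Latin capital letter Ạ. I need 3 bytes to store it in UTF-8. But suppose I also want to store a "plain" A. That one needs only 1 byte.

Do you see the pattern? Until we convert them, we cannot know how many bytes any of these characters will need. So my question is: why is it said that character streams manage 16-bit Unicode characters? If I write Ạ, which needs 3 bytes, a character stream does not cut it down to the last 8 bits the way a byte stream does. So what does that sentence actually mean?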

Best answer

In Java, a String consists of a sequence of 16-bit char values, representing text stored in the UTF-16 encoding.
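A small sketch of what "a sequence of 16-bit char values" means in practice. The class and helper names (Utf16Units, charUnits, codePoints) are illustrative only; the sample strings are written as Unicode escapes so the exact code points are unambiguous.

```java
final class Utf16Units {
    // Number of 16-bit char units the string occupies.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points the string represents.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String dottedA = "\u1EA0";       // Latin capital A with dot below
        String emoji = "\uD83D\uDE00";   // U+1F600, stored as a surrogate pair
        System.out.println(charUnits(dottedA) + " unit(s), " + codePoints(dottedA) + " code point(s)");
        System.out.println(charUnits(emoji) + " unit(s), " + codePoints(emoji) + " code point(s)");
    }
}
```

A character in the Basic Multilingual Plane fits in one char; a character outside it takes two chars (a surrogate pair), so "16-bit" describes the storage unit rather than every possible character.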

A Charset is an object that describes how to convert Unicode characters into a sequence of bytes. UTF-8 is one example of a charset.
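As a rough illustration, the same text can be encoded with different charsets and the byte counts compared. The names CharsetSizes and encodedLength are hypothetical; UTF_16BE is used instead of UTF_16 so that no byte-order mark is added to the count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSizes {
    // Hypothetical helper: how many bytes does this text need in this charset?
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String plainA = "A";
        String dottedA = "\u1EA0"; // Latin capital A with dot below
        Charset[] charsets = {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE};
        for (Charset cs : charsets) {
            System.out.println(cs.name()
                    + ": plain A = " + encodedLength(plainA, cs) + " byte(s)"
                    + ", dotted A = " + encodedLength(dottedA, cs) + " byte(s)");
        }
    }
}
```

The byte count is a property of the charset chosen at output time, not of the String itself.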

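A character stream such as a Writer, when it outputs to something that holds bytes (a file, or a byte output stream such as an OutputStream), uses a Charset to convert Strings into plain byte sequences for output. (Technically, it converts the UTF-16 chars into Unicode characters and then uses the Charset to convert those into a byte sequence.) A Reader performs the reverse conversion when reading from a byte source.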
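The two directions can be sketched as a round trip through an in-memory byte array. This is an illustration under assumptions, not the answer's own code: the names RoundTrip, encode and decode are invented here, and IOException is wrapped in UncheckedIOException only to keep the helpers short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTrip {
    // Writer side: chars in, bytes out, using the given charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reader side: bytes in, chars out, using the same charset.
    static String decode(byte[] data, Charset charset) {
        StringBuilder chars = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) { // read() returns one 16-bit char per call
                chars.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chars.toString();
    }

    public static void main(String[] args) {
        String original = "A\u1EA0"; // plain A followed by A with dot below
        byte[] encoded = encode(original, StandardCharsets.UTF_8);
        String restored = decode(encoded, StandardCharsets.UTF_8);
        System.out.println(encoded.length + " bytes, round trip ok: " + original.equals(restored));
    }
}
```

The caller only ever handles chars; the variable-length byte encoding stays inside the Writer and Reader.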

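In UTF-16, Ạ is represented as the single 16-bit char 0x1EA0. It takes only 16 bits in UTF-16, not 24 bits as in UTF-8.

If you convert it to bytes using the UTF-8 encoding, like this:

// Needs java.io.* and java.nio.charset.StandardCharsets; write() and close() can throw IOException.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Writer writer = new OutputStreamWriter(baos, StandardCharsets.UTF_8);
writer.write("Ạ");   // one 16-bit char goes in; the writer encodes it with its Charset
writer.close();      // flushes the encoder's buffer into baos
return baos.toByteArray();

then you get the expected 3-byte sequence 0xE1 0xBA 0xA0.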

For more on this topic, see the similar question on Stack Overflow: https://stackoverflow.com/questions/63770350/
