java - Java prints non-English characters incorrectly

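I thought this was only a Python 2 problem, but I am now running into a similar problem in Java (Windows 10, JDK 8).

My searching so far has turned up almost nothing that solves it.

I read this value from the stdin input stream: Viļāni. When I print it to the console, I get: Vi????ni.

The relevant code snippet is: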

    BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));

    ArrayList<String> corpus = new ArrayList<String>();
    String inputString = null;
    while ((inputString = in.readLine()) != null) {
        corpus.add(inputString);
    }
    String[] allCorpus = new String[corpus.size()];
    allCorpus = corpus.toArray(allCorpus);
    for (String line : allCorpus) {
        System.out.println(line);
    }

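A further expansion of my problem:

I read a file containing the following two lines:

を
Sōten_Kōro

When I read this from disk and write it to a second file, I get the following output:

ã‚’
SÅ�ten_KÅ�ro

When I read the file from standard input using cat testinput.txt | java UTF8Tester, I get the following output:

???
S??ten_K??ro

Both are clearly wrong. I need to be able to print the correct characters to both the console and a file. My sample code is below: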

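import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;

public class UTF8Tester {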

    public static void main(String args[]) throws Exception {
        BufferedReader stdinReader = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String[] stdinData = readLines(stdinReader);
        printToFile(stdinData, "stdin_out.txt");

        BufferedReader fileReader = new BufferedReader(new FileReader("testinput.txt"));
        String[] fileData = readLines(fileReader);
        printToFile(fileData, "file_out.txt");

    }

    private static void printToFile(String[] data, String fileName)
            throws FileNotFoundException, UnsupportedEncodingException {
        PrintWriter writer = new PrintWriter(fileName, "UTF-8");
        for (String line : data) {
            writer.println(line);
        }
        writer.close();
    }

    private static String[] readLines(BufferedReader reader) throws IOException {
        ArrayList<String> corpus = new ArrayList<String>();
        String inputString = null;

        while ((inputString = reader.readLine()) != null) {
            corpus.add(inputString);
        }
        String[] allCorpus = new String[corpus.size()];
        return corpus.toArray(allCorpus);
    }

}

I'm really stuck here and would greatly appreciate any help! Thanks in advance. Paul

Best answer

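System.in/out use the default Windows charset.
A Java String uses Unicode internally.
FileReader/FileWriter are old utility classes that use the default charset, so they are only suitable for non-portable local files.

The error you see comes from a special character stored as a two-byte UTF-8 sequence: each of those (special UTF-8) bytes is interpreted in the default single-byte encoding, where the value does not exist, so it is replaced by ? twice.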
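
Both kinds of damage can be reproduced in memory, without a console or a file. The following is only a minimal sketch: it assumes windows-1252 stands in for the Windows default charset, and the class and method names are made up for illustration. Decoding UTF-8 bytes with the single-byte charset produces garbage such as ã‚’, while encoding a character that the charset does not contain produces ?.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Assumption: windows-1252 plays the role of the platform default charset.
    static final Charset WINDOWS_DEFAULT = Charset.forName("windows-1252");

    // Decodes the UTF-8 bytes of text with the wrong single-byte charset,
    // which is what a reader using the default charset would do on such a machine.
    static String decodeUtf8AsDefault(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, WINDOWS_DEFAULT);
    }

    // Encodes text into target and decodes it again; characters that target
    // cannot represent are replaced by '?' during encoding.
    static String roundTrip(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    public static void main(String[] args) {
        // Hiragana letter Wo, three bytes in UTF-8.
        System.out.println(decodeUtf8AsDefault("\u3092"));
        // Latin small letter o with macron is not part of windows-1252.
        System.out.println(roundTrip("S\u014Dten", WINDOWS_DEFAULT));
    }
}
```

Note that what main prints still passes through the console encoding, so the helper methods are best inspected in a debugger or a test.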

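This requires that the character can be entered on System.in in the default charset.
The string is then converted from the default charset.
Writing the file as UTF-8 requires specifying UTF-8 explicitly.

Therefore: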

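    // Needs Charset and StandardCharsets (java.nio.charset), Files, Path and Paths (java.nio.file), and java.util.List.
    // Read stdin using the platform default charset.
    BufferedReader stdinReader = new BufferedReader(new InputStreamReader(System.in));
    String[] stdinData = readLines(stdinReader);
    printToFile(stdinData, "stdin_out.txt");

    // Case 1: the input file is UTF-8.
    Path utf8Path = Paths.get("testinput-utf8.txt");
    List<String> utf8Lines = Files.readAllLines(utf8Path); // Here the default is UTF-8!

    // Case 2: the input file is Windows-1252; the charset is passed as a Charset, not a String.
    Path latin1Path = Paths.get("testinput-winlatin1.txt");
    List<String> latin1Lines = Files.readAllLines(latin1Path, Charset.forName("Windows-1252"));

    // Files.write takes the target path first, then the lines.
    Files.write(Paths.get("file_out.txt"), utf8Lines, StandardCharsets.UTF_8);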
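
For a version that runs end to end, here is a rough sketch of the same idea: read a file with an explicitly named charset, then write it back as UTF-8. The class name, method names, and file names are hypothetical, and the sample file is created in a temporary directory so nothing on disk is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ExplicitCharsetDemo {

    // Reads source using sourceCharset and rewrites its lines to target as UTF-8.
    static void convertToUtf8(Path source, Charset sourceCharset, Path target) throws IOException {
        List<String> lines = Files.readAllLines(source, sourceCharset);
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    // Creates a small windows-1252 file, converts it, and returns the lines read back as UTF-8.
    static List<String> demo() {
        try {
            Charset windows1252 = Charset.forName("windows-1252");
            Path dir = Files.createTempDirectory("charset-demo");
            Path source = dir.resolve("input-winlatin1.txt");
            Path target = dir.resolve("output-utf8.txt");

            // Both sample words contain characters that windows-1252 can represent.
            Files.write(source, Arrays.asList("caf\u00E9", "na\u00EFve"), windows1252);
            convertToUtf8(source, windows1252, target);
            return Files.readAllLines(target, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Naming the charset on both the read and the write side keeps the result independent of the machine's default charset.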


To check whether your current system can handle Japanese:

System.out.println("Hiragana letter Wo '\u3092'."); // Either を or ?.

Seeing ? means the character could not be converted to the default system encoding. を is U+3092, written in plain ASCII as the Unicode escape \u3092.
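
A similar check can be done without looking at the console output, by asking a charset encoder whether it can represent the character. This is only a sketch with a hypothetical class name. It inspects the JVM default charset, which on Windows may differ from the code page the console window actually uses, so treat the result as a hint.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCheck {

    // Returns true if every character of text can be encoded in the given charset.
    static boolean canRepresent(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Charset platform = Charset.defaultCharset();
        System.out.println("Default charset: " + platform);
        System.out.println("Can encode U+3092: " + canRepresent(platform, "\u3092"));
        System.out.println("UTF-8 can encode U+3092: " + canRepresent(StandardCharsets.UTF_8, "\u3092"));
    }
}
```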

To create UTF-8 text under Windows:

Files.write(Paths.get("out-utf8.txt"),
    "\uFEFFHiragana letter Wo '\u3092'.".getBytes(StandardCharsets.UTF_8));

Here I used an ugly (and normally unnecessary) BOM character \uFEFF (zero-width no-break space), which lets Windows Notepad recognize the text as UTF-8.
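
One consequence worth knowing: when such a file is read back in Java, the UTF-8 decoder keeps the BOM as the first character of the text. The sketch below (class and method names are illustrative, not from the answer) shows the three bytes the BOM becomes in UTF-8 and one way to drop it after reading.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {

    static final char BOM = '\uFEFF';

    // Prefixes text with a BOM and encodes the result as UTF-8.
    static byte[] encodeWithBom(String text) {
        return (BOM + text).getBytes(StandardCharsets.UTF_8);
    }

    // Removes a leading BOM if present; otherwise returns text unchanged.
    static String stripBom(String text) {
        return (!text.isEmpty() && text.charAt(0) == BOM) ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        byte[] bytes = encodeWithBom("Wo");
        // The first three bytes are the UTF-8 encoding of U+FEFF.
        System.out.printf("%02X %02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF);

        // Decoding keeps the BOM, so it has to be removed by hand.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(stripBom(decoded).equals("Wo"));
    }
}
```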

The original question, "Java prints non-English characters incorrectly", can be found on Stack Overflow: https://stackoverflow.com/questions/54212554/
