java - Reading a UTF-8 file and comparing it with a String

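I am trying to read a UTF-8 text file and then compare its text with equals(), which should return true. It does not, and getBytes() shows that the two strings contain different bytes.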

Here is a minimal example:

public static void main(String[] args) throws Exception {
  System.out.println(Charset.defaultCharset()); // UTF-8
  InputStream is = new FileInputStream("./myUTF8File.txt");
  BufferedReader in = new BufferedReader(new InputStreamReader(is, "UTF8"));
  String line;
  while ((line = in.readLine()) != null) {
    System.out.print(line); // mouseover
    byte[] bytes = line.getBytes(); // [-17, -69, -65, 109, 111, 117, 115, 101, 111, 118, 101, 114]
    String str = "mouseover";
    byte[] bytesStr = str.getBytes(); // [109, 111, 117, 115, 101, 111, 118, 101, 114]
    if (line.equals(str)) { // false
      System.out.println("equal");
    }
  }
}

I would expect the string to be converted to UTF-16 at in.readLine(), and equals() to return true. I cannot figure out why it does not.

Best answer

The first bytes of the file:

-17, -69, -65

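are the UTF-8 encoding of the BOM (byte order mark, U+FEFF), that is, 0xEF 0xBB 0xBF. Java's UTF-8 decoder does not strip it, so it ends up as the first character of the first line you read. Lining up your two byte arrays makes the difference visible: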

[-17, -69, -65, 109, 111, 117, 115, 101, 111, 118, 101, 114]
               [109, 111, 117, 115, 101, 111, 118, 101, 114]
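To make the effect concrete, here is a small sketch that decodes the exact bytes from the question without touching any file. The class name BomDecodeDemo and the decode() helper are made up for illustration; only the byte values come from the question.

```java
import java.nio.charset.StandardCharsets;

class BomDecodeDemo {
    // Bytes from the question: EF BB BF (the BOM) followed by "mouseover".
    static final byte[] FILE_BYTES = {
        -17, -69, -65, 109, 111, 117, 115, 101, 111, 118, 101, 114
    };

    // Decodes the bytes as UTF-8; the decoder keeps the BOM as a character.
    static String decode() {
        return new String(FILE_BYTES, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String line = decode();
        System.out.println(line.length());                          // 10, not 9
        System.out.println(line.charAt(0) == 0xFEFF);               // true
        System.out.println(line.equals("mouseover"));               // false
        System.out.println(line.substring(1).equals("mouseover"));  // true
    }
}
```

The BOM is invisible when printed, which is why System.out.print(line) looks identical to "mouseover" even though the string is one character longer.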

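Also, the canonical name of the charset is "UTF-8", with a hyphen. Java accepts "UTF8" as an alias, so this is not what breaks the comparison, but the canonical name is the safer choice: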

BufferedReader in = new BufferedReader(new InputStreamReader(is, "UTF-8"));
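One possible fix, sketched under the assumption that the file may or may not start with a BOM, is to drop a leading U+FEFF from the first line before comparing. The class name BomStrippingReader and the stripBom() helper are hypothetical; the file name and the expected text come from the question. StandardCharsets.UTF_8 is used in place of a charset name string so that a misspelled name cannot happen.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class BomStrippingReader {
    private static final char BOM = 0xFEFF; // byte order mark, U+FEFF

    // Hypothetical helper: removes a single leading BOM character if present.
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream("./myUTF8File.txt"), StandardCharsets.UTF_8))) {
            String line;
            boolean firstLine = true;
            while ((line = in.readLine()) != null) {
                if (firstLine) {
                    // Only the start of the file can carry a BOM.
                    line = stripBom(line);
                    firstLine = false;
                }
                if (line.equals("mouseover")) {
                    System.out.println("equal");
                }
            }
        }
    }
}
```

Alternatively, saving the file as UTF-8 without a BOM avoids the problem at its source, if you control how the file is written.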

Original question on Stack Overflow: https://stackoverflow.com/questions/19227733/
