java - How to take a substring of a Java string that contains double-byte characters

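I have a string that may contain Unicode characters that take more than one byte in UTF-8. This becomes difficult when I want to save it to a database that does not handle Unicode characters. The database I am using is PostgreSQL. The strings may also be too long for a given column. Here is a simple example of my situation: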

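import java.nio.charset.StandardCharsets;

public static void main(String[] args) {
    String test = "İİİİİİİİİİ";
    // Encode explicitly as UTF-8; each İ (U+0130) takes two bytes.
    byte[] bytesOrig = test.getBytes(StandardCharsets.UTF_8);
    System.out.println("bytesOrig=" + new String(bytesOrig, StandardCharsets.UTF_8));

    // Copy only the first 5 bytes, which cuts the third character in half.
    byte[] bytesFive = new byte[5];
    System.arraycopy(bytesOrig, 0, bytesFive, 0, 5);
    System.out.println("bytes-Five=" + new String(bytesFive, StandardCharsets.UTF_8));
    System.out.println("Substring=" + test.substring(0, 5));
    System.out.println("Substring real length=" + test.substring(0, 5).getBytes(StandardCharsets.UTF_8).length);
}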

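I cannot use String.substring, because it counts characters rather than bytes and so does not help with double-byte characters. I have tried copying part of the byte array instead, but that means the last character may be cut in half.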

Here is the debug output:

    bytesOrig=İİİİİİİİİİ
    bytes-Five=İİ�
    Substring=İİİİİ
    Substring real length=10

As you can see, because I only have part of the byte array, the last character is displayed as �, which I do not want.
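To make the cause of the � concrete, here is a minimal sketch that checks whether a cut at a given byte offset would land inside a character. The class name `ByteCutDemo` and both helper methods are hypothetical names introduced for illustration, and the sketch assumes `limit` is not negative. It relies on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`.

```java
import java.nio.charset.StandardCharsets;

public class ByteCutDemo {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if keeping only the first `limit` bytes of the UTF-8 encoding
    // would split a character. Assumes limit >= 0.
    static boolean cutSplitsCharacter(String s, int limit) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // The cut is unsafe when the first dropped byte is a continuation
        // byte (bit pattern 10xxxxxx), i.e. it belongs to the previous character.
        return limit < utf8.length && (utf8[limit] & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        // Ten copies of U+0130 (capital I with dot above), written as escapes
        // so the source compiles regardless of file encoding.
        String test = "\u0130\u0130\u0130\u0130\u0130\u0130\u0130\u0130\u0130\u0130";
        System.out.println(utf8Length("\u0130"));        // prints 2
        System.out.println(utf8Length(test));            // prints 20
        System.out.println(cutSplitsCharacter(test, 5)); // prints true
        System.out.println(cutSplitsCharacter(test, 4)); // prints false
    }
}
```

A cut after 5 bytes keeps two whole characters plus the first byte of the third, which the decoder cannot interpret and replaces with �.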

Best answer

You can try this; the changes are marked in the comments.

import java.nio.charset.StandardCharsets;

public static void main(String[] args) {
    String test = "İİİİİİİİİİ";
    System.out.println("test.length() = " + test.length()); // out: 10
    byte[] bytesOrig = test.getBytes(StandardCharsets.UTF_8); // but the encoded form has 20 bytes
    System.out.println("bytesOrig.length = " + bytesOrig.length); // out: 20
    System.out.println("bytesOrig=" + new String(bytesOrig, StandardCharsets.UTF_8));
    byte[] bytesFive = new byte[10]; // 1. so change the size here to twice as many bytes

    System.arraycopy(bytesOrig, 0, bytesFive, 0, 10); // 2. change the length here as well
    System.out.println("bytes-Five=" + new String(bytesFive, StandardCharsets.UTF_8));
    System.out.println("Substring=" + test.substring(0, 5));
    System.out.println("Substring real length=" + test.substring(0, 5).getBytes(StandardCharsets.UTF_8).length);
}

This is the output:

test.length() = 10
bytesOrig.length = 20
bytesOrig=İİİİİİİİİİ
bytes-Five=İİİİİ
Substring=İİİİİ
Substring real length=10
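
Doubling the byte count works here because every İ takes exactly two bytes in UTF-8. For text that mixes one-, two-, three- and four-byte characters, one possible alternative is to walk the string by code point and stop before a byte budget is exceeded. The following is only a sketch under that assumption, not part of the answer above: `Utf8Truncate`, `utf8Size` and `truncateToBytes` are hypothetical names, and the sketch does not try to keep combining sequences (a base letter plus a separate accent) together.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Truncate {
    // Number of bytes UTF-8 uses to encode one Unicode code point.
    static int utf8Size(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Longest prefix of s whose UTF-8 encoding fits in maxBytes,
    // cut only at code point boundaries.
    static String truncateToBytes(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int size = utf8Size(cp);
            if (bytes + size > maxBytes) {
                break; // the next character would not fit completely
            }
            bytes += size;
            i += Character.charCount(cp); // 2 for surrogate pairs, else 1
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        // Ten copies of U+0130, written as escapes to avoid source-encoding issues.
        String test = "\u0130\u0130\u0130\u0130\u0130\u0130\u0130\u0130\u0130\u0130";
        String cut = truncateToBytes(test, 5);
        System.out.println(cut.length());                                // prints 2
        System.out.println(cut.getBytes(StandardCharsets.UTF_8).length); // prints 4
    }
}
```

With a 5-byte budget the result keeps two whole characters (4 bytes) and drops the third, so no � appears when the bytes are decoded again.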

Regarding "java - How to take a substring of a Java string that contains double-byte characters", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39567113/
