c++ - 如何在 C++ 中将 "convert"ISO-8859-7 字符串转换为 UTF-8？

标签 c++ unicode character-encoding

我正在使用 10 年以上的机器，这些机器使用 ISO 8859-7 来表示希腊字符，每个字符使用一个字节。我需要捕获这些字符并将它们转换为 UTF-8，以便将它们注入(inject) JSON 以通过 HTTPS 发送。另外，我使用的是 GCC v4.4.7，我不想升级，所以我不能使用 codeconv 等。

例子:“OΛA”: 我得到字符值 [ 0xcf, 0xcb, 0xc1, ]，我需要写这个字符串 "\u039F\u039B\u0391"。

PS:我不是字符集专家，所以请避免像“ISO 8859 是 Unicode 的一个子集，所以你只需要实现算法”这样的哲学回答。

最佳答案

鉴于要映射的值非常少，一个简单的解决方案是使用查找表。

伪代码:

id_offset    = 0x80  // 0x00 .. 0x7F same in UTF-8
c1_offset    = 0x20  // 0x80 .. 0x9F control characters

table_offset = id_offset + c1_offset

table = [
    u8"\u00A0",  // 0xA0
    u8"‘",       // 0xA1
    u8"’",
    u8"£",
    u8"€",
    u8"₯",
    // ... Refer to ISO 8859-7 for full list of characters.
]

let S be the input string
let O be an empty output string
for each char C in S
    reinterpret C as unsigned char U
    if U less than id_offset       // same in both encodings
        append C to O
    else if U less than table_offset  // control code
        append char '\xC2' to O  // lead byte
        append char C to O
    else
        append string table[U - table_offset] to O

总而言之，我建议改用库来节省一些时间。

关于c++ - 如何在 C++ 中将 "convert"ISO-8859-7 字符串转换为 UTF-8？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/62797594/

上一篇：c++ - 需要具有以下功能的 bluez API 的替代方案

下一篇：c++ - 在 EXPECT_CALL 中使用变量

相关文章：

c++ - 为什么这段代码会失败？ child 不等

C++ 错误 : "use of deleted function" with minGW_32, Qt 5.7.0，windows 10

java - Java (Android) 中的 Unicode 字符串不起作用

Python 如何将 8 位 ASCII 字符串转换为 16 位 Unicode

java - 如何知道文件中有哪些特殊字符？

c++ - std::atomic 可以安全地与 OpenMP 一起使用吗

c++ - 可变函数 : expression contains unexpanded parameter pack 'args'

python - 在我的 python 文件中写入 utf-8 字符串

C++ 将字符串编码为 Unicode - ICU 库

python - 为什么 Eclipse 中的 "Save as UTF-8"修复了 Python UnicodeEncodeError？