go - How to handle strings containing emoji in Golang (decode or remove invalid Unicode code points)?

Example string:

"\u0410\u043b\u0435\u043a\u0441\u0430\u043d\u0434\u0440\u044b! \n\u0421\u043f\u0430\u0441\u0438\u0431\u043e \ud83d\udcf8 link.ru \u0437\u0430 \n#hashtag  Русское слово, an English word"

Without the \ud83d\udcf8 my function works fine:

func convertUnicode(text string) string {
    s, err := strconv.Unquote(`"` + text + `"`)
    if err != nil {
        // Error.Printf("can't convert: %s | err: %s\n", text, err)
        return text
    }
    return s
}

My question is how to detect text that contains such entries, and how to either convert them to emoji or remove them from the text. Thanks.

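Best answer

Well, it may not be that simple, because \ud83d and \udcf8 are not valid code points on their own. They are the two halves of a surrogate pair, which UTF-16 uses to encode \U0001F4F8. strconv.Unquote will now give you the two surrogate halves, and you have to combine them yourself.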

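1. Unquote with strconv.Unquote, as you already do.
2. Convert to []rune for convenience.
3. Find the surrogate pairs with unicode/utf16.IsSurrogate.
4. Combine each surrogate pair with unicode/utf16.DecodeRune.
5. Convert back to a string.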
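A minimal sketch of one way to put this together. It is a variation on the steps above, not a literal transcription: surrogate halves cannot be represented in valid UTF-8, so they may not survive strconv.Unquote (the call can fail or the halves can end up as U+FFFD). The sketch therefore combines the pairs on the escaped text first and unquotes afterwards. The names surrogateEscape and fixSurrogates and the remove flag are assumptions made for this example, and the sketch assumes the text contains no escaped backslash directly in front of a \u sequence.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"unicode/utf16"
)

// surrogateEscape matches either a high+low pair of \uXXXX escapes
// (12 characters) or, failing that, a single unpaired surrogate escape.
var surrogateEscape = regexp.MustCompile(
	`\\u[dD][89abAB][0-9a-fA-F]{2}\\u[dD][c-fC-F][0-9a-fA-F]{2}` +
		`|\\u[dD][89a-fA-F][0-9a-fA-F]{2}`)

// fixSurrogates replaces each escaped surrogate pair with the character
// it encodes, or deletes it when remove is true. Unpaired halves are
// always deleted because they do not encode anything.
func fixSurrogates(text string, remove bool) string {
	return surrogateEscape.ReplaceAllStringFunc(text, func(m string) string {
		if remove || len(m) != 12 {
			return ""
		}
		// The regexp guarantees four hex digits, so errors are ignored.
		hi, _ := strconv.ParseUint(m[2:6], 16, 32)
		lo, _ := strconv.ParseUint(m[8:12], 16, 32)
		return string(utf16.DecodeRune(rune(hi), rune(lo)))
	})
}

// convertUnicode is the function from the question with the
// surrogate handling applied before unquoting.
func convertUnicode(text string, remove bool) string {
	s, err := strconv.Unquote(`"` + fixSurrogates(text, remove) + `"`)
	if err != nil {
		return text
	}
	return s
}

func main() {
	in := `\u0421\u043f\u0430\u0441\u0438\u0431\u043e \ud83d\udcf8 link.ru`
	fmt.Println(convertUnicode(in, false)) // keeps the emoji
	fmt.Println(convertUnicode(in, true))  // drops the emoji
}
```

To only detect such entries, surrogateEscape.MatchString(text) reports whether the text contains any escaped surrogate.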

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52879524/
