mysql - UTF-8: General? Bin? Unicode?

Tags: mysql utf-8 collation

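I'm trying to figure out which collation I should use for various types of data. 100% of the content I will be storing is user-submitted.

My understanding is that I should use UTF-8 General CI (case-insensitive) rather than UTF-8 Binary. However, I can't find a clear distinction between UTF-8 General CI and UTF-8 Unicode CI.

  1. Should I store user-submitted content in UTF-8 General CI or UTF-8 Unicode CI columns?
  2. What type of data is UTF-8 Binary suited for?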

Best answer

In general, utf8_general_ci is faster than utf8_unicode_ci, but less correct.

Here is the difference:

For any Unicode character set, operations performed using the _general_ci collation are faster than those for the _unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_ci supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages “ß” is equal to “ss”. utf8_unicode_ci also supports contractions and ignorable characters. utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.

Quoted from: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
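
The difference between one-to-one comparison and expansions can be illustrated outside MySQL. The following is only a rough Python analogy, not MySQL's actual algorithm or collation weights: the helper names `simple_fold`, `one_to_one_equal`, and `expansion_equal` are hypothetical, and Python's `str.casefold()` stands in for a comparison that supports expansions such as "ß" to "ss".

```python
def simple_fold(ch: str) -> str:
    """Fold one character to at most one character (no expansions)."""
    lowered = ch.lower()
    # Keep the original character if lowercasing would yield several characters.
    return lowered if len(lowered) == 1 else ch


def one_to_one_equal(a: str, b: str) -> bool:
    """Compare character by character; strings of different length never match."""
    if len(a) != len(b):
        return False
    return all(simple_fold(x) == simple_fold(y) for x, y in zip(a, b))


def expansion_equal(a: str, b: str) -> bool:
    """Compare after full case folding, which may expand one character into several."""
    return a.casefold() == b.casefold()


print(one_to_one_equal("Hello", "hELLO"))     # True
print(one_to_one_equal("Straße", "STRASSE"))  # False: "ß" cannot equal "ss"
print(expansion_equal("Straße", "STRASSE"))   # True: "ß" folds to "ss"
```

The one-to-one version is cheaper because it never has to handle a character that turns into two, which mirrors the speed versus correctness trade-off described in the quote.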

For a more detailed explanation, see the following post on the MySQL forums: http://forums.mysql.com/read.php?103,187048,188748

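As for utf8_bin: both utf8_general_ci and utf8_unicode_ci perform case-insensitive comparisons. In contrast, utf8_bin is case-sensitive (among other differences), because it compares the binary values of the characters.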
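
To make the binary versus case-insensitive distinction concrete, here is a small Python sketch. It is an analogy only and does not call MySQL: comparing UTF-8 encoded bytes stands in for a binary comparison, and `str.casefold()` stands in for a case-insensitive one. The function names `binary_equal` and `ci_equal` are hypothetical.

```python
def binary_equal(a: str, b: str) -> bool:
    """Equal only if the UTF-8 byte sequences are identical."""
    return a.encode("utf-8") == b.encode("utf-8")


def ci_equal(a: str, b: str) -> bool:
    """Equal if the strings match after case folding."""
    return a.casefold() == b.casefold()


print(binary_equal("MySQL", "mysql"))  # False: "M" and "m" are different bytes
print(ci_equal("MySQL", "mysql"))      # True

words = ["apple", "Banana", "cherry"]

# Byte order puts uppercase ASCII letters before lowercase ones.
binary_order = sorted(words, key=lambda s: s.encode("utf-8"))
# Case-folded order ignores the difference between "B" and "b".
ci_order = sorted(words, key=str.casefold)

print(binary_order)  # ['Banana', 'apple', 'cherry']
print(ci_order)      # ['apple', 'Banana', 'cherry']
```

The sort results show that the choice affects ordering as well as equality, which is part of the "other differences" mentioned above.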

For "mysql - UTF-8: General? Bin? Unicode?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2344118/
