sql - Equals (=) vs. LIKE

Tags: sql performance equals sql-like

When using SQL, are there any benefits to using = instead of LIKE in a WHERE clause?

Without any special operators, LIKE and = are the same, right?

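Best answer

Different Operators

LIKE and = are different operators. Most answers here focus on wildcard support, which is not the only difference between these operators!

= is a comparison operator that operates on numbers and strings. When comparing strings, the comparison operator compares entire strings.

LIKE is a string operator that compares character by character.

To complicate matters, both operators use a collation, which can have important effects on the result of the comparison.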

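Motivating Example

Let's first identify an example where these operators produce obviously different results. Allow me to quote from the MySQL manual: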

Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator:

mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci;
+-----------------------------------------+
| 'ä' LIKE 'ae' COLLATE latin1_german2_ci |
+-----------------------------------------+
|                                       0 |
+-----------------------------------------+
mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;
+--------------------------------------+
| 'ä' = 'ae' COLLATE latin1_german2_ci |
+--------------------------------------+
|                                    1 |
+--------------------------------------+

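Note that this page of the MySQL manual is called "String Comparison Functions", and = is not discussed there, which implies that = is not, strictly speaking, a string comparison function.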
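
Why does 'ä' = 'ae' hold while 'ä' LIKE 'ae' does not? A plausible reading rests on one assumption: latin1_german2_ci treats 'ä' as equivalent to the two-character sequence "ae", following the German phone-book convention. Under that reading, = applies the expansion to the whole string. LIKE, by contrast, needs one character of M for each character of the pattern, and 'ä' is one character facing two. The C sketch below models only that single rule and omits case folding and every other German2 rule. The names toy_expand, toy_equals, toy_char_equal and toy_like_literal are hypothetical. Strings are Latin-1 bytes, with 'ä' written as "\xE4" so the source stays ASCII.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy collation: Latin-1 byte 0xE4 ('ä') expands to "ae".
   Everything else compares byte for byte. */
static size_t toy_expand(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    for (; *src != '\0'; src++) {
        if ((unsigned char)*src == 0xE4) {   /* 'ä' -> "ae" */
            assert(n + 2 <= cap);
            dst[n++] = 'a';
            dst[n++] = 'e';
        } else {
            assert(n + 1 <= cap);
            dst[n++] = *src;
        }
    }
    return n;
}

/* '=' style: apply the collation to the WHOLE strings, then compare. */
static bool toy_equals(const char *x, const char *y)
{
    char ex[256], ey[256];
    size_t nx = toy_expand(x, ex, sizeof ex);
    size_t ny = toy_expand(y, ey, sizeof ey);
    return nx == ny && memcmp(ex, ey, nx) == 0;
}

/* Do two single characters collate as equal? */
static bool toy_char_equal(char a, char b)
{
    char sa[2] = { a, '\0' };
    char sb[2] = { b, '\0' };
    return toy_equals(sa, sb);
}

/* LIKE style with no wildcards: same length in characters (one byte per
   character in Latin-1), and each character compared on its own. */
static bool toy_like_literal(const char *m, const char *p)
{
    if (strlen(m) != strlen(p))
        return false;
    for (size_t i = 0; m[i] != '\0'; i++)
        if (!toy_char_equal(m[i], p[i]))
            return false;
    return true;
}
```

Here toy_equals("\xE4", "ae") is true and toy_like_literal("\xE4", "ae") is false, which mirrors the two MySQL results above.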

How Does = Work?

The SQL Standard § 8.2 describes how = compares strings:

The comparison of two character strings is determined as follows:

a) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.

b) The result of the comparison of X and Y is given by the collating sequence CS.

c) Depending on the collating sequence, two strings may compare as equal even if they are of different lengths or contain different sequences of characters. When the operations MAX, MIN, DISTINCT, references to a grouping column, and the UNION, EXCEPT, and INTERSECT operators refer to character strings, the specific value selected by these operations from a set of such equal values is implementation-dependent.

(Emphasis added.)
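
Rule (a) is easy to miss. For a collation that does not have the NO PAD attribute, the shorter string is compared as if it had been right-padded with spaces. Below is a minimal C sketch of that rule. It assumes plain byte order as the collating sequence CS, which is a stand-in and not any real collation. pad_space_compare is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Rule (a) for a PAD SPACE collation: positions past the end of the
   shorter string are read as ' '. Byte order stands in for CS. */
static int pad_space_compare(const char *x, size_t xlen,
                             const char *y, size_t ylen)
{
    assert(x != NULL && y != NULL);
    size_t n = xlen > ylen ? xlen : ylen;
    for (size_t i = 0; i < n; i++) {
        unsigned char cx = i < xlen ? (unsigned char)x[i] : ' ';
        unsigned char cy = i < ylen ? (unsigned char)y[i] : ' ';
        if (cx != cy)
            return cx < cy ? -1 : 1;
    }
    return 0;   /* equal after padding */
}
```

With this ordering, "abc" and "abc   " compare as equal. Note also that "ab" compares greater than "ab\t", because the padding space (0x20) sorts after a tab (0x09).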

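What does this mean? It means that when comparing strings, the = operator is just a thin wrapper around the current collation. A collation is a library that has various rules for comparing strings. Here is an example of a binary collation from MySQL: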

static int my_strnncoll_binary(const CHARSET_INFO *cs __attribute__((unused)),
                               const uchar *s, size_t slen,
                               const uchar *t, size_t tlen,
                               my_bool t_is_prefix)
{
  size_t len= MY_MIN(slen,tlen);
  int cmp= memcmp(s,t,len);
  return cmp ? cmp : (int)((t_is_prefix ? len : slen) - tlen);
}

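This particular collation happens to compare byte by byte (which is why it is called "binary": it gives no special meaning to strings). Other collations may provide more advanced comparisons.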
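
To see the "thin wrapper" idea in runnable form, here is a standalone C sketch. It re-implements the binary collation quoted above, using standard types in place of MySQL's uchar, my_bool, MY_MIN and CHARSET_INFO. It also defines an = operator that does nothing except ask the collation whether the result is 0. binary_strnncoll, struct collation and sql_equals are hypothetical names, not MySQL's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same logic as the quoted my_strnncoll_binary(). */
static int binary_strnncoll(const unsigned char *s, size_t slen,
                            const unsigned char *t, size_t tlen,
                            bool t_is_prefix)
{
    size_t len = slen < tlen ? slen : tlen;
    int cmp = memcmp(s, t, len);            /* bytes of the common prefix */
    if (cmp != 0)
        return cmp;
    size_t left = t_is_prefix ? len : slen; /* same choice as the original */
    return left < tlen ? -1 : (left > tlen ? 1 : 0);
}

/* A collation bundles a comparison function. */
struct collation {
    const char *name;
    int (*strnncoll)(const unsigned char *, size_t,
                     const unsigned char *, size_t, bool);
};

static const struct collation binary_collation = { "binary", binary_strnncoll };

/* '=' as a thin wrapper: the collation decides, '=' only tests for 0. */
static bool sql_equals(const struct collation *cs, const char *x, const char *y)
{
    assert(cs != NULL && x != NULL && y != NULL);
    return cs->strnncoll((const unsigned char *)x, strlen(x),
                         (const unsigned char *)y, strlen(y), false) == 0;
}
```

One deliberate deviation from the original: the length difference is reduced to -1/0/1 with comparisons. The original casts an unsigned size_t difference to int, and this sketch avoids that conversion.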

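For example, MySQL has a UTF-8 collation that supports case-insensitive comparisons. The code is too long to paste here, but read the body of my_strnncollsp_utf8mb4() in the MySQL source. This collation can process multiple bytes at a time, and it can apply various transforms (such as case-insensitive comparison). The = operator is completely abstracted from the vagaries of the collation.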
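
A much smaller illustration of a case-insensitive collation follows. This C sketch folds ASCII letters with tolower() before comparing, so "Hello" and "hELLO" collate as equal. It is a toy built on stated assumptions: ASCII only, under the C locale. ci_strnncoll is a hypothetical name. The real my_strnncollsp_utf8mb4() handles multi-byte UTF-8 and many more case rules, and this sketch attempts neither.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical ASCII-only case-insensitive collation. */
static int ci_strnncoll(const char *s, size_t slen, const char *t, size_t tlen)
{
    assert(s != NULL && t != NULL);
    size_t len = slen < tlen ? slen : tlen;
    for (size_t i = 0; i < len; i++) {
        int a = tolower((unsigned char)s[i]);   /* fold case first */
        int b = tolower((unsigned char)t[i]);
        if (a != b)
            return a < b ? -1 : 1;
    }
    return slen < tlen ? -1 : (slen > tlen ? 1 : 0);
}
```

An = built on top of this collation would report "Hello" = "hELLO" as true without the operator itself knowing anything about case.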

How Does LIKE Work?

The SQL Standard § 8.5 describes how LIKE compares strings:

The <predicate>

M LIKE P

is true if there exists a partitioning of M into substrings such that:

i) A substring of M is a sequence of 0 or more contiguous <character representation>s of M and each <character representation> of M is part of exactly one substring.

ii) If the i-th substring specifier of P is an arbitrary character specifier, the i-th substring of M is any single <character representation>.

iii) If the i-th substring specifier of P is an arbitrary string specifier, then the i-th substring of M is any sequence of 0 or more <character representation>s.

iv) If the i-th substring specifier of P is neither an arbitrary character specifier nor an arbitrary string specifier, then the i-th substring of M is equal to that substring specifier according to the collating sequence of the <like predicate>, without the appending of <space> characters to M, and has the same length as that substring specifier.

v) The number of substrings of M is equal to the number of substring specifiers of P.

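(Emphasis added.)

That's quite wordy, so let's break it down. Items ii and iii refer to the wildcards _ and %, respectively. If P does not contain any wildcards, then only item iv applies. This is the case of interest posed by the OP.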

In this case, LIKE compares each "substring" (individual character) of M with the corresponding substring of P, using the current collation.
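
One visible consequence of item iv is its phrase "without the appending of <space> characters to M": trailing spaces matter to LIKE even where a PAD SPACE collation makes = ignore them. The C sketch below contrasts the two behaviors. Byte equality stands in for a real collation, and eq_pad_space and like_literal are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* '=' under a PAD SPACE collation: the shorter operand is read as if
   padded with spaces, so trailing spaces do not affect the result. */
static bool eq_pad_space(const char *x, const char *y)
{
    assert(x != NULL && y != NULL);
    size_t nx = strlen(x), ny = strlen(y);
    size_t n = nx > ny ? nx : ny;
    for (size_t i = 0; i < n; i++) {
        char cx = i < nx ? x[i] : ' ';
        char cy = i < ny ? y[i] : ' ';
        if (cx != cy)
            return false;
    }
    return true;
}

/* LIKE with a wildcard-free pattern (item iv): same number of characters,
   each pair compared on its own, and M is never padded. */
static bool like_literal(const char *m, const char *p)
{
    assert(m != NULL && p != NULL);
    if (strlen(m) != strlen(p))
        return false;
    for (size_t i = 0; m[i] != '\0'; i++)
        if (m[i] != p[i])
            return false;
    return true;
}
```

Here eq_pad_space("abc", "abc  ") is true and like_literal("abc", "abc  ") is false. This is consistent with the MySQL manual's note that trailing spaces are significant for LIKE, whereas for = their significance depends on the collation's pad attribute.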

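Conclusion

The bottom line is that when comparing strings, = compares the entire string, while LIKE compares one character at a time. Both comparisons use the current collation. In some cases this difference leads to different results, as the first example in this post shows.

Which one should you use? Nobody can tell you that; you need to use the one that is correct for your use case. Don't prematurely optimize by switching comparison operators.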

For the original question, sql - Equals (=) vs. LIKE, see Stack Overflow: https://stackoverflow.com/questions/543580/
