mysql - GROUP BY groups unequal values together

Tags: mysql group-by

Using MySQL 5.1.66-0+squeeze1-log on Debian, I get a GROUP BY result that I don't understand.

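If I GROUP BY data, unequal values of data are merged, which makes no sense to me. If I GROUP BY a hash of the same column, SHA1(data), everything works as expected and only equal values of data end up in the same group.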

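What is going on here? It looks as if GROUP BY only considers the first x characters of the column. If that is not the cause, why else would this happen? Or is this just a kink in my brain?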

Edit 1: A sample value of data (legacy JSON-encoded content, from back when I was dumber ;)):

{"a":[{"val":{"tcn":{"1980":"1","1981":"1","1982":"1","1983":"1","1984":"1","1985":"1","1986":"1","1987":"1","1988":"1","1989":"1","1990":"1","1991":"1","1992":"1","1993":"1","1994":"1","1995":"1","1996":"1","1997":"1","1998":"1","1999":"1","2000":"1","2001":"1","2002":"1","2003":"1","2004":"1","2005":"1","2006":"1","2007":"1","2008":"1","2009":"1","2010":"1"},"sic":{"1980":"1","1981":"1","1982":"1","1983":"1","1984":"1","1985":"1","1986":"1","1987":"1","1988":"1","1989":"1","1990":"1","1991":"1","1992":"1","1993":"1","1994":"1","1995":"1","1996":"1","1997":"1","1998":"1","1999":"1","2000":"1","2001":"1","2002":"1","2003":"1","2004":"1","2005":"1","2006":"1","2007":"1","2008":"1","2009":"1","2010":"1"}}}],"b":[{"val":{"tcn":{"1980":"1","1981":"1","1982":"1","1983":"1","1984":"1","1985":"1","1986":"1","1987":"1","1988":"1","1989":"1","1990":"1","1991":"1","1992":"1","1993":"1","1994":"1","1995":"1","1996":"1","1997":"1","1998":"1","1999":"1","2000":"1","2001":"1","2002":"1","2003":"1","2004":"1","2005":"1","2006":"1","2007":"1","2008":"1","2009":"1","2010":"1"},"sic":{"1980":"1","1981":"1","1982":"1","1983":"1","1984":"1","1985":"1","1986":"1","1987":"1","1988":"1","1989":"1","1990":"1","1991":"1","1992":"1","1993":"1","1994":"1","1995":"1","1996":"1","1997":"1","1998":"1","1999":"1","2000":"1","2001":"1","2002":"1","2003":"1","2004":"1","2005":"1","2006":"1","2007":"1","2008":"1","2009":"1","2010":"1"}}}],"0":[{"val":{"com":{"able":"2"}},"str":{"com":{"comm":"According","src":{"1":{"name":"law 256","articles":"B2\/2.11","links":"","type":""},"2":{"name":"law 298","articles":"B.19\/2.3","links":"","type":""}}}}}]}

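Edit 2: Sorry for leaving out the code; I thought that would make the question shorter and easier to read. Apparently the opposite is true...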

SELECT
    GROUP_CONCAT(resid) AS ids
    ,data
FROM resdata
GROUP BY data

versus

SELECT
    GROUP_CONCAT(resid) AS ids
    ,CAST(SHA1(data) AS CHAR(40)) AS hash
    ,data
FROM resdata
GROUP BY hash

Best answer

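I finally figured it out. In my case the problem only occurs when GROUP_CONCAT() is present, which is also discussed in "GROUP_CONCAT() row count when grouping by a text field" (I only found that question after working out that the issue was linked to the concat :s).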

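ORDER BY, DISTINCT and (indirectly) GROUP_CONCAT() all depend on the max_sort_length system variable. Any query that uses these operators or functions only considers the first max_sort_length bytes of a column, which in my case is the default of 1024 bytes.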
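As a rough sketch of the point above, the variable can be inspected and raised for the current session before running the query from the question again. The value 8192 is an arbitrary example, not a recommendation; it would need to be at least as large as the longest value in data, and a larger max_sort_length may also call for a larger sort_buffer_size.

```sql
-- Show the current limit (1024 bytes by default).
SHOW VARIABLES LIKE 'max_sort_length';

-- Raise it for this session only; 8192 is an arbitrary example value.
SET SESSION max_sort_length = 8192;

-- Same query as in the question, now comparing up to 8192 bytes per value.
SELECT
    GROUP_CONCAT(resid) AS ids
    ,data
FROM resdata
GROUP BY data;
```

Grouping by SHA1(data), as in the second query of the question, avoids the dependency on this setting altogether, because the hash is only 40 characters long.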

Although GROUP BY does not use ORDER BY, GROUP_CONCAT() by default applies an ORDER BY on the columns used in the GROUP BY clause. (Thanks to Saharsh Shah, Jan 4 at 12:42)

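Most of the values in my data column are much longer than max_sort_length. In my case, 377 rows share their first 1024 bytes with another row but differ in the rest. As a result, DISTINCT and GROUP BY return only 2360 rows in my case, even though there are 2737 distinct values.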
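One way to see which values collide is to group by a hash of the prefix and count the distinct full-value hashes inside each group. This is only a sketch against the resdata table from the question; the aliases prefix_hash and distinct_values are made up for illustration, and LEFT() counts characters rather than bytes, so it matches max_sort_length exactly only for single-byte character sets.

```sql
-- Hypothetical diagnostic: prefixes shared by more than one distinct value.
-- LEFT() works on characters, not bytes, so this is exact only for
-- single-byte character sets.
SELECT
    SHA1(LEFT(data, @@max_sort_length)) AS prefix_hash
    ,COUNT(DISTINCT SHA1(data)) AS distinct_values
    ,GROUP_CONCAT(resid) AS ids
FROM resdata
GROUP BY prefix_hash
HAVING distinct_values > 1;
```

Each returned row is a group that GROUP BY data would merge even though the underlying values differ.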

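So be careful when grouping on text columns whose values are longer than max_sort_length! The result may not be the distinct result you are used to from working with INTs and shorter CHARs. DISTINCT shows the same behaviour, so it will give you a false positive if you use it to check the integrity of a GROUP BY.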
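Since DISTINCT on the raw column cannot be trusted as a cross-check, a hash-based count is a safer reference. The following is a minimal sketch, again assuming the resdata table from the question; the aliases rows_over_limit, distinct_by_hash and distinct_by_column are illustrative only.

```sql
-- How many rows are longer than the sort limit? LENGTH() returns bytes.
SELECT
    @@max_sort_length AS max_sort_length
    ,COUNT(*) AS rows_over_limit
FROM resdata
WHERE LENGTH(data) > @@max_sort_length;

-- Distinct values as seen through the hash (not subject to the prefix limit).
SELECT COUNT(*) AS distinct_by_hash
FROM (SELECT DISTINCT SHA1(data) AS hash FROM resdata) AS t;

-- Distinct values as seen by DISTINCT on the raw column.
SELECT COUNT(*) AS distinct_by_column
FROM (SELECT DISTINCT data FROM resdata) AS t;
```

If the last two counts differ, some values are being merged on their shared prefix.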

Source: the Stack Overflow question "mysql - GROUP BY groups unequal values together": https://stackoverflow.com/questions/18597607/
