sql - How to fix the wrong most-repeated value in PostgreSQL

Tags: sql postgresql

I have a query that selects the most repeated max_height value for each gid in my_table. The query is:

SELECT gid, max_height
FROM (
    SELECT gid, max_height,
           ROW_NUMBER() OVER (PARTITION BY gid ORDER BY freq DESC) AS rn
    FROM (
        SELECT gid, max_height, COUNT(id) AS freq
        FROM my_table
        GROUP BY 1, 2
        ORDER BY 1, 2
    ) hgt_freq
) ranked_hgt_req
WHERE rn = 1

my_table has three columns, for example:

gid id  max_height
3   1   19.3
3   2   19.3
3   3   20.3
3   4   20.3
3   5   19.3
3   6   19.3
3   7   21.4
3   8   21.4
3   9   21.4
3   10  21.4
3   11  21.4
3   12  21.4
22  1   23.1
22  2   23.1
22  3   23.1
22  4   23.1
22  5   23.1
22  6   23.1
22  7   22.1
22  8   22.1
22  9   22.1
22  10  22.1
22  11  22.1
22  12  22.1
29  1   24
29  2   24
29  3   24
29  4   18.9
29  5   18.9
29  6   18.9
29  7   NULL
29  8   NULL
29  9   27.1
29  10  27.1
29  11  6.5
29  12  6.5

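The problem with this query is that it ranks the values only by frequency, in descending order, which gives the wrong value for gid = 22. The output of the query is: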

gid    max_height
3      21.4
22     22.1
29     24.0

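For gid = 22 there are two most repeated values, 23.1 and 22.1, and in that case the query should return 23.1. Can anyone point out how to fix this, or is there a better way to do it? The process needs to be automated for a large number of records (gids).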
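The tie can be confirmed by counting the sample rows. This minimal Python sketch assumes the listing above is the complete data; HEIGHTS and top_values are illustrative names, not part of the question. It mirrors the GROUP BY gid, max_height / COUNT(id) step and lists every value that shares the top frequency for each gid:

```python
from collections import Counter

# Sample data copied from the my_table listing above, one list per gid, in id order.
# None stands in for SQL NULL (ids 7 and 8 of gid 29).
HEIGHTS = {
    3: [19.3, 19.3, 20.3, 20.3, 19.3, 19.3, 21.4, 21.4, 21.4, 21.4, 21.4, 21.4],
    22: [23.1] * 6 + [22.1] * 6,
    29: [24.0, 24.0, 24.0, 18.9, 18.9, 18.9, None, None, 27.1, 27.1, 6.5, 6.5],
}


def top_values(heights):
    """Return {gid: (top_freq, values_with_that_freq)} for each gid.

    Counting per gid mirrors GROUP BY gid, max_height with COUNT(id);
    None forms its own group, as NULL does in a SQL GROUP BY.
    """
    result = {}
    for gid, values in heights.items():
        freq = Counter(values)
        top = max(freq.values())
        # Every value sharing the top count is a candidate for "most repeated".
        result[gid] = (top, [v for v, n in freq.items() if n == top])
    return result


for gid, (top, values) in top_values(HEIGHTS).items():
    print(gid, top, values)
# 3 6 [21.4]
# 22 6 [23.1, 22.1]
# 29 3 [24.0, 18.9]
```

Both gid = 22 and gid = 29 have two candidates. Because the query orders the window only by freq DESC, nothing in it determines which of the tied rows receives rn = 1.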

Best answer

Use DISTINCT ON:

select distinct on(gid) gid, max_height
from (
    select gid, max_height, count(id) as freq
    from my_table
    group by 1, 2
    ) s
order by gid, freq desc

 gid | max_height 
-----+------------
   3 |       21.4
  22 |       23.1
  29 |         24
(3 rows)

From the documentation:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
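The rule in that quote can be sketched in Python. The sketch assumes the inner subquery's rows for the sample data, counted from the listing in the question. distinct_on and the key functions are illustrative names, not a PostgreSQL API. Feeding the same rows in two input orders shows why ORDER BY gid, freq DESC alone is not enough: the winner for gid 22 and gid 29 then depends only on which tied row happens to come first.

```python
# Rows produced by the answer's inner GROUP BY subquery for the sample data:
# (gid, max_height, freq). None stands in for the NULL group of gid 29.
GROUPED = [
    (3, 19.3, 4), (3, 20.3, 2), (3, 21.4, 6),
    (22, 23.1, 6), (22, 22.1, 6),
    (29, 24.0, 3), (29, 18.9, 3), (29, None, 2), (29, 27.1, 2), (29, 6.5, 2),
]


def distinct_on(rows, key, order_key):
    """Keep the first row of each set of rows with equal key(row), after sorting by order_key.

    Assumes order_key's leading component equals key, as PostgreSQL requires for
    DISTINCT ON. Python's sort is stable, so rows tying on order_key keep their
    input order; PostgreSQL makes no such promise.
    """
    seen = set()
    kept = []
    for row in sorted(rows, key=order_key):
        k = key(row)
        if k not in seen:  # first row of this group wins, the rest are dropped
            seen.add(k)
            kept.append(row)
    return kept


def by_gid(row):
    return row[0]


def by_gid_freq_desc(row):
    # Equivalent of ORDER BY gid, freq DESC (max_height is never compared).
    return (row[0], -row[2])


print([r[1] for r in distinct_on(GROUPED, by_gid, by_gid_freq_desc)])
# [21.4, 23.1, 24.0]
print([r[1] for r in distinct_on(reversed(GROUPED), by_gid, by_gid_freq_desc)])
# [21.4, 22.1, 18.9]
```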


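gid = 29 has two most common values, 24 and 18.9 (gid = 22 likewise has 23.1 and 22.1). In such cases you can control which of the tied values comes first, and is therefore kept, by adding another expression to the ORDER BY: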

select distinct on(gid) gid, max_height
from (
    select gid, max_height, count(id) as freq
    from my_table
    group by 1, 2
    ) s
order by gid, freq desc, max_height desc;

 gid | max_height 
-----+------------
   3 |       21.4
  22 |       23.1
  29 |         24
(3 rows)    

select distinct on(gid) gid, max_height
from (
    select gid, max_height, count(id) as freq
    from my_table
    group by 1, 2
    ) s
order by gid, freq desc, max_height;

 gid | max_height 
-----+------------
   3 |       21.4
  22 |       22.1
  29 |       18.9
(3 rows)    
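The same tie-break also fixes the question's ROW_NUMBER() query if the extra expression goes into the window's ORDER BY. This runnable sketch uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. That is an assumption: window functions need SQLite 3.25 or later, and SQLite has no DISTINCT ON. The sketch loads the sample rows from the question. The default NULL sort position differs between the two databases, but the NULL group of gid 29 has freq 2, so it never competes for first place here.

```python
import sqlite3

# Sample data from the question, one list per gid in id order; None is SQL NULL.
HEIGHTS = {
    3: [19.3, 19.3, 20.3, 20.3, 19.3, 19.3, 21.4, 21.4, 21.4, 21.4, 21.4, 21.4],
    22: [23.1] * 6 + [22.1] * 6,
    29: [24.0, 24.0, 24.0, 18.9, 18.9, 18.9, None, None, 27.1, 27.1, 6.5, 6.5],
}

# The question's query with max_height added as a tie-breaker in the window ORDER BY.
QUERY = """
SELECT gid, max_height
FROM (
    SELECT gid, max_height,
           ROW_NUMBER() OVER (
               PARTITION BY gid
               ORDER BY freq DESC, max_height {direction}
           ) AS rn
    FROM (
        SELECT gid, max_height, COUNT(id) AS freq
        FROM my_table
        GROUP BY gid, max_height
    ) AS hgt_freq
) AS ranked_hgt_req
WHERE rn = 1
ORDER BY gid
"""


def most_repeated(conn, direction):
    """Return [(gid, max_height)], breaking frequency ties by max_height in `direction`."""
    # A sort direction cannot be a bound parameter, so only known keywords are interpolated.
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    return conn.execute(QUERY.format(direction=direction)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (gid INTEGER, id INTEGER, max_height REAL)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(gid, i, h) for gid, hs in HEIGHTS.items() for i, h in enumerate(hs, start=1)],
)

print(most_repeated(conn, "DESC"))  # [(3, 21.4), (22, 23.1), (29, 24.0)]
print(most_repeated(conn, "ASC"))   # [(3, 21.4), (22, 22.1), (29, 18.9)]
```

The two calls reproduce the two result sets shown above for DISTINCT ON with max_height desc and max_height.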

Original question on Stack Overflow: https://stackoverflow.com/questions/44467033/
