sql - Find rows that have duplicate values in a column

I have a table author_data:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 14        | k moribe
 15        | ernest jordan
 25        | william h nailon 
 79        | howard jason
 36        | k moribe

This is the result I need:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 15        | ernest jordan
 14        | k moribe
 36        | k moribe

That is, for every name that occurs more than once, I need the author_id. I tried this statement:

select author_id,count(author_name)
from author_data
group by author_name
having count(author_name)>1

But it doesn't work. How can I get this result?

Best answer

I suggest a window function in a subquery:

SELECT author_id, author_name  -- omit the name here if you just need ids
FROM (
   SELECT author_id, author_name
        , count(*) OVER (PARTITION BY author_name) AS ct
   FROM   author_data
   ) sub
WHERE  ct > 1;

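You will recognize the basic aggregate function count(). Appending an OVER clause turns it into a window function - the same works for any other aggregate function.

This way it counts rows per partition, here per author_name, without collapsing them into one row per group. Voilà.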
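To make the per-partition count visible, here is a minimal sketch that rebuilds the sample table from the question and runs only the inner query. The column types and the use of a temporary table are assumptions for illustration; the question does not state the schema.

```sql
-- Assumed schema: the question does not give column types or constraints.
CREATE TEMP TABLE author_data (
   author_id   integer PRIMARY KEY
 , author_name text NOT NULL
);

INSERT INTO author_data (author_id, author_name) VALUES
   (9,  'ernest jordan')
 , (14, 'k moribe')
 , (15, 'ernest jordan')
 , (25, 'william h nailon')
 , (79, 'howard jason')
 , (36, 'k moribe');

-- Every row is kept; ct holds the number of rows sharing its author_name.
SELECT author_id, author_name
     , count(*) OVER (PARTITION BY author_name) AS ct
FROM   author_data
ORDER  BY author_name, author_id;
```

With this data, the rows for ernest jordan and k moribe get ct = 2 and the remaining rows get ct = 1, which is why filtering on ct > 1 keeps exactly the duplicates.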

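It has to be done in a subquery because the result of a window function cannot be referenced in the WHERE clause of the same SELECT (window functions are evaluated after WHERE). See: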

Best way to get result count before LIMIT was applied
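As a sketch of the restriction described above: PostgreSQL rejects a window function written directly in WHERE, so the filter has to sit one query level up. A common table expression (CTE) is an equivalent way to spell the subquery. The CTE name counted is made up for this example, and the sample rows are inlined as a VALUES list so the statement runs on its own.

```sql
-- Not allowed: window functions cannot appear in WHERE.
-- SELECT author_id, author_name
-- FROM   author_data
-- WHERE  count(*) OVER (PARTITION BY author_name) > 1;

-- Same logic as the subquery, written as a CTE.
-- The first CTE only stands in for the real table with the question's sample rows.
WITH author_data (author_id, author_name) AS (
   VALUES
      (9,  'ernest jordan')
    , (14, 'k moribe')
    , (15, 'ernest jordan')
    , (25, 'william h nailon')
    , (79, 'howard jason')
    , (36, 'k moribe')
   )
, counted AS (   -- hypothetical name
   SELECT author_id, author_name
        , count(*) OVER (PARTITION BY author_name) AS ct
   FROM   author_data
   )
SELECT author_id, author_name
FROM   counted
WHERE  ct > 1
ORDER  BY author_name, author_id;
```

Against a real table, drop the first CTE and keep only counted. Whether to use a subquery or a CTE here is mostly a matter of readability.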

In older versions without window functions (v.8.3 or older) - or generally - this alternative performs pretty fast:

SELECT author_id, author_name  -- omit name, if you just need ids
FROM   author_data a
WHERE  EXISTS (
   SELECT 1 FROM author_data a2  -- an empty SELECT list needs PostgreSQL 9.4+
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );

If you care about performance, add an index on author_name.
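A plain B-tree index is one way to do that. The sketch below assumes the author_data table from the question already exists; the index name is made up. Whether the planner actually uses the index depends on table size and data distribution, so treat this as a starting point rather than a guaranteed speedup.

```sql
-- Hypothetical index name; a default B-tree index supports the equality
-- comparison on author_name used by the EXISTS variant.
CREATE INDEX author_data_author_name_idx ON author_data (author_name);

-- Inspect the plan to see whether the index is chosen for your data.
EXPLAIN
SELECT author_id, author_name
FROM   author_data a
WHERE  EXISTS (
   SELECT 1 FROM author_data a2
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );
```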

Regarding "sql - Find rows that have duplicate values in a column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22722870/
