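I'm trying to submit a query in Postgres that returns only distinct tuples. In my sample query, I don't want duplicate entries where the same cluster_id/feed_id combination appears more than once. If I do a simple: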
select distinct on (cluster_info.cluster_id, feed_id)
cluster_info.cluster_id, num_docs, feed_id, url_time
from url_info
join cluster_info on (cluster_info.cluster_id = url_info.cluster_id)
where feed_id in (select pot_seeder from potentials)
and num_docs > 5 and url_time > '2012-04-16';
I get just that, but I also want the results ordered by num_docs. So, when I do the following:
select distinct on (cluster_info.cluster_id, feed_id)
cluster_info.cluster_id, num_docs, feed_id, url_time
from url_info join cluster_info
on (cluster_info.cluster_id = url_info.cluster_id)
where feed_id in (select pot_seeder from potentials)
and num_docs > 5 and url_time > '2012-04-16'
order by num_docs desc;
I get the following error:
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: select distinct on (cluster_info.cluster_id, feed_id) cluste...
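I think I understand why I'm getting the error (I can't group by tuples unless I describe the group explicitly somehow), but how do I do that? Or, if my interpretation of the error is wrong, is there a way to accomplish my original goal?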
Best answer
The leftmost ORDER BY items cannot disagree with the items of the DISTINCT clause. Quoting the manual about DISTINCT:
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
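The rule can be easier to follow if you model DISTINCT ON outside the database. The Python sketch below is only a conceptual model and is not how PostgreSQL is implemented. Rows are plain dicts, and the helper `distinct_on` is hypothetical. It sorts by the DISTINCT ON keys first and then by the tie-breaker keys. It then keeps the first row of each run of equal keys. In this model, rows that share a key are adjacent only because the DISTINCT ON keys lead the sort. That is what lets a single pass pick "the first row per group." For simplicity, the sketch only sorts ascending.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, distinct_keys, tiebreak_keys=()):
    """Conceptual model of
    SELECT DISTINCT ON (<distinct_keys>) ... ORDER BY <distinct_keys>, <tiebreak_keys>
    (ascending only). Not PostgreSQL's actual implementation.
    """
    # The DISTINCT ON keys must lead the sort so that duplicates become adjacent.
    order = tuple(distinct_keys) + tuple(tiebreak_keys)
    ordered = sorted(rows, key=itemgetter(*order))
    # Keep the first row of each run of rows sharing the same DISTINCT ON keys;
    # the tie-breaker keys decide which duplicate comes first.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(*distinct_keys))]
```

With tie-breakers `("num_docs", "url_time")`, the surviving row for each (cluster_id, feed_id) is the one with the smallest num_docs. If you change the tie-breakers, a different duplicate is kept.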
Try:
SELECT *
FROM (
SELECT DISTINCT ON (c.cluster_id, feed_id)
c.cluster_id, num_docs, feed_id, url_time
FROM url_info u
JOIN cluster_info c ON (c.cluster_id = u.cluster_id)
WHERE feed_id IN (SELECT pot_seeder FROM potentials)
AND num_docs > 5
AND url_time > '2012-04-16'
ORDER BY c.cluster_id, feed_id, num_docs, url_time
-- first columns match DISTINCT
-- the rest to pick certain values for dupes
-- or did you want to pick random values for dupes?
) x
ORDER BY num_docs DESC;
Or use GROUP BY:
SELECT c.cluster_id
, num_docs
, feed_id
, url_time
FROM url_info u
JOIN cluster_info c ON (c.cluster_id = u.cluster_id)
WHERE feed_id IN (SELECT pot_seeder FROM potentials)
AND num_docs > 5
AND url_time > '2012-04-16'
GROUP BY c.cluster_id, feed_id
ORDER BY num_docs DESC;
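This only works if c.cluster_id, feed_id are the primary key columns of all the tables (in this case, both) that you take columns from in the SELECT list. It also requires PostgreSQL 9.1 or later, where ungrouped columns are allowed once they are functionally dependent on the primary key in GROUP BY.
Otherwise you need to GROUP BY the remaining columns, aggregate them, or provide more information.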
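Here is a sketch of the "aggregate" option. It groups by cluster_id, feed_id, takes max() of the other columns, and orders by the aggregated num_docs. The choice of max() is an assumption made for illustration, and `group_by_with_max` is a hypothetical helper that models the query in its docstring. Unlike DISTINCT ON, independent aggregates can combine values that come from different input rows.

```python
from collections import defaultdict


def group_by_with_max(rows):
    """Conceptual model of (illustrative query, not from the answer):
      SELECT c.cluster_id, feed_id, max(num_docs) AS num_docs, max(url_time) AS url_time
      ... GROUP BY c.cluster_id, feed_id
      ORDER BY max(num_docs) DESC;
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["cluster_id"], row["feed_id"])].append(row)
    result = [
        {
            "cluster_id": cluster_id,
            "feed_id": feed_id,
            # Each aggregate is computed independently over the group.
            "num_docs": max(r["num_docs"] for r in members),
            "url_time": max(r["url_time"] for r in members),
        }
        for (cluster_id, feed_id), members in groups.items()
    ]
    result.sort(key=lambda r: r["num_docs"], reverse=True)
    return result
```

If each output row must be one of the original rows, the DISTINCT ON subquery above is the closer fit.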
Source: "sql - How to order distinct tuples in a PostgreSQL query" on Stack Overflow: https://stackoverflow.com/questions/10261627/