sql - Select n samples from each category

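I have a column, score, which is an integer between 1 and 5 (inclusive).
I am trying to select n (in this case 2000) samples of each score.
My own hacking and other SO questions have led me to construct the following query: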

select * from (select text, score from data where score= 1 and LENGTH(text) > 45 limit 2000)
union
select * from (select text, score from data where score= 2 and LENGTH(text) > 45 limit 2000)
union
select * from (select text, score from data where score= 3 and LENGTH(text) > 45 limit 2000)
union
select * from (select text, score from data where score= 4 and LENGTH(text) > 45 limit 2000)
union
select * from (select text, score from data where score= 5 and LENGTH(text) > 45 limit 2000)

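This feels like the worst possible way to do it. More importantly, when I run each query individually it gives me 2k results, but when I run this union I get fewer than 10k rows.
I'm looking for help optimizing this query, but more importantly I want to understand why the union returns the wrong number of results.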

Best answer

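As for why the query returns the wrong number of results, I'd bet that the rows returned by your queries are not all distinct. When you use union, it returns only the distinct rows across the entire combined result set, so any duplicate (text, score) pairs are collapsed into one.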

Try changing it to union all:

select * from (select text, score from data where score= 1 and LENGTH(text) > 45 limit 2000)
union all
select * from (select text, score from data where score= 2 and LENGTH(text) > 45 limit 2000)
union all
select * from (select text, score from data where score= 3 and LENGTH(text) > 45 limit 2000)
union all
select * from (select text, score from data where score= 4 and LENGTH(text) > 45 limit 2000)
union all
select * from (select text, score from data where score= 5 and LENGTH(text) > 45 limit 2000)

Here's a condensed demo showing the difference.
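
As a separate, self-contained illustration of the same point, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite as the engine (the question does not say which database is in use), and the table contents, the small limit, and the shortened length threshold are made up for the example.

```python
import sqlite3

# In-memory database with a tiny, made-up version of the "data" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, text TEXT, score INTEGER)")
rows = [
    ("great product", 1),
    ("great product", 1),  # exact duplicate of the row above
    ("not for me", 1),
    ("works fine", 2),
    ("works fine", 2),     # another exact duplicate
]
conn.executemany("INSERT INTO data (text, score) VALUES (?, ?)", rows)

# Same shape as the query in the question, with smaller (illustrative) numbers.
query = """
select * from (select text, score from data where score = 1 and length(text) > 5 limit 10)
{op}
select * from (select text, score from data where score = 2 and length(text) > 5 limit 10)
"""

union_count = len(conn.execute(query.format(op="union")).fetchall())
union_all_count = len(conn.execute(query.format(op="union all")).fetchall())

print("union:    ", union_count)      # duplicates removed
print("union all:", union_all_count)  # every selected row kept
```

With this data, union all keeps all five rows, while union collapses the two duplicated (text, score) pairs and returns three.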

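If you have a primary key (for example, an auto-increment column), here is an alternative approach that generates a row_number within each score group (this assumes a primary key named id):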

select text, score
from (
  select text, score, 
         (select count(*) from data b 
          where a.id >= b.id and 
                a.score = b.score and 
                length(b.text) > 45) rn
  from data a
  where length(text) > 45
  ) t
where rn <= 2000
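
To check how the correlated subquery numbers rows within each score, the following sketch runs the same query against a small in-memory SQLite table via Python's sqlite3 module. The engine choice, the sample rows, and the parameters n and min_len are assumptions for illustration; they stand in for the 2000 and 45 used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, text TEXT, score INTEGER)")

# Made-up data: four rows with score 1, three with score 2, one with score 3.
scores = [1, 2, 1, 1, 2, 3, 1, 2]
conn.executemany(
    "INSERT INTO data (text, score) VALUES (?, ?)",
    [(f"sample text number {i}", s) for i, s in enumerate(scores)],
)

# Same query as above, with the row cap and length threshold as parameters.
sql = """
select text, score
from (
  select text, score,
         (select count(*) from data b
          where a.id >= b.id and
                a.score = b.score and
                length(b.text) > :min_len) rn
  from data a
  where length(text) > :min_len
  ) t
where rn <= :n
order by score, text
"""
result = conn.execute(sql, {"n": 2, "min_len": 5}).fetchall()

# Count how many rows were kept for each score.
per_score = {}
for _text, score in result:
    per_score[score] = per_score.get(score, 0) + 1

print(per_score)
```

Each score contributes at most n rows, and a score with fewer than n qualifying rows contributes all of them. Note that this picks the rows with the lowest id values in each group rather than a random sample.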

Regarding sql - Select n samples from each category, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50803974/
