sql - Using rank within subgroups in Hive

Hey, I'm trying to get this query to work using rank(), but I'm not having any luck:

select t.orig, t.id, count(*) as num_actions, 
rank() over (partition by t.orig order by count(*) desc) as rank
from sample_table t
where rank < 21 
and t.month in (201607,20608,201609,201610,201611,201612) 
and t.orig in (select tw.pageid from tw_sample as tw limit 50) 
group by t.orig, t.id

I keep getting:

FAILED: SemanticException [Error 10004]: Line 4:6 Invalid table alias or column reference 'rank'

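My goal is to get the top 20 rows for each t.orig, ranked by count(*).

If you could also explain where I went wrong, so that I can learn from it, that would be much appreciated.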

Best answer

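You can't use a column alias in the where clause of the same query level. The where clause is evaluated before the select list, and window functions such as rank() are computed only after where and group by, so the alias does not exist yet when the filter runs. Use a subquery and filter in the outer query: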

select *
from (select t.orig, t.id, count(*) as num_actions, 
             rank() over (partition by t.orig order by count(*) desc) as rnk
      from sample_table t
      -- 20608 is copied from the question; it is probably meant to be 201608
      where t.month in (201607, 20608, 201609, 201610, 201611, 201612)  and
            t.orig in (select tw.pageid from tw_sample tw limit 50) 
      group by t.orig, t.id
     ) t
where rnk < 21
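
The same idea can also be written with common table expressions, which keeps the aggregation, the ranking, and the filter as three separate steps. This is only a sketch, not part of the original answer: it assumes a Hive version that supports the with clause (0.13 or later, as far as I know), the names counts, ranked, and rn are made up for illustration, and the table and column names are taken from the question.

```sql
-- Step 1: count actions per (orig, id), with the filters copied from the question.
with counts as (
  select t.orig, t.id, count(*) as num_actions
  from sample_table t
  -- 20608 is copied verbatim from the question (probably meant to be 201608)
  where t.month in (201607, 20608, 201609, 201610, 201611, 201612)
    and t.orig in (select tw.pageid from tw_sample tw limit 50)
  group by t.orig, t.id
),
-- Step 2: rank the counted rows inside each orig group.
ranked as (
  select c.orig, c.id, c.num_actions,
         -- rank() gives tied counts the same rank
         rank() over (partition by c.orig order by c.num_actions desc) as rnk,
         -- row_number() never repeats a number; c.id is a hypothetical tie-breaker
         row_number() over (partition by c.orig
                            order by c.num_actions desc, c.id) as rn
  from counts c
)
-- Step 3: the aliases are now ordinary columns, so they can be filtered on.
select r.orig, r.id, r.num_actions, r.rnk
from ranked r
where r.rnk < 21;
```

Because rank() assigns the same rank to tied counts, filtering on rnk can return more than 20 rows for a given orig. Filtering on rn < 21 instead would cap each group at exactly 20 rows, at the cost of cutting ties arbitrarily.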

Regarding sql - Using rank within subgroups in Hive, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43620885/
