hadoop - Computing median and average by group in Hive

Tags: hadoop hive hql hiveql cloudera

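I have a dataset of counts by state and county, and I want to compute the median and the average per state and per county. For example: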

Have:

    ID  state  county  count
    1   MD     aa      2
    2   MD     aa      4
    3   VA     bb      1
    4   VA     bb      2
    5   VA     bb      4
    6   VA     cc      7
    7   VA     cc      8

Want:

[image: expected output table]
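The following sketch is plain Python, not Hive; it only computes what the per-state and per-county statistics should be for the sample rows, so a Hive result can be checked against it. It assumes "by state" means one window per state and "by county" means one window per (state, county) pair. The function name add_group_stats is hypothetical. Python's statistics.median averages the two middle values of an even-sized group; Hive's percentile(..., 0.5) is assumed here to interpolate the same way.

```python
from collections import defaultdict
from statistics import mean, median

# Sample rows from the question: (id, state, county, count)
ROWS = [
    (1, "MD", "aa", 2),
    (2, "MD", "aa", 4),
    (3, "VA", "bb", 1),
    (4, "VA", "bb", 2),
    (5, "VA", "bb", 4),
    (6, "VA", "cc", 7),
    (7, "VA", "cc", 8),
]


def add_group_stats(rows):
    """Hypothetical helper: attach overall, per-state and per-(state, county)
    median/average to every row, mimicking OVER(), OVER(PARTITION BY state)
    and OVER(PARTITION BY state, county)."""
    by_state = defaultdict(list)
    by_county = defaultdict(list)
    for _row_id, state, county, cnt in rows:
        by_state[state].append(cnt)
        by_county[(state, county)].append(cnt)
    all_counts = [r[3] for r in rows]

    out = []
    for row_id, state, county, cnt in rows:
        out.append({
            "id": row_id, "state": state, "county": county, "count": cnt,
            "overall_median": median(all_counts),
            "overall_avg": round(mean(all_counts), 2),
            "med_state": median(by_state[state]),
            "med_county": median(by_county[(state, county)]),
            "avg_state": round(mean(by_state[state]), 2),
            "avg_county": round(mean(by_county[(state, county)]), 2),
        })
    return out


for r in add_group_stats(ROWS):
    print(r)
```

For example, every VA row gets med_state 4 and avg_state 4.4, and the two cc rows get med_county 7.5.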

What I have so far, which gives me an error:
    Select id,  STATE,COUNTY,count,
    percentile(cast(count as BIGINT), 0.5) OVER() as overall_median, 
    round(avg(count),2) OVER() as overall_avg,

    percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE) as med_state,
    percentile(cast(count as bigint),0.5) as med_county,

    AVG(count) OVER (PARTITION BY id, STATE) as avg_state,
    AVG(count) AS avg_county,
    from have
    group by id, state, county

Error received when not using GROUP BY:

    ERROR: Execute error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:457 Expression not in GROUP BY key 'id'

Code without GROUP BY:

    Select id,  STATE,county,count,
    percentile(cast(count as BIGINT), 0.5) OVER() as overall_median, 
    round(avg(count),2) OVER() as overall_avg,

    percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE) as med_state,
    percentile(cast(count as bigint),0.5) OVER(PARTITION BY id,STATE,county) as med_county,

    AVG(count) OVER (PARTITION BY id, STATE) as avg_state,
    AVG(count) OVER (PARTITION BY id, STATE, county) as avg_county,
    from have

Thanks!

Best answer

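Fix: round(avg(count) OVER(), 2). The OVER() clause belongs to the avg aggregate, not to round. Written as round(avg(count),2) OVER(), avg(count) is treated as an ordinary aggregate, which turns the query into a grouped query and triggers the "Expression not in GROUP BY key 'id'" error. Also remove the trailing comma before from.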

    select 
        id, STATE, county, count,
        -- OVER() wraps the aggregate itself; round() is applied to its result
        percentile(cast(count as BIGINT), 0.5) OVER() as overall_median, 
        round(avg(count) OVER(), 2) as overall_avg,

        -- id is unique per row, so it is left out of PARTITION BY;
        -- otherwise every window would contain a single row
        percentile(cast(count as bigint), 0.5) OVER(PARTITION BY STATE) as med_state,
        percentile(cast(count as bigint), 0.5) OVER(PARTITION BY STATE, county) as med_county,

        AVG(count) OVER (PARTITION BY STATE) as avg_state,
        AVG(count) OVER (PARTITION BY STATE, county) as avg_county
    from 
        have
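A point worth checking in the partition clauses: id is unique per row, so a window such as OVER(PARTITION BY id, STATE) contains exactly one row, and its median or average is just that row's own count. The short Python sketch below (hypothetical helper names partition_sizes and partitioned_median; plain Python, not Hive) illustrates this on the sample data, and shows that partitioning by state alone gives the per-state values the question asks for.

```python
from collections import defaultdict
from statistics import median

# Sample rows from the question: (id, state, county, count)
ROWS = [
    (1, "MD", "aa", 2), (2, "MD", "aa", 4),
    (3, "VA", "bb", 1), (4, "VA", "bb", 2), (5, "VA", "bb", 4),
    (6, "VA", "cc", 7), (7, "VA", "cc", 8),
]


def partition_sizes(rows, key):
    """Count how many rows fall into each window partition defined by `key`."""
    sizes = defaultdict(int)
    for row in rows:
        sizes[key(row)] += 1
    return dict(sizes)


def partitioned_median(rows, key):
    """Median of the count column within each row's partition, one value per row."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[3])
    return [median(groups[key(row)]) for row in rows]


# PARTITION BY id, state: id is unique, so every partition holds one row.
by_id_state = partition_sizes(ROWS, key=lambda r: (r[0], r[1]))
# PARTITION BY state: one partition per state.
by_state = partition_sizes(ROWS, key=lambda r: r[1])

print(by_state)  # {'MD': 2, 'VA': 5}
print(partitioned_median(ROWS, key=lambda r: (r[0], r[1])))  # [2, 4, 1, 2, 4, 7, 8]
print(partitioned_median(ROWS, key=lambda r: r[1]))  # [3.0, 3.0, 4, 4, 4, 4, 4]
```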

Tip: don't use keywords (such as count) as column names; they will cause a lot of problems down the road.

A similar question about computing median and average by group in Hive can be found on Stack Overflow: https://stackoverflow.com/questions/60420645/
