mysql - Hive: populating a table using subqueries

Tags: mysql sql hadoop hive

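I am working with a Hadoop database, using Hive as my preferred interface. I want to combine several SELECT statements into a single query (a bit like UNION, but with each query populating a different column). The query below returns all the results I need in a single column, but I would like each query to populate its own column. Any help on how to achieve this would be great; some Hive equivalent of VALUES might do it. Cheers.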

INSERT OVERWRITE TABLE tstr_tmp SELECT * FROM
(SELECT time_stamp FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' ORDER BY time_stamp asc limit 1) as last_visit_of_day
UNION ALL 
SELECT * FROM (SELECT CAST(COUNT(hr) as string) FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' group by  ext_url) as n_hour_bins
UNION ALL
SELECT * FROM (SELECT time_stamp FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' ORDER BY time_stamp desc limit 1) as first_visit_of_day
UNION ALL
SELECT * FROM (SELECT ext_url FROM http  WHERE ext_url = 'http://lucy.info' group by ext_url) as domain_name
UNION ALL
SELECT * FROM (SELECT CAST(count(*) as string) FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' group by ext_url) as n_http_requests
UNION ALL
SELECT * FROM (SELECT int_ip FROM http WHERE ext_hostname = 'exotichorse' group by int_ip) as internal_ip;

As required, each query returns a single value as a string. For this particular set of queries, the following results are returned:

00:08:00
2
07:00:00
http://lucy.info
2
192.168.0.22

I am developing a database that can tell me about user traffic, so this subset will populate the following table:

CREATE TABLE metric_http_domain_time_summary( last_visit_of_day string, n_hour_bins string, first_visit_of_day string, domain_name string, n_http_requests string, internal_ip string) PARTITIONED BY (dt string, hr string, origin string, cl string, st string);

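I know I need to partition the data going in, but I am fairly confident about that part and will edit it in once I manage to get the unpartitioned query running. The gap in my ability is stringing the subqueries together to populate the table.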

Best answer

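After stepping away and thinking about it for a long time, I found the answer. UNION is unnecessary and was actually getting in the way. The query below returns the output above as required, with each value in its own column. I will leave this up in case anyone else has the same problem. Because of Stack Overflow reputation limits I had to remove the ext_url values, but the concept works.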

SELECT * FROM
(SELECT time_stamp FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' ORDER BY time_stamp asc limit 1) as last_visit_of_day,
(SELECT CAST(COUNT(hr) as string) FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' group by  ext_url) as n_hour_bins,
(SELECT time_stamp FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' ORDER BY time_stamp desc limit 1) as first_visit_of_day,
(SELECT ext_url FROM http  WHERE ext_url = 'http://lucy.info' group by ext_url) as domain_name,
(SELECT CAST(count(*) as string) FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' group by ext_url) as n_http_requests,
(SELECT int_ip FROM http WHERE referrer_hostname = 'exotichorse' group by int_ip) as internal_ip;
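
The comma-separated subqueries above form an implicit cross join, and `SELECT *` leaves the output columns with whatever names each subquery happens to produce. The same idea can be sketched with explicit `CROSS JOIN` and named columns, written straight into the table. This is only a sketch: it assumes `tstr_tmp` has six string columns in this order (the question does not show its definition), and the single-letter aliases `a` to `f` are made up for illustration. The filters and sort directions are copied from the answer unchanged.

```sql
-- Sketch only: assumes tstr_tmp has six string columns in the order selected below.
-- Aliases a..f are illustrative; filters and sort directions are copied from the answer.
INSERT OVERWRITE TABLE tstr_tmp
SELECT
  a.last_visit_of_day,
  b.n_hour_bins,
  c.first_visit_of_day,
  d.domain_name,
  e.n_http_requests,
  f.internal_ip
FROM
  (SELECT time_stamp AS last_visit_of_day FROM http
   WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here'
   ORDER BY time_stamp ASC LIMIT 1) a
CROSS JOIN
  (SELECT CAST(COUNT(hr) AS string) AS n_hour_bins FROM http
   WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here'
   GROUP BY ext_url) b
CROSS JOIN
  (SELECT time_stamp AS first_visit_of_day FROM http
   WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here'
   ORDER BY time_stamp DESC LIMIT 1) c
CROSS JOIN
  (SELECT ext_url AS domain_name FROM http
   WHERE ext_url = 'ext_url_here'
   GROUP BY ext_url) d
CROSS JOIN
  (SELECT CAST(COUNT(*) AS string) AS n_http_requests FROM http
   WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here'
   GROUP BY ext_url) e
CROSS JOIN
  (SELECT int_ip AS internal_ip FROM http
   WHERE referrer_hostname = 'exotichorse'
   GROUP BY int_ip) f;
```

Two properties of a cross join are worth keeping in mind here. If any subquery returns no rows, the whole result is empty; if one returns several rows (for example, more than one `int_ip` for the host), the output row is repeated once per value. Also note that `ORDER BY time_stamp ASC LIMIT 1` selects the earliest timestamp, so the two sort directions should be swapped if the `last_visit_of_day` and `first_visit_of_day` labels are meant literally. Some Hive configurations, such as strict mode, may reject Cartesian products like this one.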

Regarding mysql - Hive: populating a table using subqueries, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31723496/
