mysql - Hive: populating a table with subqueries

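I'm working with a Hadoop database, using Hive as my preferred interface. I'd like to combine several SELECT statements into one query (a bit like a UNION, except that each query populates a different column). The query below returns all the results I need, but in a single column; I want each query to populate its own column. Any help on how to do this would be great. Some Hive equivalent of VALUES might do it. Cheers.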

INSERT OVERWRITE TABLE tstr_tmp SELECT * FROM
(SELECT time_stamp FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' ORDER BY time_stamp asc limit 1) as last_visit_of_day
UNION ALL 
SELECT * FROM (SELECT CAST(COUNT(hr) as string) FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' group by  ext_url) as n_hour_bins
UNION ALL
SELECT * FROM (SELECT time_stamp FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' ORDER BY time_stamp desc limit 1) as first_visit_of_day
UNION ALL
SELECT * FROM (SELECT ext_url FROM http  WHERE ext_url = 'http://lucy.info' group by ext_url) as domain_name
UNION ALL
SELECT * FROM (SELECT CAST(count(*) as string) FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' group by ext_url) as n_http_requests
UNION ALL
SELECT * FROM (SELECT int_ip FROM http WHERE ext_hostname = 'exotichorse' group by int_ip) as internal_ip;

As shown below, each query returns a single value as a string. For this particular set of queries, the following is returned:

00:08:00
2
07:00:00
http://lucy.info
2
192.168.0.22
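The UNION ALL version ends up in a single column because UNION ALL appends the rows of each subquery one after another. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Hive and a made-up three-column `http` table with two sample rows (none of this is the asker's real schema or data), shows three single-value subqueries turning into three rows of one column each:

```python
import sqlite3

# In-memory SQLite database standing in for Hive (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE http (time_stamp TEXT, ext_url TEXT, int_ip TEXT)")
conn.executemany(
    "INSERT INTO http VALUES (?, ?, ?)",
    [
        ("00:08:00", "http://lucy.info", "192.168.0.22"),
        ("07:00:00", "http://lucy.info", "192.168.0.22"),
    ],
)

# Each subquery yields one value; UNION ALL stacks them vertically,
# so the result is three rows with a single column each.
rows = conn.execute(
    """
    SELECT * FROM (SELECT time_stamp FROM http ORDER BY time_stamp ASC LIMIT 1)
    UNION ALL
    SELECT * FROM (SELECT CAST(COUNT(*) AS TEXT) FROM http)
    UNION ALL
    SELECT * FROM (SELECT time_stamp FROM http ORDER BY time_stamp DESC LIMIT 1)
    """
).fetchall()

for row in rows:
    print(row)
```

This mirrors the shape of the output above: one value per line rather than one value per column.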

I'm building a database that tells me about user traffic, so this subset will populate the following table:

CREATE TABLE metric_http_domain_time_summary(
  last_visit_of_day string,
  n_hour_bins string,
  first_visit_of_day string,
  domain_name string,
  n_http_requests string,
  internal_ip string)
PARTITIONED BY (dt string, hr string, origin string, cl string, st string);

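I know I need to partition the incoming data, but I'm fairly confident about that part and will edit it in once I get the unpartitioned query running. The gap in my ability is stringing the subqueries together to populate the table.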

Best answer

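After going away and thinking about it for a long time, I found the answer. The UNION is unnecessary and actually gets in the way: listing the subqueries comma-separated in the FROM clause joins them side by side, so each subquery's single value lands in its own column of one result row. This query returns the output above as required. I'll leave it here in case anyone else has the same problem. Because of Stack Overflow reputation limits I had to remove the ext_url values, but the concept works.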

SELECT * FROM
(SELECT time_stamp FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' ORDER BY time_stamp asc limit 1) as last_visit_of_day,
(SELECT CAST(COUNT(hr) as string) FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' group by  ext_url) as n_hour_bins,
(SELECT time_stamp FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' ORDER BY time_stamp desc limit 1) as first_visit_of_day,
(SELECT ext_url FROM http  WHERE ext_url = 'http://lucy.info' group by ext_url) as domain_name,
(SELECT CAST(count(*) as string) FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' group by ext_url) as n_http_requests,
(SELECT int_ip FROM http WHERE referrer_hostname = 'exotichorse' group by int_ip) as internal_ip;
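The side-by-side behaviour of this answer can be reproduced the same way. This is a sketch under assumptions: SQLite replaces Hive, and the `http` table, its sample rows and the four-column `summary` target table are hypothetical, with simplified versions of the subqueries above. Because every derived table returns exactly one row, the comma-separated FROM is a cross join of 1 × 1 × 1 × 1 rows, so `INSERT INTO ... SELECT` fills exactly one row of the target. The last part shows the catch: a cross join multiplies row counts, so a subquery that returns two rows (such as the `GROUP BY int_ip` one when a host has two internal IPs) doubles the output, and a subquery that returns no rows empties it.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE http (time_stamp TEXT, ext_url TEXT, int_ip TEXT)")
conn.executemany(
    "INSERT INTO http VALUES (?, ?, ?)",
    [
        ("00:08:00", "http://lucy.info", "192.168.0.22"),
        ("07:00:00", "http://lucy.info", "192.168.0.22"),
    ],
)

# Hypothetical, unpartitioned target table: one column per metric.
conn.execute(
    "CREATE TABLE summary "
    "(earliest TEXT, n_requests TEXT, latest TEXT, domain_name TEXT)"
)

# Comma-separated derived tables = implicit cross join. Each returns one row,
# so the four single values end up side by side in one row.
conn.execute(
    """
    INSERT INTO summary
    SELECT * FROM
      (SELECT time_stamp FROM http ORDER BY time_stamp ASC LIMIT 1) AS earliest,
      (SELECT CAST(COUNT(*) AS TEXT) FROM http) AS n_requests,
      (SELECT time_stamp FROM http ORDER BY time_stamp DESC LIMIT 1) AS latest,
      (SELECT ext_url FROM http GROUP BY ext_url) AS domain_name
    """
)
summary = conn.execute("SELECT * FROM summary").fetchall()
print(summary)  # [('00:08:00', '2', '07:00:00', 'http://lucy.info')]

# Caveat: a cross join multiplies row counts. A second internal IP makes the
# GROUP BY int_ip subquery return two rows, so the join returns two rows...
conn.execute(
    "INSERT INTO http VALUES ('09:30:00', 'http://lucy.info', '192.168.0.23')"
)
multi = conn.execute(
    "SELECT * FROM (SELECT COUNT(*) FROM http) AS n, "
    "(SELECT int_ip FROM http GROUP BY int_ip) AS ips"
).fetchall()

# ...and a subquery that matches nothing makes the whole result empty.
empty = conn.execute(
    "SELECT * FROM (SELECT COUNT(*) FROM http) AS n, "
    "(SELECT int_ip FROM http WHERE int_ip = 'no-such-ip') AS ips"
).fetchall()
print(len(multi), len(empty))  # 2 0
```

In Hive, the same SELECT would follow `INSERT OVERWRITE TABLE metric_http_domain_time_summary PARTITION (...)`, and its six selected columns would need to line up positionally with the table's six non-partition columns.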

For mysql - Hive: populating a table with subqueries, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31723496/
