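I have a set of CSV files in an HDFS path, and I created an external Hive table from them, say table_A. Because some of the entries are redundant, I tried creating another Hive table based on table_A, say table_B, that contains only distinct records. I was able to create table_B as a non-external table (in the Hive warehouse). I would like to know whether table_B can be created as an external table. If so, will it copy the records from table_A and create its own storage for table_B at a specified path (preferably also as CSV)?
Best answer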
I presume you want to select the distinct rows from an "uncleaned" table and insert them into a "cleaned" table. Suppose the existing external table is defined like this:
CREATE EXTERNAL TABLE `uncleaned`(
  `a` int,
  `b` string,
  `c` string,
  `d` string,
  `e` bigint
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/external/uncleaned';
Create another table. It can be external or managed; either works for this purpose.
CREATE EXTERNAL TABLE `cleaned`(
  `a` int,
  `b` string,
  `c` string,
  `d` string,
  `e` bigint
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/external/cleaned';
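Since the second table does not have to be external, a managed variant might look like the sketch below. The table name `cleaned_managed` is hypothetical and not part of the original answer. `STORED AS TEXTFILE` is used here as shorthand for the same text input/output formats spelled out above, and leaving out `LOCATION` lets Hive place the data under its warehouse directory.

```sql
-- Hypothetical managed (non-external) counterpart of the cleaned table.
-- Same columns and comma delimiter; Hive chooses the storage path.
CREATE TABLE `cleaned_managed`(
  `a` int,
  `b` string,
  `c` string,
  `d` string,
  `e` bigint
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```

The practical difference shows up on `DROP TABLE`: dropping a managed table also deletes its data files, while dropping an external table removes only the metadata and leaves the files at the given location.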
Then read from the first table and insert the distinct rows into the second:
INSERT OVERWRITE TABLE cleaned SELECT DISTINCT a, b, c, d, e FROM uncleaned;
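To confirm that the insert copied the deduplicated rows into the table's own storage, a few checks along these lines may help. This is a sketch that assumes the table names and the `/external/cleaned` path from the statements above; the `dfs` command is available from the Hive CLI or Beeline.

```sql
-- Compare row counts before and after deduplication;
-- the second count should be less than or equal to the first.
SELECT COUNT(*) FROM uncleaned;
SELECT COUNT(*) FROM cleaned;

-- Show the table type (EXTERNAL_TABLE or MANAGED_TABLE) and its Location.
DESCRIBE FORMATTED cleaned;

-- List the comma-delimited text files written under the external path.
dfs -ls /external/cleaned;
```

Because `cleaned` is declared with `FIELDS TERMINATED BY ','` and a text output format, the files written there are comma-delimited text. Hive does not add CSV quoting, so fields that themselves contain commas would need separate handling.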
Regarding csv - creating an external Hive table from an existing external table, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/30974604/