hadoop - HIVE: Insert the results of a SELECT statement into a Hive table as multiple records without overwriting existing content

Tags: hadoop hive left-join hiveql

I have a table created with the following command:

CREATE TABLE treatment_costs AS SELECT * FROM
(SELECT r.patient_id, r.transaction_date, r.paid_transaction_amount, o.dob, o.department_name, o.reason_of_visit FROM ReceiptTransactions r
LEFT OUTER JOIN OpdPatientQ o ON (r.patient_id = o.patient_id)
) t;

Now I want to append to this table all records added today (or on any given date). For this I wrote:
INSERT INTO TABLE treatment_costs SELECT * FROM
(SELECT r.patient_id, r.transaction_date, r.paid_transaction_amount, o.dob, o.department_name, o.reason_of_visit FROM ReceiptTransactions r
LEFT OUTER JOIN OpdPatientQ o ON (r.patient_id = o.patient_id)
WHERE timestamp_column = today_date  -- WHERE must follow the JOIN; timestamp_column and today_date are placeholders
) t;
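If the date is supplied when the job runs, one way to parameterize this load is Hive variable substitution. The following is a minimal sketch rather than the asker's actual script: the variable name load_date and the script file name are hypothetical, and timestamp_column is assumed to be a TIMESTAMP (or 'yyyy-MM-dd HH:mm:ss' string) column of ReceiptTransactions.

```sql
-- Hypothetical invocation (load_date and the file name are illustrative only):
--   hive --hivevar load_date=2015-01-02 -f append_treatment_costs.hql
-- Assumes timestamp_column is a TIMESTAMP or 'yyyy-MM-dd HH:mm:ss' string
-- column of ReceiptTransactions.
INSERT INTO TABLE treatment_costs
SELECT r.patient_id,
       r.transaction_date,
       r.paid_transaction_amount,
       o.dob,
       o.department_name,
       o.reason_of_visit
FROM ReceiptTransactions r
LEFT OUTER JOIN OpdPatientQ o
  ON (r.patient_id = o.patient_id)
WHERE to_date(r.timestamp_column) = '${hivevar:load_date}';
```

The filter is applied to a column of r, the left-hand table, so receipts without a matching OpdPatientQ row are still inserted (with NULLs for the o columns). A WHERE condition on a column of o would instead drop those unmatched receipts.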

Is this the correct way to append the results of a query to the table as multiple records?

Edit 1:
For example, suppose table treatment_costs contains these rows:
patient_id, transaction_date, paid_transaction_amount, dob, department_name, reason_of_visit
001, 01/01/2014, 30000, 01/01/1985, Cardiology, reason_1
002, 01/01/2014, 35000, 01/01/1975, Cardiology, reason_2
003, 02/01/2014, 40000, 01/01/1965, Oncology, reason_3
004, 02/01/2014, 30000, 01/01/1985, Cardiology, reason_4
005, 02/01/2014, 20000, 01/01/1975, Gynecology, reason_5

My doubt concerns the SELECT statement in my insert query:
SELECT * FROM
(SELECT r.patient_id, r.transaction_date, r.paid_transaction_amount, o.dob, o.department_name, o.reason_of_visit FROM ReceiptTransactions r
LEFT OUTER JOIN OpdPatientQ o ON (r.patient_id = o.patient_id)
WHERE timestamp_column = today_date
) t;

Suppose it returns, for example, the following result:
patient_id, transaction_date, paid_transaction_amount, dob, department_name, reason_of_visit
011, 01/01/2015, 30000, 01/01/1986, Cardiology, reason_11
012, 01/01/2015, 35000, 01/01/1976, Cardiology, reason_21
013, 02/01/2015, 40000, 01/01/1966, Oncology, reason_31
014, 02/01/2015, 30000, 01/01/1986, Cardiology, reason_41
015, 02/01/2015, 20000, 01/01/1976, Gynecology, reason_51

After running the insert query, will the contents of my table look like this?
patient_id, transaction_date, paid_transaction_amount, dob, department_name, reason_of_visit
001, 01/01/2014, 30000, 01/01/1985, Cardiology, reason_1
002, 01/01/2014, 35000, 01/01/1975, Cardiology, reason_2
003, 02/01/2014, 40000, 01/01/1965, Oncology, reason_3
004, 02/01/2014, 30000, 01/01/1985, Cardiology, reason_4
005, 02/01/2014, 20000, 01/01/1975, Gynecology, reason_5
011, 01/01/2015, 30000, 01/01/1986, Cardiology, reason_11
012, 01/01/2015, 35000, 01/01/1976, Cardiology, reason_21
013, 02/01/2015, 40000, 01/01/1966, Oncology, reason_31
014, 02/01/2015, 30000, 01/01/1986, Cardiology, reason_41
015, 02/01/2015, 20000, 01/01/1976, Gynecology, reason_51

Best answer

From the Hive Language Manual:

INSERT INTO will append to the table or partition, keeping the existing data intact.


INSERT INTO TABLE ...

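does not overwrite any data that already exists in the table. Your INSERT query runs a MapReduce job (or a Tez or Spark job, depending on the engine set in hive.execution.engine) that writes newly generated files to the table location without deleting the existing files. So yes: after the insert, the table contains both the original rows and the new ones, as in your expected output.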
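To see the append behaviour for yourself, you can compare row counts before and after the load. This is a sketch assuming the tables from the question; the literal date is a hypothetical stand-in for today_date, and timestamp_column is assumed to be a column of ReceiptTransactions. The INSERT OVERWRITE statement is left commented out so that running the script cannot replace the existing data.

```sql
-- Row count before the load (5 with the example data in the question).
SELECT COUNT(*) FROM treatment_costs;

-- INSERT INTO appends: new files are written next to the existing ones.
-- '2015-01-02' is a hypothetical date standing in for today_date.
INSERT INTO TABLE treatment_costs
SELECT r.patient_id, r.transaction_date, r.paid_transaction_amount,
       o.dob, o.department_name, o.reason_of_visit
FROM ReceiptTransactions r
LEFT OUTER JOIN OpdPatientQ o ON (r.patient_id = o.patient_id)
WHERE to_date(r.timestamp_column) = '2015-01-02';

-- Row count after the load (10 if the SELECT returned the 5 example rows).
SELECT COUNT(*) FROM treatment_costs;

-- For contrast: INSERT OVERWRITE would replace the table's existing data
-- with only the rows produced by its SELECT. Left commented out on purpose.
-- INSERT OVERWRITE TABLE treatment_costs SELECT ... ;
```

Use INSERT INTO for incremental daily loads like this one; INSERT OVERWRITE is the variant to reach for only when the whole table (or partition) should be rebuilt from the SELECT.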

A similar question about inserting SELECT results into a Hive table without overwriting existing content can be found on Stack Overflow: https://stackoverflow.com/questions/61205530/
