I am trying to convert blank values in the source file to NULL in a Hive table by setting the property 'serialization.null.format' = ''. The query I wrote in Hive is:
create table test(a int, b string) stored as parquet TBLPROPERTIES('serialization.null.format'='');
Then I insert values into it through Impala, as follows:
insert overwrite table test values (1, ''), (2, 'b');
The result looks like this:
| a | b |
| 1 | |
| 2 | b |
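Can someone help me understand why the blank value is not converted to NULL?
Best answer
The problem is the Parquet SerDe, which does not appear to honor 'serialization.null.format'. See the issue https://issues.apache.org/jira/browse/HIVE-12362.
The following reproduces it: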
create table src (a string);
insert into table src values (NULL), (''), ('');
0: jdbc:hive2://localhost:10000/default> select * from src;
+-----------+--+
| src.a |
+-----------+--+
| NULL |
| |
| |
+-----------+--+
create table dest (a string) row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' stored as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
alter table dest set SERDEPROPERTIES ('serialization.null.format' = '');
alter table dest set TBLPROPERTIES ('serialization.null.format' = '');
insert overwrite table dest select * from src;
0: jdbc:hive2://localhost:10000/default> select * from dest;
+-----------+--+
| dest.a |
+-----------+--+
| NULL |
| |
| |
+-----------+--+
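Since the property has no effect here, one possible workaround is to convert empty strings to NULL explicitly in the SELECT that feeds the insert. This is only a sketch against the src and dest tables from the example above, not something taken from the linked issue, and it assumes every blank value is exactly the empty string (no whitespace):

```sql
-- Workaround sketch: map '' to NULL at insert time instead of relying on
-- 'serialization.null.format', which the Parquet SerDe does not apply.
insert overwrite table dest
select case when a = '' then NULL else a end as a
from src;

-- Rows that were '' in src should now match this predicate.
select * from dest where a is null;
```

The same CASE expression could be used in the original question when loading the test table from a staging table, with one expression per string column that needs the conversion.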
Regarding hive - converting blanks to NULL in Hive, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36354410/