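I am trying to load a pipe-delimited CSV file into a Hive external table. Pipes that occur inside a data field are enclosed in double quotes, and double quotes that occur in the data are escaped with \. When I create the external table, the rows that contain double quotes are not interpreted correctly.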
test.csv
id|name
105|"Test | pipe delim in field"
107|\" Test Escaped single double quote in HIVE
108|\" Test Escaped enclosed double quote in HIVE \"
109|\\" Test Escaped enclosed double quote in HIVE \"
110|\\" Test Escaped enclosed double quote in HIVE \\"
External table create statement
drop table test_schema.hive_test;
CREATE EXTERNAL TABLE test_schema.hive_test (id string, name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES
(
"separatorChar" = "|",
"quoteChar" = "\"",
"escapeChar" = "\\"
)
LOCATION '/staging/test/hive'
tblproperties ("skip.header.line.count"="1");
Output
+---------------+-------------------------------------------------+
| hive_test.id | hive_test.name |
+---------------+-------------------------------------------------+
| 105 | Test | pipe delim in field |
| 107 | NULL |
| 108 | NULL |
| 109 | NULL |
| 110 | " Test Escaped enclosed double quote in HIVE \ |
+---------------+-------------------------------------------------+
Expected Output
+---------------+-------------------------------------------------+
| hive_test.id | hive_test.name |
+---------------+-------------------------------------------------+
| 105 | Test | pipe delim in field |
| 107 | " Test Escaped single double quote in HIVE |
| 108 | " Test Escaped enclosed double quote in HIVE " |
| 109 | NULL |
| 110 | NULL |
+---------------+-------------------------------------------------+
OpenCSV version 2.3
Best answer
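Unfortunately, this cannot be done, because OpenCSV uses a single character as the escape character, whereas you are effectively trying to use a double backslash as the escape (which would be a string, not a character). In the OpenCSVSerde class you can see that, whatever you pass as the escape character, OpenCSVSerde takes only the first character of the string value: https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java#L98
Here is the current code for reference: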
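// Returns the first character of the SERDEPROPERTIES value, or the default
// if the property is not set; any further characters are ignored.
private char getProperty(final Properties tbl, final String property, final char def) {
    final String val = tbl.getProperty(property);
    if (val != null) {
        return val.charAt(0);
    }
    return def;
}
I think a warning is missing that tells users, at table creation time, that only a single character is supported.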
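To make the truncation concrete, here is a minimal standalone sketch (not Hive code) that copies the `getProperty` logic quoted above into a hypothetical `EscapeCharDemo` class and feeds it a two-character escape value. The class name, the placeholder defaults, and the two-backslash value are assumptions for illustration only; the sketch shows what `charAt(0)` does to the property, not how OpenCSV then parses the rows.

```java
import java.util.Properties;

// Hypothetical demo class: reproduces the getProperty logic from
// OpenCSVSerde (made static here) to show that only the first
// character of each property value is kept.
public class EscapeCharDemo {

    // Same logic as the quoted snippet: first char of the value, or the default.
    static char getProperty(final Properties tbl, final String property, final char def) {
        final String val = tbl.getProperty(property);
        if (val != null) {
            return val.charAt(0);
        }
        return def;
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty("separatorChar", "|");
        tbl.setProperty("quoteChar", "\"");
        // Assumed two-character value: two backslashes.
        tbl.setProperty("escapeChar", "\\\\");

        // Defaults below are placeholders for this sketch, not necessarily Hive's.
        char sep = getProperty(tbl, "separatorChar", ',');
        char quote = getProperty(tbl, "quoteChar", '"');
        char escape = getProperty(tbl, "escapeChar", '"');

        System.out.println("separator = " + sep);
        System.out.println("quote     = " + quote);
        // Only one backslash survives; the second character is silently dropped.
        System.out.println("escape    = " + escape
                + " (value length was " + tbl.getProperty("escapeChar").length() + ")");
    }
}
```

Running it prints `escape    = \ (value length was 2)` on the last line: the second backslash never reaches the parser, and no error or warning is raised.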
Regarding "hadoop - HIVE - Escaping double quotes issue", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63265299/