There is an empty HBase table with two column families:
create 'emp', 'personal_data', 'professional_data'
Now I am trying to map a Hive external table onto it, which naturally has some columns:
CREATE EXTERNAL TABLE emp(id int, city string, name string, occupation string, salary int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":id,
personal_data:city,
personal_data:name,
professional_data:occupation,
professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "emp", "hbase.mapred.output.outputtable" = "emp");
This is the error I get:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 5 elements while hbase.columns.mapping has 6 elements (counting the key if implicit))
Can you help me? Am I doing something wrong?
Best answer
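In the mapping you reference the id field, but you should reference the HBase row key with the :key keyword. Because no entry in your mapping is :key, the SerDe also counts an implicit key, so your five entries plus the implicit key give the 6 elements reported against your 5 columns. As stated in the documentation: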
a mapping entry must be either :key or of the form column-family-name:[column-name][#(binary|string)]
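The quoted rule can be written as a small check. This is a minimal Python sketch that models only the grammar quoted above; it is not Hive's actual parser, and the names `ENTRY_PATTERN` and `is_valid_entry` are hypothetical. It shows why `:id` does not qualify: the part before the colon (the column family name) is empty, and `id` is not the `:key` keyword.

```python
import re

# Hypothetical model of the quoted rule: an entry is either ":key" or
# "column-family-name:[column-name][#(binary|string)]".
# Assumption: the family name must be non-empty and contain no ':' or '#'.
ENTRY_PATTERN = re.compile(r"^[^:#]+:[^#]*(#(binary|string))?$")


def is_valid_entry(entry: str) -> bool:
    """Return True if entry matches the quoted mapping-entry grammar."""
    entry = entry.strip()  # this sketch ignores surrounding whitespace (assumption)
    return entry == ":key" or ENTRY_PATTERN.match(entry) is not None


for entry in [":id", ":key", "personal_data:city", "professional_data:salary#binary"]:
    print(entry, is_valid_entry(entry))
```

Only `:id` fails here; every other entry from the question's DDL has a non-empty family name before the colon.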
Just replace :id with :key:
CREATE EXTERNAL TABLE emp(id int, city string, name string, occupation string, salary int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,
personal_data:city,
personal_data:name,
professional_data:occupation,
professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "emp", "hbase.mapred.output.outputtable" = "emp");
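To make the error message concrete, here is a simplified Python model of the count check. It is not HBaseSerDe's code: the functions `mapping_element_count` and `check` are hypothetical, and the only behavior it assumes is the one stated in the error text itself, namely that an implicit key is counted when no entry is `:key`.

```python
# Hypothetical, simplified model of the consistency check behind the error
# message; it is NOT Hive's HBaseSerDe implementation.

def mapping_element_count(mapping: str) -> int:
    """Count mapping entries, adding one for an implicit row key.

    Assumption (from the error text "counting the key if implicit"):
    when no entry is ':key', an extra, implicit key is counted.
    """
    entries = [e.strip() for e in mapping.split(",")]
    return len(entries) if ":key" in entries else len(entries) + 1


def check(columns: list[str], mapping: str) -> None:
    """Raise ValueError when the column and mapping counts disagree."""
    n = mapping_element_count(mapping)
    if len(columns) != n:
        raise ValueError(
            f"columns has {len(columns)} elements while "
            f"hbase.columns.mapping has {n} elements"
        )


columns = ["id", "city", "name", "occupation", "salary"]
tail = (",personal_data:city,personal_data:name,"
        "professional_data:occupation,professional_data:salary")

try:
    check(columns, ":id" + tail)  # 5 entries, none is ':key' -> counted as 6
except ValueError as err:
    print(err)

check(columns, ":key" + tail)  # 5 columns, 5 entries: no error
print("ok")
```

With `:id` the model reports 5 columns against 6 mapping elements, matching the numbers in the question; with `:key` the counts agree.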
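The column mapping is based on the order of the columns, not on their names. In the "Multiple Columns and Families" section of the documentation you can clearly see that the names do not matter: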
CREATE TABLE hbase_table_1(key int, value1 string, value2 int, value3 int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,a:b,a:c,d:e"
);
The mapping is then:
- :key -> key
- a:b -> value1
- a:c -> value2
- d:e -> value3
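Positional pairing can be illustrated with `zip`. This Python sketch is a hypothetical illustration, not Hive code; `pair_columns` is an invented name, and it assumes the mapping lists `:key` explicitly, so the column and entry counts must match exactly.

```python
# Hypothetical illustration: columns are paired with mapping entries by
# position, so the Hive column names play no role in the pairing.
# Assumes ':key' is listed explicitly (no implicit key handling here).

def pair_columns(columns: list[str], mapping: str) -> list[tuple[str, str]]:
    """Pair the i-th Hive column with the i-th mapping entry."""
    entries = [e.strip() for e in mapping.split(",")]
    if len(columns) != len(entries):
        raise ValueError("column count and mapping count differ")
    return list(zip(columns, entries))


print(pair_columns(["key", "value1", "value2", "value3"], ":key,a:b,a:c,d:e"))
# [('key', ':key'), ('value1', 'a:b'), ('value2', 'a:c'), ('value3', 'd:e')]

# Renamed Hive columns still line up with the same HBase entries:
print(pair_columns(["id", "x", "y", "z"], ":key,a:b,a:c,d:e"))
```

This is why, in the question's table, the Hive column `id` ends up holding the HBase row key once the first entry is `:key`.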
Original question (hive - Defining a Hive external table on top of an existing HBase table) on Stack Overflow: https://stackoverflow.com/questions/39993768/