My table is defined as follows.
CREATE TABLE db.TEST(
f1 string,
f2 string,
f3 string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex'='(.{2})(.{3})(.{4})' )
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://nameservice1/location/TEST';
I tried to insert a record into the table, as shown below.
insert overwrite table db.TEST2
select '12' as a , '123' as b , '1234' as c ;
The insert fails with the following error.
Caused by: java.lang.UnsupportedOperationException: Regex SerDe doesn't support the serialize() method
    at org.apache.hadoop.hive.serde2.RegexSerDe.serialize(RegexSerDe.java:289)
Any idea what is going wrong?
Best answer
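You are using the wrong SerDe class. org.apache.hadoop.hive.serde2.RegexSerDe does not support serialization, so it can only be used to read data, not to write it. See the source code: the serialize method does nothing but throw UnsupportedOperationException: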
public Writable serialize(Object obj, ObjectInspector objInspector)
    throws SerDeException {
  throw new UnsupportedOperationException(
      "Regex SerDe doesn't support the serialize() method");
}
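The solution is to use a different SerDe class: org.apache.hadoop.hive.contrib.serde2.RegexSerDe, which can serialize row objects using a format string. The serialization format must be specified in SERDEPROPERTIES. See the source code for more details.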
Example SerDe properties:
WITH SERDEPROPERTIES ( 'input.regex' = '(.{2})(.{3})(.{4})','output.format.string' = '%1$2s%2$3s%3$4s')
For your table, it would look like this:
CREATE TABLE db.TEST(
f1 string,
f2 string,
f3 string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex'='(.{2})(.{3})(.{4})',
'output.format.string' = '%1$2s%2$3s%3$4s' )
LOCATION
'hdfs://nameservice1/location/TEST';
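To see why 'output.format.string' has to mirror 'input.regex', the following is a minimal standalone sketch in plain Java. It assumes the contrib SerDe applies the format string with java.lang.String.format semantics and matches each line against the whole regex; the class and method names (RegexSerDeRoundTrip, serialize, deserialize) are hypothetical and are not part of Hive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSerDeRoundTrip {
    // Same values as the SERDEPROPERTIES of the table above.
    static final Pattern INPUT_REGEX = Pattern.compile("(.{2})(.{3})(.{4})");
    static final String OUTPUT_FORMAT = "%1$2s%2$3s%3$4s";

    // Write side (assumed behavior): build one text line from the column values.
    static String serialize(String... columns) {
        return String.format(OUTPUT_FORMAT, (Object[]) columns);
    }

    // Read side (assumed behavior): split a line back into columns,
    // or return null when the whole line does not match the regex.
    static String[] deserialize(String line) {
        Matcher m = INPUT_REGEX.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] columns = new String[m.groupCount()];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = m.group(i + 1);
        }
        return columns;
    }

    public static void main(String[] args) {
        // Values of exactly the expected widths survive the round trip.
        String line = serialize("12", "123", "1234");
        System.out.println(line);
        System.out.println(String.join("|", deserialize(line)));

        // A shorter value is left-padded with spaces up to the field width.
        System.out.println("[" + serialize("1", "123", "1234") + "]");

        // A longer value is NOT truncated, so the line no longer matches the regex.
        System.out.println(deserialize(serialize("12", "123", "12345")) == null);
    }
}
```

The width in each %N$Ws specifier is only a minimum, so values longer than the regex groups allow would produce lines that cannot be read back with this 'input.regex'.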
Regarding the "Regex SerDe doesn't support the serialize() method" error, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53744624/