hadoop - Can't use TotalOrderPartitioner with Hive: Can't read partitions file

Tags: hadoop mapreduce hive hbase totalorderpartitioner

We are trying to bulk load into HBase by generating HFiles from Hive. Our main problem is that when we use

org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

it cannot find our custom partitions file:

java.lang.IllegalArgumentException: Can't read partitions file

More details:

The custom partitions file is created from a Hive table:

CREATE EXTERNAL TABLE netezza.ais_lowres_mmsi_range_keys(hbase_key_range_start string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveNullValueSequenceFileOutputFormat'
LOCATION '/tmp/ais_lowres_mmsi_range_keys';

INSERT OVERWRITE TABLE netezza.ais_lowres_mmsi_range_keys SELECT r_start FROM tmp_rows ORDER BY r_start;



# The tmp_rows table holds the partition splits of our current HBase table

# Table content is copied to a file as  per: https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad
hadoop fs -cp /tmp/ais_lowres_mmsi_range_keys/* /tmp/ais_lowres_mmsi_range_keys_list

# Hive and HBase jars are added
ADD JAR /usr/hdp/2.3.2.0-2950/hive/lib/hive-hbase-handler-1.2.1.2.3.2.0-2950.jar;
ADD JAR /usr/hdp/2.3.2.0-2950/hbase/lib/hbase-server-1.1.2.2.3.2.0-2950.jar;
ADD JAR /usr/hdp/2.3.2.0-2950/hbase/lib/hbase-common-1.1.2.2.3.2.0-2950.jar;
ADD JAR /usr/hdp/2.3.2.0-2950/hbase/lib/hbase-client-1.1.2.2.3.2.0-2950.jar;
ADD JAR /usr/hdp/2.3.2.0-2950/hbase/lib/hbase-protocol-1.1.2.2.3.2.0-2950.jar;
SET hive.aux.jars.path = /tmp/hive-hbase-handler-1.2.1.2.3.2.0-2950.jar,/tmp/hbase-server-1.1.2.2.3.2.0-2950.jar,/tmp/hbase-client-1.1.2.2.3.2.0-2950.jar,/tmp/hbase-common-1.1.2.2.3.2.0-2950.jar,/tmp/hbase-protocol-1.1.2.2.3.2.0-2950.jar;

SET hive.execution.engine=mr;
SET mapreduce.job.reduces=$((num_range+1)); # The number of reducers is set to the number of partition splits +1
SET hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
SET total.order.partitioner.natural.order=false;
SET total.order.partitioner.path=/tmp/ais_lowres_mmsi_range_keys_list;        
SET hfile.compression=gz;

INSERT OVERWRITE TABLE tmp_table 
SELECT [cols] FROM ais_lowres_mmsi_distinct
CLUSTER BY hbase_key;
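As an aside on the reducer count above: N split points in the partitions file define N+1 key ranges, and TotalOrderPartitioner requires exactly one reducer per range. A minimal sketch of that rule (the helper name is hypothetical):

```shell
# N split keys -> N+1 key ranges -> N+1 reducers for TotalOrderPartitioner.
# In practice num_range would be derived from the partitions file, e.g.
#   hadoop fs -text /tmp/ais_lowres_mmsi_range_keys_list | wc -l
reducer_count() {
  local num_range="$1"
  echo $(( num_range + 1 ))
}

reducer_count 48   # prints 49
```

This matches the job log below: 49 reducers for a table with 48 region split points.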

This produces the following error:

######################################################################
Starting Job = job_1458218583243_0631, Tracking URL = http://osl5303.cm.cluster:8088/proxy/application_1458218583243_0631/
Kill Command = /usr/hdp/2.3.2.0-2950/hadoop/bin/hadoop job  -kill job_1458218583243_0631
Hadoop job information for Stage-1: number of mappers: 19; number of reducers: 49
2016-03-30 08:19:39,534 Stage-1 map = 0%,  reduce = 0%
2016-03-30 08:19:55,084 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_1458218583243_0631 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1458218583243_0631_m_000009 (and more) from job job_1458218583243_0631
Examining task ID: task_1458218583243_0631_m_000017 (and more) from job job_1458218583243_0631
Examining task ID: task_1458218583243_0631_m_000008 (and more) from job job_1458218583243_0631
Examining task ID: task_1458218583243_0631_m_000001 (and more) from job job_1458218583243_0631
Examining task ID: task_1458218583243_0631_m_000008 (and more) from job job_1458218583243_0631
Examining task ID: task_1458218583243_0631_m_000003 (and more) from job job_1458218583243_0631

Task with the most failures(4): 
-----
Task ID:
  task_1458218583243_0631_m_000012

URL:
  http://osl5303.cm.cluster:8088/taskdetails.jsp?jobid=job_1458218583243_0631&tipid=task_1458218583243_0631_m_000012
-----
Diagnostic Messages for this Task:
Error: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.<init>(MapTask.java:592)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.FileNotFoundException: File file:/grid/3/hadoop/yarn/local/usercache/ketot/appcache/application_1458218583243_0631/container_e22_1458218583243_0631_01_000086/_partition.lst does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1776)
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:301)
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
    ... 10 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 19  Reduce: 49   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
#########################################################################################

It seems the custom path we specified is being ignored. Moreover, the path TotalOrderPartitioner looks for is on the local file system, where it does not exist. Any suggestions?

Best Answer

Setting total.order.partitioner.path is wrong, so TotalOrderPartitioner falls back to its default. This does not seem to be documented anywhere; I had to dig through the source code to figure it out!

The property has been renamed; it is now:

mapreduce.totalorderpartitioner.path

See https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/DeprecatedProperties.html for the full list of deprecated properties.
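Applied to the script in the question, the fix would be to set the renamed keys instead (a sketch; the new names are taken from the deprecated-properties list linked above, the path is the one from the question):

```sql
-- The mapreduce.* names are what
-- org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner reads;
-- the old total.order.partitioner.* names are silently ignored.
SET mapreduce.totalorderpartitioner.naturalorder=false;
SET mapreduce.totalorderpartitioner.path=/tmp/ais_lowres_mmsi_range_keys_list;
```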

This question, "hadoop - Can't use TotalOrderPartitioner with Hive: Can't read partitions file", is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/36302424/
