java - Error when bulk loading into HBase

Tags: java hadoop mapreduce hbase bulk-load

I am attempting an HBase bulk load via a Java MapReduce program, which I run from Eclipse.

But I get the following error:

12/06/14 20:04:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/06/14 20:04:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/06/14 20:04:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/06/14 20:04:29 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/06/14 20:04:29 INFO input.FileInputFormat: Total input paths to process : 1
12/06/14 20:04:29 WARN snappy.LoadSnappy: Snappy native library not loaded
12/06/14 20:04:29 INFO mapred.JobClient: Running job: job_local_0001
12/06/14 20:04:29 INFO mapred.MapTask: io.sort.mb = 100
12/06/14 20:04:29 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/14 20:04:29 INFO mapred.MapTask: record buffer = 262144/327680
12/06/14 20:04:29 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:560)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
    at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:776)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
    at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:296)
    at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:82)
    ... 6 more
12/06/14 20:04:30 INFO mapred.JobClient:  map 0% reduce 0%
12/06/14 20:04:30 INFO mapred.JobClient: Job complete: job_local_0001
12/06/14 20:04:30 INFO mapred.JobClient: Counters: 0

I have searched Google extensively but found no solution.

I tried running the same program from the console, and got the following error:

 hadoop jar /home/user/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar /home/user/hadoop-0.20.2-cdh3u2/Test.jar BulkLoadHBase_1 /bulkLoad.txt /out
Exception in thread "main" java.lang.NoSuchMethodException: org.apache.zookeeper.server.quorum.QuorumPeer.main([Ljava.lang.String;)
    at java.lang.Class.getMethod(Class.java:1605)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:180)
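(Editorial note, not part of the original question: the `NoSuchMethodException` on `QuorumPeer.main` arises because `hadoop jar` treats its first argument as the job jar, and here the ZooKeeper jar was passed first, so Hadoop tried to execute ZooKeeper's manifest main class. A sketch of a corrected invocation, reusing the paths from the question; `-libjars` only takes effect if the driver goes through `ToolRunner`/`GenericOptionsParser`, which the `Use GenericOptionsParser` warning in the log suggests this code does not yet do — otherwise the dependency jar can go on `HADOOP_CLASSPATH` instead.)

```shell
# The first argument to `hadoop jar` must be the jar containing the driver class.
# Dependency jars such as zookeeper are shipped via -libjars (with ToolRunner)
# or HADOOP_CLASSPATH -- they are never passed as the first argument.
hadoop jar /home/user/hadoop-0.20.2-cdh3u2/Test.jar BulkLoadHBase_1 \
    -libjars /home/user/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar \
    /bulkLoad.txt /out
```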

My code:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.PutSortReducer;
import org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class BulkLoadHBase_1 {

    public static class BulkLoadHBase_1Mapper 
            extends Mapper<Text, Text, ImmutableBytesWritable, Put>{

        public void map(Text key, Text value, Context context
                        ) throws IOException, InterruptedException {

            System.out.println("KEY  "+key.toString());
            System.out.println("VALUES : "+value);
            System.out.println("Context : "+context);

            ImmutableBytesWritable ibw =
                    new ImmutableBytesWritable(Bytes.toBytes(key.toString()));

            String val = value.toString();
            byte[] b = Bytes.toBytes(val);
            Put p = new Put(Bytes.toBytes(key.toString()));

            p.add(Bytes.toBytes("cf"),Bytes.toBytes("c"),Bytes.toBytes(val));

            context.write(ibw, p);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "bulk-load");

        job.setJarByClass(BulkLoadHBase_1.class);
        job.setMapperClass(BulkLoadHBase_1Mapper.class);

        job.setReducerClass(PutSortReducer.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);
        job.setPartitionerClass(TotalOrderPartitioner.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        FileInputFormat.addInputPath(job,
                     new Path("/home/user/Desktop/bulkLoad.txt"));
        HFileOutputFormat.setOutputPath(job,
                     new Path("/home/user/Desktop/HBASE_BulkOutput/"));     

       System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Best Answer

Did you start HBase in distributed mode? If so, this line:

org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)

in your stack trace shows that your MapReduce job is running in local mode rather than distributed mode.

Also note that if you want to run the command from the console, your input file must be on the Hadoop file system (HDFS), not on your regular (e.g. NTFS or EXT3) file system.
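(Editorial sketch, not part of the accepted answer: the immediate `Can't read partitions file` / `File _partition.lst does not exist` failure comes from setting `TotalOrderPartitioner` without ever writing its partitions file. In the HBase 0.90-era API used here, `HFileOutputFormat.configureIncrementalLoad` handles this for you: it sets the reducer, the partitioner, and writes the partitions file from the target table's region boundaries. A minimal driver along those lines — the table name `mytable` is hypothetical, and this assumes a running HBase cluster with `hbase-site.xml` on the classpath:)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        // HBaseConfiguration.create() picks up hbase-site.xml, so the job
        // talks to the real cluster instead of LocalJobRunner defaults.
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(BulkLoadHBase_1.BulkLoadHBase_1Mapper.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        // configureIncrementalLoad sets PutSortReducer, TotalOrderPartitioner,
        // and writes the partitions file from the table's region boundaries --
        // so _partition.lst never has to be created by hand.
        HTable table = new HTable(conf, "mytable"); // hypothetical table name
        HFileOutputFormat.configureIncrementalLoad(job, table);

        // HDFS paths, not local filesystem paths.
        FileInputFormat.addInputPath(job, new Path("/bulkLoad.txt"));
        HFileOutputFormat.setOutputPath(job, new Path("/out"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

After the job finishes, the generated HFiles under `/out` would still need to be handed to the table, e.g. with the `completebulkload` tool.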

Regards

Regarding "java - Error when bulk loading into HBase", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/11035798/
