mapreduce - How to query HBase data using MapReduce?

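Hi, I am new to MapReduce and HBase. Please guide me. I am using MapReduce to move tabular data into HBase. The data has now reached HBase (and therefore HDFS as well). I created a MapReduce job that reads the tabular data from a file and puts it into HBase using the HBase API.

My question now is whether I can query the HBase data using MapReduce. I don't want to run HBase commands to query the data.

Any help or suggestions would be appreciated.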

Best answer

Yes, you can. HBase ships with TableMapReduceUtil, which helps you configure MapReduce jobs that scan data. It automatically creates one map task per region.
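
To make the one-map-task-per-region behavior concrete, here is a toy model in plain Java with no HBase dependency. It is only a simplified picture: plain strings stand in for HBase's byte[] row keys, and the class and method names (RegionSplitSketch, splitsFor) are made up for illustration. It shows how a scan range could be cut into one split per overlapping region, with each split clipped to the scan boundaries. In a real job the HBase input format does this work for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: strings stand in for HBase's byte[] row keys.
class RegionSplitSketch {

  // Returns one "[start,end)" range per region that overlaps the scan range.
  // regionStarts must be sorted. An empty string means "unbounded": the start
  // of the first region, the end of the last region, or an open scan boundary.
  static List<String> splitsFor(List<String> regionStarts, String scanStart, String scanStop) {
    List<String> splits = new ArrayList<>();
    for (int i = 0; i < regionStarts.size(); i++) {
      String regionStart = regionStarts.get(i);
      String regionEnd = (i + 1 < regionStarts.size()) ? regionStarts.get(i + 1) : "";

      // Skip regions that lie entirely outside the scan range.
      boolean endsBeforeScan = !regionEnd.isEmpty() && !scanStart.isEmpty()
          && regionEnd.compareTo(scanStart) <= 0;
      boolean startsAfterScan = !scanStop.isEmpty()
          && regionStart.compareTo(scanStop) >= 0;
      if (endsBeforeScan || startsAfterScan) {
        continue;
      }

      // Clip the region's key range to the scan range.
      String start = (regionStart.compareTo(scanStart) >= 0) ? regionStart : scanStart;
      String end;
      if (regionEnd.isEmpty()) {
        end = scanStop;
      } else if (scanStop.isEmpty() || regionEnd.compareTo(scanStop) <= 0) {
        end = regionEnd;
      } else {
        end = scanStop;
      }
      splits.add("[" + start + "," + end + ")");
    }
    return splits;
  }

  public static void main(String[] args) {
    // Hypothetical table with three regions starting at "", "g" and "p".
    List<String> regions = Arrays.asList("", "g", "p");
    System.out.println(splitsFor(regions, "", ""));   // prints [[,g), [g,p), [p,)]
    System.out.println(splitsFor(regions, "h", "k")); // prints [[h,k)]
  }
}
```

A full-table scan over three regions yields three splits, and so three map tasks, while a scan restricted to a narrow row-key range touches only the regions that hold those keys.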

See this example, taken from the HBase book:

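// Imports needed by this snippet
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

Configuration config = HBaseConfiguration.create();
Job job = Job.getInstance(config, "ExampleRead");   // new Job(config, name) is deprecated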
job.setJarByClass(MyReadJob.class);     // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs
...

TableMapReduceUtil.initTableMapperJob(
  tableName,        // input HBase table name
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper
  null,             // mapper output key
  null,             // mapper output value
  job);
job.setOutputFormatClass(NullOutputFormat.class);   // because we aren't emitting anything from mapper

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}
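
The job setup above refers to MyMapper without showing it. A minimal sketch of such a mapper might look like the following. The column family "cf", the qualifier "attr1", the counter names and the hasValue helper are all assumptions made for illustration, not part of the original answer. Because the job passes null for the mapper output types and uses NullOutputFormat, this sketch emits nothing and reports its result through a job counter instead.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

// Hypothetical mapper: the name MyMapper comes from the job setup,
// but the body here is only an illustrative sketch.
public class MyMapper extends TableMapper<Text, Text> {

  // Assumed column family and qualifier; replace with the ones in your table.
  private static final byte[] FAMILY = Bytes.toBytes("cf");
  private static final byte[] QUALIFIER = Bytes.toBytes("attr1");

  // Hypothetical "query" predicate: the row matches if the column is non-empty.
  static boolean hasValue(byte[] value) {
    return value != null && value.length > 0;
  }

  @Override
  protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
      throws IOException, InterruptedException {
    // map() is called once for every row returned by the Scan.
    byte[] value = row.getValue(FAMILY, QUALIFIER);
    if (hasValue(value)) {
      // Nothing is written to the output; the match is recorded in a counter.
      context.getCounter("ExampleRead", "ROWS_WITH_ATTR1").increment(1);
    }
  }
}
```

After waitForCompletion returns, the count can be read in the driver with job.getCounters().findCounter("ExampleRead", "ROWS_WITH_ATTR1").getValue(). To emit rows instead of counting them, the mapper output classes and the output format in the job setup would need to be set accordingly.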

More examples are available in the HBase book.

For "mapreduce - How to query HBase data using MapReduce?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21379465/
