hadoop - HashMap in each mapper should be used in a single reducer

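In one of my classes I use a HashMap, and I call that class from my mapper, so each mapper now has its own HashMap. Can I use all of these HashMaps in a single reducer? My HashMap has the file name as its key and a Set as its value, so each HashMap contains a file name and a Set. I want to take all the HashMaps that contain the same file name, merge their values (the Sets), and then write the resulting HashMap to a file in HDFS.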

Best answer

Yes, you can. If your mapper produces its output as a HashMap, you can use Hadoop's MapWritable as the mapper's output value type. For example:

public class MyMapper extends Mapper<LongWritable, Text, Text, MapWritable>

You have to convert your HashMap to a MapWritable:

// Copy each non-null entry into a MapWritable, wrapping keys and values in Text.
MapWritable mapWritable = new MapWritable();
for (Map.Entry<String, String> entry : yourHashMap.entrySet()) {
    if (entry.getKey() != null && entry.getValue() != null) {
        mapWritable.put(new Text(entry.getKey()), new Text(entry.getValue()));
    }
}
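The loop above assumes String values, while the question stores a Set per file name. One possible way to bridge that gap is to flatten each Set into a single String before wrapping it in Text. The following is a minimal sketch in plain Java, with no Hadoop types; the class name SetCodec and the tab delimiter are assumptions made for illustration, and it only works if no element is empty or contains the delimiter.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hadoop: turns a Set<String> into one String and back.
class SetCodec {
    // Assumed delimiter; elements must be non-empty and must not contain it.
    static final String DELIMITER = "\t";

    // Join the elements in sorted order so the encoding is deterministic.
    static String encode(Set<String> values) {
        return String.join(DELIMITER, new TreeSet<>(values));
    }

    // Split an encoded string back into a set; the empty string decodes to the empty set.
    static Set<String> decode(String encoded) {
        if (encoded.isEmpty()) {
            return new TreeSet<>();
        }
        return new TreeSet<>(Arrays.asList(encoded.split(DELIMITER, -1)));
    }
}
```

With a helper like this, the value passed to new Text(...) in the loop would be SetCodec.encode(entry.getValue()), and the reducer would call SetCodec.decode on the string it reads back.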

Then write the MapWritable to your context:

ctx.write(new Text("my_key"), mapWritable);

In the Reducer class, MapWritable is your input value type:

public class MyReducer extends Reducer<Text, MapWritable, Text, Text>

public void reduce(Text key, Iterable<MapWritable> values, Context ctx) throws IOException, InterruptedException

Then iterate over the maps and extract the values however you need. For example:

for (MapWritable entry : values) {
    for (Map.Entry<Writable, Writable> extractData : entry.entrySet()) {
        // your logic for the data will go here.
    }
}
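The "your logic" placeholder in the loop above is where the merge asked about in the question would happen. As a rough sketch of that step only, here is the per-file-name union written with plain Java collections instead of Hadoop types. The class name SetMerger and the method mergeAll are hypothetical, and in a real reducer each MapWritable entry would first have to be unwrapped into a file name and a set of strings.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper that shows only the merge step, using plain Java collections.
class SetMerger {
    // Union the sets of every input map, grouped by key (the file name in the question).
    static Map<String, Set<String>> mergeAll(List<Map<String, Set<String>>> maps) {
        // Sorted collections keep the result order deterministic.
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map<String, Set<String>> map : maps) {
            for (Map.Entry<String, Set<String>> entry : map.entrySet()) {
                // Create the target set on first sight of a key, then add this map's values.
                merged.computeIfAbsent(entry.getKey(), k -> new TreeSet<>())
                      .addAll(entry.getValue());
            }
        }
        return merged;
    }
}
```

The input maps are left unmodified because the values are copied into fresh sets. Note that all the mappers' maps reach one reduce call only if they are written under the same key, which is what the constant "my_key" in the answer achieves.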

This answer comes from a similar question on Stack Overflow about hadoop - HashMap in each mapper should be used in a single reducer: https://stackoverflow.com/questions/31264246/
