I'm learning Hadoop from the book Hadoop in Practice, and while reading Chapter 1 I came across this diagram:
From the Hadoop documentation (http://hadoop.apache.org/docs/current2/api/org/apache/hadoop/mapred/Reducer.html):
1. Shuffle
Reducer is input the grouped output of a Mapper. In the phase the framework, for each Reducer, fetches the relevant partition of the output of all the Mappers, via HTTP.
2. Sort
The framework groups Reducer inputs by keys (since different Mappers may have output the same key) in this stage. The shuffle and sort phases occur simultaneously i.e. while outputs are being fetched they are merged.
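To make the two phases concrete, here is a minimal, single-process Java sketch (not Hadoop code) of what the framework does for one reducer: take that reducer's partition from every mapper's output (shuffle), then merge the pieces into a single key-ordered, grouped input (sort). It assumes a hash-based partition function with the same formula as Hadoop's default HashPartitioner; the class and method names are hypothetical. Real Hadoop sorts each map output by partition and key on the map side and the reducer merges those sorted segments; the TreeMap here only stands in for that merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSortSketch {

    // Assumed hash-based partition function (same formula as Hadoop's default HashPartitioner).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Builds the input of reducer `reducerId` (hypothetical helper):
    // shuffle = keep only this reducer's partition from every mapper's output,
    // sort    = merge everything into one key-ordered map, grouping the values of each key.
    static TreeMap<String, List<Integer>> reducerInput(
            List<List<Map.Entry<String, Integer>>> mapperOutputs, int reducerId, int numReducers) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> mapperOutput : mapperOutputs) { // "fetch" from each mapper
            for (Map.Entry<String, Integer> pair : mapperOutput) {
                if (partition(pair.getKey(), numReducers) == reducerId) {
                    grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
                }
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> mapperOutputs = List.of(
            List.of(Map.entry("cat", 1), Map.entry("dog", 1)),  // output of mapper 0
            List.of(Map.entry("dog", 1), Map.entry("emu", 1))); // output of mapper 1
        int numReducers = 2;
        for (int r = 0; r < numReducers; r++) {
            // Both "dog" records end up in the same reducer, grouped under one key.
            System.out.println("reducer " + r + ": " + reducerInput(mapperOutputs, r, numReducers));
        }
    }
}
```

Because every mapper applies the same partition function, both "dog" records reach the same reducer even though two different mappers emitted them.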
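Although I understand that shuffling and sorting happen at the same time, it isn't clear to me how the framework decides which reducer receives which mapper's output. From the documentation it seems that each reducer has a way of knowing which map output to collect, but I don't understand how.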
So my question is: given the mapper output above, will the final result for each reducer always be the same? If so, what are the steps that achieve this?
Thanks for any clarification!
Best answer
It is the Partitioner that decides how the mappers' output is distributed among the different reducers.
Partitioner controls the partitioning of the keys of the intermediate map-outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. Hence this controls which of the m reduce tasks the intermediate key (and hence the record) is sent for reduction.
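This is also what makes the result predictable: as long as the partitioner and the number of reduce tasks stay the same, a given key always maps to the same reducer, regardless of which mapper emitted it. Hadoop's default partitioner, HashPartitioner, computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The standalone sketch below reproduces only that formula, using plain `String` keys instead of Hadoop's Writable types (an assumption made so it runs without Hadoop on the classpath); the class name is hypothetical.

```java
class HashPartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit so the value is non-negative, then take it modulo the reducer count.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3;
        for (String key : new String[] {"apple", "banana", "cherry", "apple"}) {
            // "apple" prints the same partition both times: the result depends only on the key
            // and the number of reduce tasks, not on which mapper produced the record.
            System.out.println(key + " -> reducer " + getPartition(key, numReduceTasks));
        }
    }
}
```

In real Hadoop code, a custom partitioner would extend `org.apache.hadoop.mapreduce.Partitioner` and be registered on the job with `Job.setPartitionerClass`.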
Regarding "java - Map Reduce flow in Hadoop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20916258/