hadoop - Explaining the Hadoop partitioner

While working through the secondary sort problem from the Definitive Guide, I came across this code:

@Override
public int getPartition(TextpairWritable tp, IntWritable value, int numPartitions) {
    return Math.abs(Integer.parseInt(tp.getyear().toString()) * 127) % numPartitions;
}

I would like to understand what this line means:

return Math.abs(Integer.parseInt(tp.getyear().toString()) * 127) % numPartitions;

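If I don't set the number of reducers in the driver code, how does Hadoop know the value of this parameter in the line above? And what is the significance of multiplying by 127?

Best answer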

return Math.abs(Integer.parseInt(tp.getyear().toString()) * 127) % numPartitions;

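You can treat this as a hash of the key's year attribute. Any (prime) number can serve as the multiplier; 127 is the one chosen here. The final part, % numPartitions, maps the result to one of numPartitions buckets (reducers).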
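The arithmetic can be tried outside Hadoop. The following is a minimal sketch in plain Java, not the book's code: the class YearPartitionDemo and the method partitionFor are hypothetical names, and the year is passed as an int instead of being parsed from a TextpairWritable.

```java
// Hypothetical stand-alone demo; it mirrors only the arithmetic of getPartition.
final class YearPartitionDemo {

    // Same formula as the partitioner: hash the year, then map it to a bucket.
    static int partitionFor(int year, int numPartitions) {
        return Math.abs(year * 127) % numPartitions;
    }

    public static void main(String[] args) {
        // With 4 reducers, consecutive years land in different partitions.
        System.out.println(partitionFor(1949, 4)); // 247523 % 4 = 3
        System.out.println(partitionFor(1950, 4)); // 247650 % 4 = 2
        System.out.println(partitionFor(1951, 4)); // 247777 % 4 = 1

        // With a single reducer, every key maps to partition 0.
        System.out.println(partitionFor(1950, 1)); // 0
    }
}
```

Every record with the same year gets the same partition number, which is what secondary sort relies on: all records for one year reach the same reducer.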

If I don't set the number of reducers in the driver code, how does Hadoop know the value of this parameter in the line above?

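numPartitions is the number of reduce tasks for the job, and its default is 1. With that default, all the data (the mappers' output) goes to the same reducer task. To use more reducers, set the count in the driver, for example with job.setNumReduceTasks(n).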

What is the significance of multiplying it by 127?

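127 is a prime number. We usually multiply by a prime to reduce the effect of skew in the data. A prime has no divisors other than 1 and itself, so it shares no common factor with numPartitions unless numPartitions is a multiple of 127, and that helps spread the data evenly across the whole range.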
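One way to check the spread for a given key range is to count how many keys fall into each partition. This is a rough sketch under assumptions: the class PartitionSpreadDemo, the method countPerPartition, and the year range 1900 to 1999 are made up for illustration and do not come from the question.

```java
import java.util.Arrays;

// Hypothetical helper: counts keys per partition for a range of years.
final class PartitionSpreadDemo {

    static int[] countPerPartition(int firstYear, int lastYear, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int year = firstYear; year <= lastYear; year++) {
            // Same formula as the partitioner in the question.
            counts[Math.abs(year * 127) % numPartitions]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 100 consecutive years over 4 partitions: each partition receives 25 keys.
        System.out.println(Arrays.toString(countPerPartition(1900, 1999, 4)));
    }
}
```

The sketch counts distinct years only. In a real job the load per reducer also depends on how many records each year has, so skew in the input can still produce uneven reducers.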

This Q&A is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/32510703/
