performance - Method-level vs. class-level variables in Hadoop MapReduce

This is a question about the performance of Writable variables and allocations in a MapReduce step. Here is a reducer:

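import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class MyReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
              throws IOException, InterruptedException {
        for (Text val : values) {
            // Allocates a new Text for every value.
            context.write(key, new Text(val));
        }
      }
}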

Or is this better in terms of performance:

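import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class MyReducer extends Reducer<Text, Text, Text, Text> {
      // One Text instance, allocated once and reused for every value.
      private final Text myText = new Text();
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
              throws IOException, InterruptedException {
        for (Text val : values) {
            myText.set(val);
            context.write(key, myText);
        }
      }
}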

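In Hadoop: The Definitive Guide, all the examples use the first form, but I'm not sure whether that is to keep the code samples short or because it is more idiomatic.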

Best answer

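The book probably uses the first form because it is more concise. It is less efficient, though: the first form allocates a new Text object for every value it writes, so a large input produces a very large number of short-lived objects, and the extra allocation and garbage-collection work slows the job down. In terms of performance, the second form is preferable.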
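To make the difference concrete without a cluster, here is a minimal plain-Java sketch of the two patterns. MutableText and Sink are hypothetical stand-ins that I made up for Text and for the output side of Context; they are not Hadoop classes. The sketch assumes the sink copies the value's contents at the moment write() is called, which is what makes reusing a single instance safe.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ReuseSketch {
    // Hypothetical stand-in for a mutable Writable such as Text.
    static final class MutableText {
        static int instancesCreated = 0;   // counts constructor calls
        private String value = "";
        MutableText() { instancesCreated++; }
        MutableText(String v) { this(); value = v; }
        void set(String v) { value = v; }
        String get() { return value; }
    }

    // Hypothetical stand-in for the output side: it copies the bytes out
    // during write(), so the caller may overwrite the value afterwards.
    static final class Sink {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();
        void write(String key, MutableText value) {
            byte[] line = (key + "\t" + value.get() + "\n")
                    .getBytes(StandardCharsets.UTF_8);
            out.write(line, 0, line.length);
        }
        String contents() {
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // First form: a new object for every record.
    static String allocatePerRecord(String key, List<String> values) {
        Sink sink = new Sink();
        for (String v : values) {
            sink.write(key, new MutableText(v));
        }
        return sink.contents();
    }

    // Second form: one object, overwritten for every record.
    static String reuseOneInstance(String key, List<String> values) {
        Sink sink = new Sink();
        MutableText reused = new MutableText();
        for (String v : values) {
            reused.set(v);
            sink.write(key, reused);
        }
        return sink.contents();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c");

        MutableText.instancesCreated = 0;
        String first = allocatePerRecord("k", values);
        int firstCount = MutableText.instancesCreated;

        MutableText.instancesCreated = 0;
        String second = reuseOneInstance("k", values);
        int secondCount = MutableText.instancesCreated;

        System.out.println(first.equals(second));            // prints true
        System.out.println(firstCount + " vs " + secondCount); // prints 3 vs 1
    }
}
```

Both forms produce the same output; the first creates one object per record and the second creates one object in total. How much that matters in a real job depends on the record count and on the JVM's garbage collector, so measure before relying on it.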

Some references that discuss this issue: