hadoop - Map error - attempt_xxxx_ timed out after 600 secs

I'm using Hadoop 2.2.0, and when I run my map task I get the following error:

attempt_xxx Timed out after 1800000 seconds

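(The number is 1800000 because I changed the mapreduce.task.timeout setting. Note that this property is specified in milliseconds, so 1800000 means 30 minutes.)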
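Because mapreduce.task.timeout takes milliseconds (the default is 600000, i.e. the 10 minutes in this question's title), it can help to derive the value from a readable duration instead of typing raw numbers. A minimal sketch in plain Java with no Hadoop dependency; the class TaskTimeout and its methods are hypothetical:

```java
import java.time.Duration;

// Hypothetical helper: converts a readable duration into the millisecond
// value that mapreduce.task.timeout expects.
final class TaskTimeout {
    static final String PROPERTY = "mapreduce.task.timeout";

    private TaskTimeout() {}

    // mapreduce.task.timeout is interpreted as milliseconds
    static long toMillis(Duration timeout) {
        return timeout.toMillis();
    }

    // Formats the setting as a property=value pair, e.g. for logging
    static String asProperty(Duration timeout) {
        return PROPERTY + "=" + toMillis(timeout);
    }

    public static void main(String[] args) {
        System.out.println(asProperty(Duration.ofMinutes(30)));
    }
}
```

In a job driver, the resulting number would be set on the job's Configuration, for example conf.setLong("mapreduce.task.timeout", TaskTimeout.toMillis(Duration.ofMinutes(30))), before the Job is created from that Configuration. Raising the limit only postpones the kill if the task never reports progress.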

Here is my map code:

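import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapTask extends Mapper<LongWritable, Text, NullWritable, Text>
{
  private final ContentOfFiles fileContent = new ContentOfFiles();

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException
  {
    // Split the tab-separated input line into the list of sources to read
    String[] splits = value.toString().split("\\t");
    List<String> sourceList = Arrays.asList(splits);
    String finalOutput = fileContent.getContentOfFile(sourceList);
    context.write(NullWritable.get(), new Text(finalOutput));
  }
}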

Here is my ContentOfFiles class:

import java.util.List;

public class ContentOfFiles
{
  public String getContentOfFile(List<String> sourceList)
  {
    StringBuilder returnContentOfFile = new StringBuilder();
    for (String source : sourceList)
    {
      // Open the file named by source, read its content, and append it to returnContentOfFile
    }
    return returnContentOfFile.toString();
  }
}
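The loop body in getContentOfFile is left as a comment. One way to fill it in is sketched below, assuming every entry in sourceList is a path on the local file system and that the files are UTF-8 text; the class name LocalContentOfFiles is hypothetical. Files stored in HDFS would instead be opened through Hadoop's FileSystem API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical variant of ContentOfFiles that reads local files only.
class LocalContentOfFiles {
    String getContentOfFile(List<String> sourceList) throws IOException {
        StringBuilder returnContentOfFile = new StringBuilder();
        for (String source : sourceList) {
            // Read the whole file and append its text to the result
            byte[] bytes = Files.readAllBytes(Paths.get(source));
            returnContentOfFile.append(new String(bytes, StandardCharsets.UTF_8));
        }
        return returnContentOfFile.toString();
    }
}
```

Reading whole files into one String keeps the example short but holds everything in memory; with large inputs it is also exactly the kind of long, silent operation that can exceed mapreduce.task.timeout.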

When I run my map task, I get the error:

attempt_xxx Timed out after 1800000 seconds.

What I want to know is how to tell Hadoop that my task is still running.

I call the ContentOfFiles class from inside map. Is there a way to signal that my map task is still running? I tried raising mapreduce.task.timeout to 1800000, but I still get the same error.

Again, I'm using Hadoop 2.2, so it would be great if someone could show me how to handle this with the new API.

Best answer

You can try calling context.progress(); after each long-running operation in your mapper. As far as I know, the best place for it is at the end of the for loop:

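// Requires: import java.util.List; import org.apache.hadoop.mapreduce.TaskAttemptContext;
// Call it from map() as fileContent.getContentOfFile(sourceList, context)
public String getContentOfFile(List<String> sourceList, TaskAttemptContext context) {
    StringBuilder returnContentOfFile = new StringBuilder();
    for (String source : sourceList) {
        // Open the file, read its content, and append it to returnContentOfFile
        context.progress(); // report progress so the task is not killed for inactivity
    }
    return returnContentOfFile.toString();
}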
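Calling progress() once per loop iteration only helps if every single iteration finishes within the timeout. If one file read can itself run longer, a common workaround is a background thread that reports progress periodically while the slow work runs. A minimal sketch in plain Java: ProgressReporter is a hypothetical stand-in with the same progress() method as Hadoop's org.apache.hadoop.util.Progressable (which the mapper's Context also provides), so the example compiles without Hadoop; in a real job you would pass context::progress. Keep in mind that this keeps the task alive even if the work hangs, so it is best paired with some overall limit of your own.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for org.apache.hadoop.util.Progressable, so the
// sketch compiles without Hadoop; in a mapper, pass context::progress.
interface ProgressReporter {
    void progress();
}

final class Heartbeat {
    private Heartbeat() {}

    // Runs work on the calling thread while a daemon thread calls
    // reporter.progress() every intervalMillis until the work finishes.
    static <T> T runWithHeartbeat(Callable<T> work, ProgressReporter reporter,
                                  long intervalMillis) throws Exception {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "progress-heartbeat");
                thread.setDaemon(true); // never keep the JVM alive on its own
                return thread;
            });
        try {
            scheduler.scheduleAtFixedRate(reporter::progress,
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
            return work.call();
        } finally {
            scheduler.shutdownNow(); // stop reporting as soon as the work ends
        }
    }
}
```

The interval should sit well below mapreduce.task.timeout so that at least one report lands in every timeout window.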

We found a similar question about "hadoop - Map error - attempt_xxxx_ timed out after 600 secs" on Stack Overflow: https://stackoverflow.com/questions/24258670/
