java - Hadoop sort phase takes hours

Tags: java sorting hadoop mapreduce

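I started using Hadoop a week ago. After running the bundled examples successfully, I wrote a MapReduce job, based on the WordCount example, to find the most frequently used word.

I am trying to run this job on 500 MB of data.

However, the map task is taking hours. The job is currently at map 67%, reduce 0%.

The map task log looks like this: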

2014-10-24 11:19:52,274 DEBUG [IPC Parameter Sending Thread #0] org.apache.hadoop.ipc.Client: IPC Client (592959754) connection to /xxx.xx.xx.xx:52026 from job_1414134493988_0001 sending #2554
2014-10-24 11:19:52,278 DEBUG [IPC Client (592959754) connection to /xxx.xx.xx.xx:52026 from job_1414134493988_0001] org.apache.hadoop.ipc.Client: IPC Client (592959754) connection to /xxx.xx.xx.xx:52026 from job_1414134493988_0001 got value #2554
2014-10-24 11:19:52,279 DEBUG [communication thread] org.apache.hadoop.ipc.RPC: Call: ping 5
2014-10-24 11:19:55,279 DEBUG [IPC Parameter Sending Thread #0] org.apache.hadoop.ipc.Client: IPC Client (592959754) connection to /xxx.xx.xx.xx:52026 from job_1414134493988_0001 sending #2555
2014-10-24 11:19:55,280 DEBUG [IPC Client (592959754) connection to /xxx.xx.xx.xx:52026 from job_1414134493988_0001] org.apache.hadoop.ipc.Client: IPC Client (592959754) connection to /xxx.xx.xx.xx:52026 from job_1414134493988_0001 got value #2555
2014-10-24 11:19:55,280 DEBUG [communication thread] org.apache.hadoop.ipc.RPC: Call: ping 1

Is it supposed to take this long?

Best answer

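Some pointers:

What do you mean by "long"? How long exactly?

Take a stack trace of the long-running task to see where it is stuck.

Also, what is the status of the tasks? Are they failing often?

How many mappers and reducers do you have in your cluster?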
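
For the stack-trace pointer above, the usual route is to run the JDK's jstack tool against the process id of the task JVM on the worker node; jps -l lists the running JVMs so you can find it. If you would rather produce the dump from inside your own code, the following is a minimal sketch using only the standard Java API. The class name StackDump and the method dumpAllThreads are made up for illustration and are not part of Hadoop.

```java
import java.util.Map;

// Hypothetical helper (not part of Hadoop): formats the stack of every
// live thread in the current JVM, similar in spirit to a jstack dump.
public class StackDump {

    /** Returns a plain-text dump of all live threads in this JVM. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            // Header line: thread name and current state (RUNNABLE, WAITING, ...)
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            // One line per stack frame, innermost frame first
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Taking two or three dumps a few seconds apart and comparing them usually shows whether the task is stuck at one frame or is just progressing slowly.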

This question, "java - Hadoop sort phase takes hours", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/26575142/