I'm trying to run K-means on Hadoop. I want to save the cluster centroids, which are computed in the reducer's cleanup method, to a file such as centroids.txt. Now I'm wondering what happens if the cleanup methods of several reducers start at the same time and all of them try to write to this file simultaneously. Is this handled internally? If not, is there a way to synchronize this task?
Note that this is not my reducer's output file. It is an additional file I maintain to keep track of the centroids, and I write to it with a BufferedWriter from the reducer's cleanup method.
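For concreteness, the write described above might look roughly like the sketch below. It is a hypothetical illustration using plain java.io rather than the HDFS FileSystem API; the class and method names are assumptions, only centroids.txt comes from the question.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CentroidWriterSketch {
    // Mimics what a Reducer#cleanup(Context) might do: append the
    // centroids this task computed to a shared centroids.txt.
    static void writeCentroids(List<double[]> centroids, Path file) throws IOException {
        // Append mode: every reducer task would open the SAME file.
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file.toFile(), true))) {
            for (double[] c : centroids) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < c.length; i++) {
                    if (i > 0) line.append(',');
                    line.append(c[i]);
                }
                out.write(line.toString());
                out.newLine();
            }
        }
        // If several reducer tasks ran this concurrently against one
        // shared file, their lines could interleave or clobber each
        // other, which is exactly the concern raised above.
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("centroids", ".txt");
        writeCentroids(List.of(new double[]{1.0, 2.0}, new double[]{3.5, 4.5}), file);
        Files.readAllLines(file).forEach(System.out::println);
    }
}
```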
Best Answer
Yes, you are right: you cannot achieve that with the existing framework. cleanup() is called once in every reducer task, so it runs many times across the job, and there is no way to synchronize those writes. Possible approaches you can follow:
1. Call merge after the job has completed successfully:
hadoop fs -getmerge <src> <localdst> [addnl]
2. Clearly specify where your output file(s) should go, and use that folder as the input to your next job.
3. Chain one more MapReduce job in which map and reduce do not change the data, and the partitioner assigns all data to a single reducer.
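Option 1 can be sketched without a cluster: each reducer task writes only to its own part file (Hadoop already gives each task a distinct part-r-NNNNN name), so no synchronization is needed, and a post-job merge concatenates the parts into centroids.txt, which is essentially what hadoop fs -getmerge does. A minimal plain-Java sketch; the directory layout and file names here are assumptions:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MergeCentroids {
    public static void main(String[] args) throws IOException {
        Path outDir = Files.createTempDirectory("kmeans-out");

        // Each (simulated) reducer task writes ONLY to its own part
        // file, so the tasks never contend for a shared file.
        Files.write(outDir.resolve("part-r-00000"), List.of("1.0,2.0"));
        Files.write(outDir.resolve("part-r-00001"), List.of("3.5,4.5"));

        // After the job succeeds, merge the parts into centroids.txt:
        // the local-filesystem equivalent of `hadoop fs -getmerge`.
        Path merged = outDir.resolve("centroids.txt");
        try (BufferedWriter out = Files.newBufferedWriter(merged)) {
            List<Path> parts = Files.list(outDir)
                    .filter(p -> p.getFileName().toString().startsWith("part-r-"))
                    .sorted()
                    .toList();
            for (Path p : parts) {
                for (String line : Files.readAllLines(p)) {
                    out.write(line);
                    out.newLine();
                }
            }
        }
        Files.readAllLines(merged).forEach(System.out::println);
    }
}
```

The same idea covers option 2: point the next job's input at the output folder, and it will read all the part files without any merge at all.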
Regarding "hadoop - writing to a single file from multiple reducers in hadoop", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23299144/