java - Word count program with two input files and a single output file

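I am new to Hadoop. I have written a word count program that takes a single input file and writes a single output file. Now I want to take 2 files as input and write the output to a single file. I tried this: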

FileInputFormat.setInputPaths(conf, new Path(args[0]), new Path(args[1]));
FileOutputFormat.setOutputPath(conf, new Path(args[2]));

This is the command I run in the terminal:

hadoop jar test.jar Driver /user/in.txt /user/sample.txt /user/out

When I run the program, it treats sample.txt as the output directory and reports:

Output directory hdfs://localhost:9000/user/sample.txt already exists

Can anyone help me solve this?

Best answer

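It is probably treating Driver as your first argument. If the jar's manifest already names a main class, hadoop jar passes everything after the jar name to the program, so args[0] is Driver, args[1] is /user/in.txt, and args[2] (your output path) is /user/sample.txt. Try running it without the class name: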

hadoop jar test.jar /user/in.txt /user/sample.txt /user/out
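To make this kind of argument shift easier to spot, the driver could validate and print its arguments before configuring the job. The following is only a minimal sketch in plain Java, not code from the question: JobArgs and parse are hypothetical names, and it assumes the convention that the last argument is the output path and every argument before it is an input path.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: splits the program arguments
// into input paths (all but the last) and one output path (the last).
public class JobArgs {
    final String[] inputs;
    final String output;

    private JobArgs(String[] inputs, String output) {
        this.inputs = inputs;
        this.output = output;
    }

    static JobArgs parse(String[] args) {
        // At least one input path and one output path are required.
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "usage: <input path>... <output path>");
        }
        String[] inputs = Arrays.copyOfRange(args, 0, args.length - 1);
        return new JobArgs(inputs, args[args.length - 1]);
    }

    public static void main(String[] args) {
        JobArgs parsed = parse(args);
        // Printing the mapping makes a stray leading argument easy to spot.
        System.out.println("inputs = " + Arrays.toString(parsed.inputs));
        System.out.println("output = " + parsed.output);
    }
}
```

In the real driver, each entry of inputs would be wrapped in a Path and passed to FileInputFormat.setInputPaths, and output would go to FileOutputFormat.setOutputPath. Note that this sketch does not strip a stray class name: if Driver is still passed, it shows up in the printed inputs list rather than silently shifting the output path.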

Regarding "java - Word count program with two input files and a single output file", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/30358087/
