hadoop - Input path is treated as the output path when running a MapReduce program on a cluster

When I run my MapReduce program on the cluster, the input path is treated as the output path, so I always get this error:

output directory already exists.

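But when I leave out the DriverClass argument, the program runs successfully. I really don't understand why. I can run the same program in IntelliJ IDEA, and in my local environment it produces the correct result.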

I have not mixed up the indices of the input and output paths:

FileInputFormat.setInputPaths(job,new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));
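A quick way to see what the driver actually receives on the cluster is to validate `args` before handing it to `FileInputFormat`/`FileOutputFormat`. The sketch below is plain Java; the class `DriverArgsCheck` and the helper `requirePaths` are hypothetical, and it assumes the driver expects exactly two arguments (input, output). If the launcher passes an extra leading argument, the check fails with a message listing every argument, instead of silently using the input directory as the output directory.

```java
import java.util.Arrays;

public class DriverArgsCheck {

    // Hypothetical helper: returns {input, output}, or fails with a message
    // that shows exactly which arguments the driver received.
    static String[] requirePaths(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Expected <input> <output>, but got " + args.length
                + " argument(s): " + Arrays.toString(args));
        }
        return new String[] {args[0], args[1]};
    }

    public static void main(String[] args) {
        // In a real driver these two values would be wrapped in new Path(...)
        // and passed to FileInputFormat.setInputPaths / FileOutputFormat.setOutputPath.
        String[] paths = requirePaths(args);
        System.out.println("input  = " + paths[0]);
        System.out.println("output = " + paths[1]);
    }
}
```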

HDFS structure:

Here are my hadoop command and the error message:

However, after I leave out the DriverClass argument, the program runs successfully:

Best answer

Your problem is here:

FileInputFormat.setInputPaths(job,new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));

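args[0] is actually the class name, so you want to use args[1] for the input and args[2] for the output. This typically happens when the jar's manifest already declares a Main-Class: `hadoop jar` then does not consume the class name you typed and passes it to main() as an ordinary argument, which is also why the job runs once you leave DriverClass out.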
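The main-class lookup behind this can be modeled in a few lines of plain Java. This is a simplified sketch of the logic in Hadoop's `RunJar` (the class behind `hadoop jar`): if the manifest sets `Main-Class`, every word after the jar name reaches `main`; only when it is missing is the first word consumed as the class name. `RunJarArgsModel`, `resolve`, and `com.example.WordCountDriver` are hypothetical names, and the model ignores everything else `RunJar` does (unpacking the jar, building the classpath).

```java
import java.util.Arrays;

public class RunJarArgsModel {

    // What gets launched: the main class and the arguments handed to its main().
    static final class Launch {
        final String mainClass;
        final String[] mainArgs;

        Launch(String mainClass, String[] mainArgs) {
            this.mainClass = mainClass;
            this.mainArgs = mainArgs;
        }
    }

    // Simplified model of `hadoop jar <jar> [mainClass] args...`.
    // manifestMainClass: the jar's Main-Class attribute, or null if absent.
    // cliArgs: the words that follow <jar> on the command line.
    static Launch resolve(String manifestMainClass, String[] cliArgs) {
        int first = 0;
        String mainClass = manifestMainClass;
        if (mainClass == null) {
            if (cliArgs.length < 1) {
                throw new IllegalArgumentException("no Main-Class in manifest and no class name given");
            }
            // No Main-Class: the first word is consumed as the class name.
            mainClass = cliArgs[first++];
        }
        // With a Main-Class nothing is consumed, so a typed class name reaches main() too.
        mainClass = mainClass.replace('/', '.');
        String[] mainArgs = Arrays.copyOfRange(cliArgs, first, cliArgs.length);
        return new Launch(mainClass, mainArgs);
    }

    public static void main(String[] args) {
        String[] cli = {"com.example.WordCountDriver", "/input", "/output"};
        // Manifest declares Main-Class: the driver sees 3 args, args[0] is the class name.
        System.out.println(Arrays.toString(resolve("com.example.WordCountDriver", cli).mainArgs));
        // prints [com.example.WordCountDriver, /input, /output]
        // Manifest without Main-Class: the driver sees only the two paths.
        System.out.println(Arrays.toString(resolve(null, cli).mainArgs));
        // prints [/input, /output]
    }
}
```

In the first case the paths sit at `args[1]` and `args[2]`, matching the fix above; in the second, `args[0]` and `args[1]` are correct.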

Related Stack Overflow question: https://stackoverflow.com/questions/56513236/
