hadoop map-reduce: how to deploy non-jar files

Tags: hadoop mapreduce

Hi, when I submit my jar with hadoop jar ..args.. to run a map-reduce job, I would like to know how to deploy non-jar files along with it.

For Hadoop streaming there is the -file option to ship files, and for Spark there is --files, but I cannot find such an option in the documentation.

Is it possible to ship non-jar files along with my jar when submitting a Hadoop map-reduce job?

Best Answer

Applications can specify a comma-separated list of paths that will be present in the current working directory of each task using the -files option.

The -libjars option allows applications to add jars to the classpaths of the maps and reduces. The -archives option allows them to pass a comma-separated list of archives as arguments. These archives are unarchived, and a link with the name of the archive is created in the current working directory of each task. More details about the command-line options are available in the Commands Guide. Note that these generic options are parsed by GenericOptionsParser, so the driver class must run through ToolRunner (i.e., implement the Tool interface) for -files, -libjars, and -archives to take effect.

Running the wordcount example with -libjars, -files, and -archives:

hadoop jar hadoop-examples.jar wordcount -files cachefile.txt -libjars mylib.jar -archives myarchive.zip input output

Here, myarchive.zip will be placed and unzipped into a directory named "myarchive.zip".

Users can specify a different symbolic name for files and archives passed through the -files and -archives options, using #.

For example:

hadoop jar hadoop-examples.jar wordcount -files dir1/dict.txt#dict1,dir2/dict.txt#dict2 -archives mytar.tgz#tgzdir input output

Here, the files dir1/dict.txt and dir2/dict.txt can be accessed by tasks using the symbolic names dict1 and dict2 respectively. The archive mytar.tgz will be placed and unarchived into a directory named "tgzdir".
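Inside a task, a file shipped with -files simply appears under its symbolic name in the task's current working directory, so it can be opened with ordinary file I/O (in a real Mapper this would typically happen in setup()). The following is a minimal plain-Java sketch of that loading pattern; the class name SideFileDemo and the tab-separated dictionary contents are invented for illustration, and the symlink the framework would create is simulated here by writing the file locally:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SideFileDemo {
    // Load a tab-separated dictionary from the task's working directory.
    // With -files dir1/dict.txt#dict1, the framework creates a "dict1"
    // symlink in the task CWD, so the relative path "dict1" resolves to it.
    static Map<String, String> loadDict(Path p) throws IOException {
        Map<String, String> dict = new HashMap<>();
        for (String line : Files.readAllLines(p)) {
            String[] kv = line.split("\t", 2);
            if (kv.length == 2) {
                dict.put(kv[0], kv[1]);
            }
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the symlink the framework would create in the task CWD.
        Path p = Paths.get("dict1");
        Files.write(p, Arrays.asList("hadoop\telephant", "hive\tbee"));

        Map<String, String> dict = loadDict(p);
        System.out.println(dict.get("hadoop")); // prints "elephant"
    }
}
```

The key point is that no HDFS or DistributedCache API call is needed in the task: once the file has been distributed with -files, relative-path file I/O against the symbolic name is enough.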

Regarding "hadoop map-reduce: how to deploy non-jar files", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38363700/
