hadoop - Third-party jars in a MapReduce job

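My MapReduce job depends on third-party libraries such as hive-hcatalog-xxx.jar. I run all my jobs through Oozie, and the MapReduce jobs are launched from a java action. What is the best way to include third-party libraries in my job? I have two options:

1. Bundle all the dependent jars into the main jar and create a fat jar.
2. Keep all the dependent jars in an HDFS location and add them via the -libjars option.

Which one should I choose? Please advise.

Because my MapReduce job is invoked through an Oozie java action, the libraries available in the Oozie lib folder are not added to the classpath of the mapper/reducer. If I change this java action to a map-reduce action, will the jars be available?

Thanks in advance.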

Best answer

1. Bundle all the dependent jars into the main jar and create a fat jar. OR 2. Keep all the dependent jars in an HDFS location and add them via the -libjars option. Which one can I choose?

Both approaches are used in practice. I would suggest the uber jar, i.e. your first approach.

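Uber jar: a jar that has a lib/ folder inside it carrying the dependent jars (a structure known as an "uber" jar). You submit the job with the regular "hadoop jar" command, and the lib/*.jars are picked up by the framework because the supplied jar is specified explicitly via conf.setJarByClass or conf.setJar. That is, if this user uber jar goes to the JT (JobTracker) as the mapred...jar, the framework handles it properly and the lib/*.jars are all considered and placed on the classpath.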
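Since this layout only helps if the dependency jars really sit under lib/ inside the job jar, a quick check before submitting can be useful. The following is a minimal sketch that uses only java.util.jar from the JDK; the class name UberJarInspector, its method names, and the sample entry names are hypothetical and are not part of Hadoop or Oozie.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper: lists the dependency jars nested under lib/ in a job jar.
class UberJarInspector {

    /** Returns the sorted names of all *.jar entries under lib/ in the given jar. */
    static List<String> nestedLibJars(Path jarPath) throws IOException {
        List<String> found = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                String name = entry.getName();
                if (!entry.isDirectory() && name.startsWith("lib/") && name.endsWith(".jar")) {
                    found.add(name);
                }
            }
        }
        Collections.sort(found);
        return found;
    }

    /** Writes a small placeholder jar so the check can be tried without a real build. */
    static Path writeSampleJar(String... entryNames) throws IOException {
        Path tmp = Files.createTempFile("uber-sample", ".jar");
        try (OutputStream out = Files.newOutputStream(tmp);
             JarOutputStream jos = new JarOutputStream(out)) {
            for (String entryName : entryNames) {
                jos.putNextEntry(new JarEntry(entryName));
                jos.write(new byte[] {0}); // placeholder content only
                jos.closeEntry();
            }
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a real job jar, or run without arguments to use a placeholder.
        Path jar = args.length > 0
                ? Paths.get(args[0])
                : writeSampleJar("com/example/MyJob.class", "lib/hive-hcatalog-xxx.jar");
        for (String name : nestedLibJars(jar)) {
            System.out.println(name);
        }
    }
}
```

Running it against the built job jar prints one line per nested dependency; if hive-hcatalog-xxx.jar is missing from the list, the build did not package it under lib/.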

Why

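The advantage is that you can distribute your uber jar without caring whether the dependencies are installed at the destination, because the uber jar has no external dependencies: everything it needs is packaged inside it.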

As my mapreduce job is invoked through a java action of oozie, the libraries available in oozie lib folder is not added to the classpath of mapper/reducer. If I change this java action to map reduce action, will the jars be available?

For the question above, since the answer is broad, here are some sharelib links for you: CDH4.xx, CDH5.xx, and How to configure Mapreduce action with Oozie share lib.

Regarding "hadoop - Third-party jars in a MapReduce job", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38270676/
