java - How do I specify uberization for a Hive query in Hadoop 2?

Tags: java hadoop

Hadoop 2 has a new feature called uberization. For example, this reference says:

Uberization is the possibility to run all tasks of a MapReduce job in the ApplicationMaster's JVM if the job is small enough. This way, you avoid the overhead of requesting containers from the ResourceManager and asking the NodeManagers to start (supposedly small) tasks.

I can't tell whether this just happens magically behind the scenes, or whether something has to be done to make it happen. For example, is there a setting (or hint) when running a Hive query to enable it? Can you specify the threshold for "small enough"?

Also, I'm having trouble finding much information about this concept - does it go by another name?

Best answer

I found the details on "Uber Jobs" in Arun Murthy's YARN book:

An Uber Job occurs when multiple mapper and reducers are combined to use a single container. There are four core settings around the configuration of Uber Jobs found in the mapred-site.xml options presented in Table 9.3.

Here is Table 9.3:

|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.enable     | Whether to enable the small-jobs "ubertask" optimization,  |
|                                   | which runs "sufficiently small" jobs sequentially within a |
|                                   | single JVM. "Small" is defined by the maxmaps, maxreduces, |
|                                   | and maxbytes settings. Users may override this value.      |
|                                   | Default = false.                                           |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxmaps    | Threshold for the number of maps beyond which the job is   |
|                                   | considered too big for the ubertasking optimization.       |
|                                   | Users may override this value, but only downward.          |
|                                   | Default = 9.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxreduces | Threshold for the number of reduces beyond which           |
|                                   | the job is considered too big for the ubertasking          |
|                                   | optimization. Currently the code cannot support more       |
|                                   | than one reduce and will ignore larger values. (Zero is    |
|                                   | a valid maximum, however.) Users may override this         |
|                                   | value, but only downward.                                  |
|                                   | Default = 1.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxbytes   | Threshold for the number of input bytes beyond             |
|                                   | which the job is considered too big for the uber-          |
|                                   | tasking optimization. If no value is specified,            |
|                                   | `dfs.block.size` is used as a default. Be sure to          |
|                                   | specify a default value in `mapred-site.xml` if the        |
|                                   | underlying file system is not HDFS. Users may override     |
|                                   | this value, but only downward.                             |
|                                   | Default = HDFS block size.                                 |
|-----------------------------------+------------------------------------------------------------|
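To make this concrete, here is a minimal sketch of how these properties could be set when submitting a plain MapReduce job from Java. The class name, threshold values, and identity mapper/reducer are placeholders I've chosen for illustration; the property names are the ones from Table 9.3.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UberJobExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Opt in to the uber-task optimization; the job still has to fall under
        // the maxmaps/maxreduces/maxbytes thresholds to run inside the AM's JVM.
        conf.setBoolean("mapreduce.job.ubertask.enable", true);

        // Per Table 9.3, these thresholds can only be overridden downward
        // from the cluster defaults. The values here are just examples.
        conf.setInt("mapreduce.job.ubertask.maxmaps", 4);
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
        conf.setLong("mapreduce.job.ubertask.maxbytes", 64L * 1024 * 1024); // 64 MB

        Job job = Job.getInstance(conf, "uber-candidate-job");
        job.setJarByClass(UberJobExample.class);
        // Identity mapper/reducer just to keep the example self-contained.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With the enable flag set, the framework decides at submission time whether the job falls under all three thresholds; if it does, the tasks run sequentially in the ApplicationMaster's JVM instead of in separate containers.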

I don't yet know whether there is a Hive-specific way to set this, or whether you just use the settings above with Hive.
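If session-level properties do carry over to the MapReduce jobs Hive launches, then one way to try it per query would be to SET the same properties before running the query, for example over the HiveServer2 JDBC driver. This is only a sketch under that assumption; the connection string, credentials, and table name are placeholders, and it requires the hive-jdbc driver on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class UberHiveQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 connection details - adjust for your cluster.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Session-level settings that Hive should pass down to the
            // MapReduce jobs it launches for subsequent queries.
            stmt.execute("SET mapreduce.job.ubertask.enable=true");
            stmt.execute("SET mapreduce.job.ubertask.maxmaps=4");

            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM small_table")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            }
        }
    }
}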

Regarding "java - How do I specify uberization for a Hive query in Hadoop 2?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24092219/
