java - How do I specify uberization for a Hive query in Hadoop 2?

Tags: java hadoop

Hadoop 2 has a new feature called uberization. For example, this reference says:

Uberization is the possibility to run all tasks of a MapReduce job in the ApplicationMaster's JVM if the job is small enough. This way, you avoid the overhead of requesting containers from the ResourceManager and asking the NodeManagers to start (supposedly small) tasks.

I can't tell whether this just happens magically behind the scenes, or whether something has to be done to make it happen. For example, is there a setting (or hint) when running a Hive query to enable it? Can you specify the threshold for "small enough"?

Also, I'm having trouble finding much information about this concept - does it go by another name?

Best answer

I found the details on "Uber Jobs" in Arun Murthy's YARN book:

An Uber Job occurs when multiple mapper and reducers are combined to use a single container. There are four core settings around the configuration of Uber Jobs found in the mapred-site.xml options presented in Table 9.3.

Here is Table 9.3:

|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.enable     | Whether to enable the small-jobs "ubertask" optimization,  |
|                                   | which runs "sufficiently small" jobs sequentially within a |
|                                   | single JVM. "Small" is defined by the maxmaps, maxreduces, |
|                                   | and maxbytes settings. Users may override this value.      |
|                                   | Default = false.                                           |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxmaps    | Threshold for the number of maps beyond which the job is   |
|                                   | considered too big for the ubertasking optimization.       |
|                                   | Users may override this value, but only downward.          |
|                                   | Default = 9.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxreduces | Threshold for the number of reduces beyond which           |
|                                   | the job is considered too big for the ubertasking          |
|                                   | optimization. Currently the code cannot support more       |
|                                   | than one reduce and will ignore larger values. (Zero is    |
|                                   | a valid maximum, however.) Users may override this         |
|                                   | value, but only downward.                                  |
|                                   | Default = 1.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxbytes   | Threshold for the number of input bytes beyond             |
|                                   | which the job is considered too big for the uber-          |
|                                   | tasking optimization. If no value is specified,            |
|                                   | `dfs.block.size` is used as a default. Be sure to          |
|                                   | specify a default value in `mapred-site.xml` if the        |
|                                   | underlying file system is not HDFS. Users may override     |
|                                   | this value, but only downward.                             |
|                                   | Default = HDFS block size.                                 |
|-----------------------------------+------------------------------------------------------------|
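To make this concrete, here is a minimal sketch of how these properties could be set when submitting a plain MapReduce job from Java. The class name, threshold values, and identity mapper/reducer are placeholders I've chosen for illustration; the property names are the ones from Table 9.3.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UberJobExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Opt in to the uber-task optimization; the job still has to fall under
        // the maxmaps/maxreduces/maxbytes thresholds to run inside the AM's JVM.
        conf.setBoolean("mapreduce.job.ubertask.enable", true);

        // Per Table 9.3, these thresholds can only be overridden downward
        // from the cluster defaults. The values here are just examples.
        conf.setInt("mapreduce.job.ubertask.maxmaps", 4);
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
        conf.setLong("mapreduce.job.ubertask.maxbytes", 64L * 1024 * 1024); // 64 MB

        Job job = Job.getInstance(conf, "uber-candidate-job");
        job.setJarByClass(UberJobExample.class);
        // Identity mapper/reducer just to keep the example self-contained.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With the enable flag set, the framework decides at submission time whether the job falls under all three thresholds; if it does, the tasks run sequentially in the ApplicationMaster's JVM instead of in separate containers.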

I don't yet know whether there is a Hive-specific way to set this, or whether you just use the settings above with Hive.
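If session-level properties do carry over to the MapReduce jobs Hive launches, then one way to try it per query would be to SET the same properties before running the query, for example over the HiveServer2 JDBC driver. This is only a sketch under that assumption; the connection string, credentials, and table name are placeholders, and it requires the hive-jdbc driver on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class UberHiveQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 connection details - adjust for your cluster.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Session-level settings that Hive should pass down to the
            // MapReduce jobs it launches for subsequent queries.
            stmt.execute("SET mapreduce.job.ubertask.enable=true");
            stmt.execute("SET mapreduce.job.ubertask.maxmaps=4");

            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM small_table")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            }
        }
    }
}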

Regarding "java - How do I specify uberization for a Hive query in Hadoop 2?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24092219/
