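Priority of Hadoop reducer count configuration options

Tags: hadoop configuration mapreduce reduce hadoop-yarn

What is the priority of the following 3 options for setting the number of reduce tasks? In other words, if all three are set, which one takes effect?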

Option 1:

setNumReduceTasks(2) within the application code

Option 2:

-D mapreduce.job.reduces=2 as command line argument

Option 3:

through $HADOOP_CONF_DIR/mapred-site.xml file

 <property>
  <name>mapreduce.job.reduces</name>
  <value>2</value>
 </property>

Best answer

According to Hadoop: The Definitive Guide:

The -D option is used to set the configuration property with key color to the value yellow. Options specified with -D take priority over properties from the configuration files. This is very useful because you can put defaults into configuration files and then override them with the -D option as needed. A common example of this is setting the number of reducers for a MapReduce job via -D mapred.reduce.tasks=n. This will override the number of reducers set on the cluster or set in any client-side configuration files.
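
To make the layering in the quote concrete, here is a minimal sketch in plain Java. It does not use Hadoop's Configuration class; `ReducerPrecedence` and `resolve` are hypothetical names invented for illustration. It models each source writing the same key in turn, with the last writer winning. The quote only covers -D versus configuration files. Putting the in-code `setNumReduceTasks` call last is an assumption: it holds when the driver calls it after the command-line options have been parsed (the usual ToolRunner flow), and it ignores properties marked final in a site file. Note that `mapred.reduce.tasks` in the quote is the older name of `mapreduce.job.reduces`.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of layered configuration; NOT Hadoop's Configuration class.
public class ReducerPrecedence {
    static final String KEY = "mapreduce.job.reduces";

    // Each argument is the value supplied by one source, or null if that
    // source does not set the property. Later writes overwrite earlier ones.
    static int resolve(Integer siteFileValue, Integer dashDValue, Integer codeValue) {
        Map<String, Integer> conf = new HashMap<>();
        conf.put(KEY, 1);                                        // shipped default in mapred-default.xml
        if (siteFileValue != null) conf.put(KEY, siteFileValue); // option 3: mapred-site.xml
        if (dashDValue != null) conf.put(KEY, dashDValue);       // option 2: -D on the command line
        if (codeValue != null) conf.put(KEY, codeValue);         // option 1: setter called in the driver (assumed to run last)
        return conf.get(KEY);
    }

    public static void main(String[] args) {
        System.out.println(resolve(2, null, null)); // only the site file sets it -> 2
        System.out.println(resolve(2, 4, null));    // -D overrides the site file -> 4
        System.out.println(resolve(2, 4, 8));       // in-code setter overrides both -> 8
    }
}
```

Under these assumptions the effective order is option 1, then option 2, then option 3. If you want -D to stay effective, avoid hard-coding the reducer count in the driver.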

Source: the Stack Overflow question on the priority of Hadoop reducer count configuration options, https://stackoverflow.com/questions/20696449/
