Priority of Hadoop reducer count configuration options

Tags: hadoop configuration mapreduce reduce hadoop-yarn

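What is the priority among the following three options for setting the number of reducers? In other words, if all three are set, which one takes effect?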

Option 1:

setNumReduceTasks(2) within the application code

Option 2:

-D mapreduce.job.reduces=2 as command line argument

Option 3:

through $HADOOP_CONF_DIR/mapred-site.xml file

 <property>
  <name>mapreduce.job.reduces</name>
  <value>2</value>
 </property>

Best answer

According to Hadoop: The Definitive Guide:

The -D option is used to set the configuration property with key color to the value yellow. Options specified with -D take priority over properties from the configuration files. This is very useful because you can put defaults into configuration files and then override them with the -D option as needed. A common example of this is setting the number of reducers for a MapReduce job via -D mapred.reduce.tasks=n. This will override the number of reducers set on the cluster or set in any client-side configuration files.
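The quoted passage covers only options 2 and 3 (mapred.reduce.tasks in the quote is the older name of mapreduce.job.reduces). For option 1, the usual understanding is that setNumReduceTasks writes the same property into the job's configuration. If the driver calls it unconditionally after the -D options have been parsed (for example via ToolRunner), the value set in code is applied last and wins. The sketch below is a toy model of that "last write wins" layering. It uses java.util.Properties rather than Hadoop's own Configuration class, and the class name ReducerPrecedence and the method resolve are hypothetical names for illustration only.

```java
import java.util.Properties;

public class ReducerPrecedence {
    static final String KEY = "mapreduce.job.reduces";

    // Resolve the effective reducer count from three layers.
    // A null argument means "this layer does not set the property".
    static int resolve(Integer siteFile, Integer dashD, Integer inCode) {
        Properties conf = new Properties();
        // Built-in default (mapred-default.xml ships with 1).
        conf.setProperty(KEY, "1");
        // Layer 1: mapred-site.xml is loaded when the configuration is created.
        if (siteFile != null) conf.setProperty(KEY, siteFile.toString());
        // Layer 2: -D on the command line is applied on top of the files.
        if (dashD != null) conf.setProperty(KEY, dashD.toString());
        // Layer 3: a setter called in the driver afterwards overwrites both.
        if (inCode != null) conf.setProperty(KEY, inCode.toString());
        return Integer.parseInt(conf.getProperty(KEY));
    }

    public static void main(String[] args) {
        System.out.println(resolve(4, null, null)); // only the site file
        System.out.println(resolve(4, 8, null));    // site file plus -D
        System.out.println(resolve(4, 8, 2));       // all three set
    }
}
```

Under this model, a driver that should still honor -D would have to avoid hard-coding the value, or call the setter only when the property has not been supplied. Note also that -D is only interpreted when the driver parses generic options, which is an assumption about how the job is launched.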

A similar question on the priority of Hadoop reducer count configuration options can be found on Stack Overflow: https://stackoverflow.com/questions/20696449/
