Hadoop: The Definitive Guide mentions the following:
"What qualifies as a small job? By default one that has less than 10 mappers, only one reducer, and the input size is less than the size of one HDFS block. "
But how is the number of mappers in a job computed before it is executed on YARN? In MR1 the number of mappers depends on the number of input splits. Does the same apply to YARN? In YARN, containers are flexible, so is there any way to compute the maximum number of map tasks that can run in parallel on a given cluster (some kind of tight upper bound, since it would give me a rough idea of how much data I can process in parallel)?
Best Answer
But how does it count the number of mappers in a job before executing it on YARN? In MR1 the number of mappers depends on the number of input splits. Does the same apply to YARN?
Yes, in YARN the number of mappers still depends on the input splits if you are using a MapReduce-based framework.
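As a rough sketch of how split-driven mapper counts work: Hadoop's `FileInputFormat` computes a split size as `max(minSize, min(maxSize, blockSize))` and launches one map task per split. The helper below mirrors that formula in plain Python; the file and block sizes are illustrative assumptions, not values read from a real cluster.

```python
def compute_split_size(block_size, min_size, max_size):
    # Mirrors FileInputFormat.computeSplitSize: max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def estimate_mappers(file_size, split_size):
    # One map task per input split (ceiling division)
    return -(-file_size // split_size)

# Assumed example: 128 MB HDFS block, default min/max split settings
block = 128 * 1024 * 1024
split = compute_split_size(block, min_size=1, max_size=float("inf"))
print(estimate_mappers(1 * 1024**3, split))  # a 1 GB splittable file -> 8 mappers
```

So before the job runs, the client can already estimate the mapper count from the input files' sizes and the effective split size.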
In YARN containers are flexible. So is there any way to compute the maximum number of map tasks that can run in parallel on a given cluster (some kind of tight upper bound, since it would give me a rough idea of how much data I can process in parallel)?
The number of map tasks that can run in parallel on a YARN cluster depends on how many containers can be launched and run concurrently on that cluster. That ultimately comes down to how you configure MapReduce on the cluster, which is explained clearly in this Cloudera guide.
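A back-of-the-envelope upper bound follows from the per-node resources YARN hands out (`yarn.nodemanager.resource.memory-mb`, `yarn.nodemanager.resource.cpu-vcores`) and the resources each map container requests (`mapreduce.map.memory.mb`, `mapreduce.map.cpu.vcores`). All numeric values below are assumed examples, not defaults from any particular distribution:

```python
def max_parallel_maps(nodes, node_mem_mb, node_vcores, map_mem_mb, map_vcores):
    # Per node, the bound is whichever resource runs out first: memory or vcores.
    per_node = min(node_mem_mb // map_mem_mb, node_vcores // map_vcores)
    return nodes * per_node

# Assumed cluster: 10 nodes, each giving YARN 64 GB and 16 vcores;
# each map container asks for 2 GB and 1 vcore.
print(max_parallel_maps(10, 64 * 1024, 16, 2048, 1))  # -> 160 (vcore-limited)
```

This is only a ceiling: the ApplicationMaster itself consumes a container, and scheduler settings and other running jobs will push the real concurrency below it.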
Regarding "hadoop - yarn: maximum parallel Map task count", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30003268/