I am running a Hadoop job, and in my yarn-site.xml file I have the following configuration:
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
However, I still occasionally get the following error:
Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container.
I have found that increasing yarn.scheduler.minimum-allocation-mb increases the physical memory allocated to each container. However, I don't always want 4 GB allocated to my containers, and I thought that by explicitly specifying a maximum size I could work around this problem. I realize that Hadoop cannot figure out how much memory a container will need before the mapper runs, so how can I allocate more memory to a container only when it needs the extra memory?
Best Answer
You should also properly configure the memory allocation for MapReduce. From this HortonWorks tutorial:
[...]
For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce tasks Containers.
In mapred-site.xml:

mapreduce.map.memory.mb: 4096
mapreduce.reduce.memory.mb: 8192

Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

mapreduce.map.java.opts: -Xmx3072m
mapreduce.reduce.java.opts: -Xmx6144m

The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use.
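For reference, the tutorial's mapred-site.xml settings can be written in the same <property> format as the yarn-site.xml snippet in the question. This is a sketch using the tutorial's example-cluster values; tune them to your own nodes:

```xml
<!-- mapred-site.xml: container sizes requested from YARN -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>
<!-- JVM heap sizes, kept below the container sizes above -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6144m</value>
</property>
```

Note that the -Xmx values are deliberately smaller than the container sizes: the JVM uses memory beyond its heap (stacks, metaspace, native buffers), so the gap leaves headroom and keeps the container from being killed for exceeding its physical memory limit.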
Finally, someone in this thread in the Hadoop mailing list had the same problem, and in their case it turned out to be a memory leak in their code.
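Since you don't always want the larger allocation, note that mapreduce.map.memory.mb and friends are job-level settings: rather than fixing them cluster-wide, you can override them per job on the command line. A sketch, assuming your driver uses ToolRunner so that GenericOptionsParser picks up the -D flags; the jar name, class name, and paths below are placeholders:

```shell
# Request 3 GB map containers with a 2.5 GB heap for this job only.
# myjob.jar, com.example.MyJob, input/ and output/ are hypothetical.
hadoop jar myjob.jar com.example.MyJob \
  -D mapreduce.map.memory.mb=3072 \
  -D "mapreduce.map.java.opts=-Xmx2560m" \
  input/ output/
```

Jobs that don't go through ToolRunner won't see the -D overrides; in that case set the same keys on the job's Configuration object before submission.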
Regarding "Hadoop Yarn container is not allocating enough space", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/20803577/