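apache-spark - How is the YARN ResourceManager's Total Memory calculated?

I'm running a Spark cluster on AWS EMR in yarn-client mode with 1 master node and 3 worker nodes, where the master node is the client machine. All 4 nodes have 8GB of memory and 4 cores. Given that hardware, I set the following: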

spark.executor.memory = 5G
spark.executor.cores = 3
spark.yarn.executor.memoryOverhead = 600

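With that configuration, should the Total Memory reported by YARN's ResourceManager be 15GB? It shows 18GB. I have only ever seen YARN use up to 15GB while running Spark applications. Does that 15GB come from spark.executor.memory * 3 nodes?

I'd like to assume that YARN's Total Memory is calculated as spark.executor.memory + spark.yarn.executor.memoryOverhead, but I can't find that documented anywhere. What's the proper way to find the exact number?

And I should be able to increase spark.executor.memory to 6G, right? I've gotten errors when I set it that way in the past. Are there other configurations I need to set?

Edit: It looks like yarn.scheduler.maximum-allocation-mb on the worker nodes is 6114 MB, or about 6GB. This is the default that EMR sets for the instance type. Since 6GB * 3 = 18GB, that probably makes sense. I'd like to restart YARN and raise that value from 6GB to 7GB, but I can't because the cluster is in use, so I think my question still stands.

Best answer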

I want to assume that the YARN Total Memory is calculated by spark.executor.memory + spark.yarn.executor.memoryOverhead but I can't find that documented anywhere. What's the proper way to find the exact number?

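That's roughly right, but it's backwards. YARN's total memory is independent of any configuration you set for Spark. yarn.scheduler.maximum-allocation-mb controls how much memory YARN will grant to a single container (the original answer links to where this setting is documented). Strictly, the per-node total that the ResourceManager adds up is yarn.nodemanager.resource.memory-mb, which appears to have the same value on this cluster, given the 18GB shown. To use all of the available memory with Spark, set spark.executor.memory + spark.yarn.executor.memoryOverhead equal to yarn.scheduler.maximum-allocation-mb. The original answer also links to an article on tuning Spark jobs and a spreadsheet for calculating these settings.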
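The relationship above can be sketched in a few lines of Python. This is a minimal illustration, not anything YARN or Spark ships: the function names are hypothetical, it assumes each worker offers 6114 MB to YARN (the question only reports yarn.scheduler.maximum-allocation-mb, so the per-node figure is an assumption), and it ignores any rounding YARN may apply to container sizes.

```python
def yarn_total_memory_mb(per_node_mb: int, worker_nodes: int) -> int:
    """Total memory the ResourceManager would report: per-node memory summed over workers."""
    return per_node_mb * worker_nodes


def container_request_mb(executor_memory_mb: int, overhead_mb: int) -> int:
    """Memory Spark asks YARN for per executor: heap plus off-heap overhead."""
    return executor_memory_mb + overhead_mb


def fits(executor_memory_mb: int, overhead_mb: int, max_allocation_mb: int) -> bool:
    """True if one executor container stays within the per-container cap."""
    return container_request_mb(executor_memory_mb, overhead_mb) <= max_allocation_mb


# Values from the question; 6114 MB per node is assumed, not confirmed.
MAX_ALLOCATION_MB = 6114
WORKER_NODES = 3

total_mb = yarn_total_memory_mb(MAX_ALLOCATION_MB, WORKER_NODES)
print(f"Total memory: {total_mb} MB (~{total_mb / 1024:.1f} GB)")

# spark.executor.memory = 5G with spark.yarn.executor.memoryOverhead = 600
print("5G executor fits:", fits(5 * 1024, 600, MAX_ALLOCATION_MB))
# spark.executor.memory = 6G with the same overhead
print("6G executor fits:", fits(6 * 1024, 600, MAX_ALLOCATION_MB))
```

Under these assumptions the total depends only on the YARN settings and the node count, while the Spark settings decide whether a single executor container can be granted at all.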

And I should be able to increase the value of spark.executor.memory to 6G right?

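According to the spreadsheet, the upper bound for spark.executor.memory is 5502M if yarn.scheduler.maximum-allocation-mb is 6114M. By hand, this is 0.9 * 6114, since spark.executor.memoryOverhead defaults to executorMemory * 0.10, with a minimum of 384 (per the source linked in the original answer). Because you set the overhead explicitly to 600, a 6G executor would request 6144 + 600 = 6744 MB, which exceeds 6114 MB and would explain the errors you saw.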
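The hand calculation can be reproduced with a short Python sketch. The helper names are hypothetical, the 10% overhead is modelled with integer division as an approximation of Spark's rule, and YARN's container-size rounding is not modelled, so treat the results as estimates and not exact limits.

```python
from typing import Optional

MIN_OVERHEAD_MB = 384  # minimum overhead quoted in the answer


def default_overhead_mb(executor_memory_mb: int) -> int:
    """Default overhead: 10% of executor memory, but never below 384 MB."""
    return max(MIN_OVERHEAD_MB, executor_memory_mb // 10)


def spreadsheet_estimate_mb(max_allocation_mb: int) -> int:
    """The answer's rule of thumb: 0.9 * yarn.scheduler.maximum-allocation-mb."""
    return int(0.9 * max_allocation_mb)


def max_executor_memory_mb(max_allocation_mb: int,
                           explicit_overhead_mb: Optional[int] = None) -> int:
    """Largest executor memory whose container request stays within the cap."""
    if explicit_overhead_mb is not None:
        # Overhead set by hand, as in the question (600 MB).
        return max_allocation_mb - explicit_overhead_mb
    # Default overhead: walk down until memory + overhead fits.
    memory = max_allocation_mb - MIN_OVERHEAD_MB
    while memory > 0 and memory + default_overhead_mb(memory) > max_allocation_mb:
        memory -= 1
    return memory


cap = 6114
print("Rule of thumb (0.9 * cap):", spreadsheet_estimate_mb(cap))
print("Default overhead limit:   ", max_executor_memory_mb(cap))
print("With 600 MB overhead:     ", max_executor_memory_mb(cap, 600))
```

The 0.9 rule of thumb is slightly more conservative than solving memory + overhead <= cap directly. With the question's explicit 600 MB overhead, the limit is simply the cap minus 600 MB, which is below 6G.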

This Q&A is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/50451891/
