java - Spark memory fraction vs. Young Generation/Old Generation Java heap split

I am studying Spark and have some doubts about how Executor memory is split. Specifically, the Apache Spark documentation (here) states:

Java Heap space is divided in to two regions Young and Old. The Young generation is meant to hold short-lived objects while the Old generation is intended for objects with longer lifetimes.

This one:

[Figure in the original post: the Java heap split into Young and Old generations]

But for a Spark Executor there is another, abstract split of memory, as the Apache Spark documentation (here) states:

Memory usage in Spark largely falls under one of two categories: execution and storage. Execution memory refers to that used for computation in shuffles, joins, sorts and aggregations, while storage memory refers to that used for caching and propagating internal data across the cluster. In Spark, execution and storage share a unified region (M).

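As shown in the figure:

[Figure in the original post: Spark executor memory split into the unified execution/storage region and user memory]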

I don't understand how Young Gen/Old Gen overlaps with storage/execution memory, because the same document (again, here) states:

spark.memory.fraction expresses the size of M as a fraction of the (JVM heap space - 300MiB) (default 0.6). The rest of the space (40%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records.

where spark.memory.fraction represents the execution/storage portion of the Java heap.
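To make the quoted formula concrete, here is a minimal arithmetic sketch. It only restates the formula from the documentation quote above; the 4 GiB heap size and the function names are hypothetical and chosen purely for illustration.

```python
# Worked example of the formula quoted above:
#   M = (JVM heap space - 300 MiB) * spark.memory.fraction
# The heap size and helper names below are illustrative assumptions.

RESERVED_MIB = 300  # fixed reservation named in the quoted documentation


def unified_region_mib(heap_mib, memory_fraction=0.6):
    """Size of the unified execution/storage region M, in MiB."""
    usable = heap_mib - RESERVED_MIB
    if usable <= 0:
        raise ValueError("heap must be larger than the 300 MiB reservation")
    return usable * memory_fraction


def remaining_mib(heap_mib, memory_fraction=0.6):
    """Space left for user data structures and internal metadata, in MiB."""
    usable = heap_mib - RESERVED_MIB
    if usable <= 0:
        raise ValueError("heap must be larger than the 300 MiB reservation")
    return usable * (1 - memory_fraction)


heap_mib = 4096  # hypothetical 4 GiB executor heap
m = unified_region_mib(heap_mib)
rest = remaining_mib(heap_mib)
print(f"M = {m:.1f} MiB, remainder = {rest:.1f} MiB")
```

Note that nothing in this calculation mentions generations: the formula splits the heap by what the memory is used for, not by where the garbage collector places the objects.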

But

If the OldGen is close to being full, reduce the amount of memory used for caching by lowering spark.memory.fraction; it is better to cache fewer objects than to slow down task execution.

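This seems to imply that the Old Generation is actually the user memory, but the following statement seems to contradict my assumption: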

If the OldGen is close to being full, alternatively, consider decreasing the size of the Young generation.

What am I missing?

How does the Young Gen/Old Gen split relate to the Spark fraction/User Memory split?

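Best answer

The short answer is that they are not really related, other than both having to do with the JVM heap.

A better way to think about it is that there are four buckets (numbered in no particular order):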

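1. Spark memory in the Young Generation
2. Spark memory in the Old Generation
3. User memory in the Young Generation
4. User memory in the Old Generation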

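(Technically there is also some system memory that is neither Spark nor User, but it is usually small enough not to worry about; it too can be either old or young.)

Whether an object is classified as Spark or User is decided by Spark (I don't actually know whether this is a permanent designation, or whether objects can change their classification in this respect).

As for old versus young, that is managed by the garbage collector, and the GC can and will promote objects from young to old. In some GC algorithms the sizes of the generations are adjusted dynamically (or they use fixed-size regions, and a given region can be either old or young).

You can control the total capacity of 1+2, 3+4, 1+3, and 2+4, but you do not really control the capacity of 1, 2, 3, or 4 individually (and you probably would not want to, because there is a lot of benefit in being able to use spare space in one category to temporarily get more space in another).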
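The point about controlling only the sums can be sketched numerically. This is a rough illustration, not how Spark or the JVM computes anything internally: it assumes a collector with fixed generation sizes set through the HotSpot flag -XX:NewRatio (old:young ratio), reuses the (heap - 300 MiB) formula from the question, and the function name and the 4 GiB heap are hypothetical.

```python
# Sketch of the four marginal totals from the answer. Assumptions:
#  - generation sizes are fixed by -XX:NewRatio (old = new_ratio * young);
#    collectors that resize generations dynamically will not follow this.
#  - the Spark/User split follows the (heap - 300 MiB) * fraction formula.
# The function name and example numbers are illustrative only.

def bucket_totals(heap_mib, memory_fraction=0.6, new_ratio=2, reserved_mib=300):
    usable = heap_mib - reserved_mib
    young = heap_mib / (new_ratio + 1)
    return {
        "spark": usable * memory_fraction,       # buckets 1+2
        "user": usable * (1 - memory_fraction),  # buckets 3+4
        "young": young,                          # buckets 1+3 (plus system memory)
        "old": heap_mib - young,                 # buckets 2+4 (plus system memory)
    }


base = bucket_totals(4096)
less_spark = bucket_totals(4096, memory_fraction=0.5)  # lower spark.memory.fraction
less_young = bucket_totals(4096, new_ratio=3)          # shrink the Young generation

for name, totals in [("base", base), ("fraction=0.5", less_spark), ("NewRatio=3", less_young)]:
    print(name, {k: round(v, 1) for k, v in totals.items()})
```

Changing memory_fraction moves only the spark/user totals, and changing new_ratio moves only the young/old totals. Neither knob says how much Spark memory ends up in the Old Generation, which is why the two pieces of tuning advice in the question do not contradict each other.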

Original question on Stack Overflow: https://stackoverflow.com/questions/63565290/
