In the latest versions of Spark, shuffle behavior has changed a lot.
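Question:
The Spark UI has stopped showing whether a spill occurred (and how much was spilled). In one of my experiments, I tried to simulate a situation in which the shuffle write on an executor would exceed "JVM Heap Size" * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction (based on an article), but I did not see any disk-spill messages in the logs. Is there a way to get this information?
PS: Forgive me if this sounds like a theoretical question.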
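For reference, the threshold used in the question can be written out as a small calculation. This is only a sketch of the pre-1.6 (legacy) formula quoted above; the function name is hypothetical, and the defaults of 0.2 and 0.8 are the documented defaults of spark.shuffle.memoryFraction and spark.shuffle.safetyFraction in Spark releases before 1.6.

```python
def legacy_shuffle_memory_bytes(heap_bytes, memory_fraction=0.2, safety_fraction=0.8):
    """Approximate shuffle memory pool under the legacy (pre-1.6) memory manager.

    heap_bytes      -- executor JVM heap size in bytes
    memory_fraction -- spark.shuffle.memoryFraction (assumed default 0.2)
    safety_fraction -- spark.shuffle.safetyFraction (assumed default 0.8)
    """
    return int(heap_bytes * memory_fraction * safety_fraction)


if __name__ == "__main__":
    heap = 4 * 1024 ** 3  # a 4 GiB executor heap, chosen only for illustration
    pool = legacy_shuffle_memory_bytes(heap)
    print(f"legacy shuffle pool: {pool / 1024 ** 2:.1f} MiB")
```

With these assumed defaults, only about 16% of the heap was available to shuffle before a spill was expected, which is the limit the experiment tried to exceed.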
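Best answer
With Spark 1.6.0, the memory management system was updated. In short, there is no longer a dedicated cache memory or a dedicated shuffle memory. All memory is available to either kind of operation. From the release notes: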
Automatic memory management: Another area of performance gains in Spark 1.6 comes from better memory management. Before Spark 1.6, Spark statically divided the available memory into two regions: execution memory and cache memory. Execution memory is the region that is used in sorting, hashing, and shuffling, while cache memory is used to cache hot data. Spark 1.6 introduces a new memory manager that automatically tunes the size of different memory regions. The runtime automatically grows and shrinks regions according to the needs of the executing application. For many applications, this will mean a significant increase in available memory that can be used for operators such as joins and aggregations, without any user tuning.
This JIRA ticket gives the background reasoning for the change, and this paper discusses the new memory management system in depth.
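To make the unified model more concrete, here is a rough sketch of how the region sizes could be estimated. It assumes the Spark 1.6 defaults as I understand them from the configuration documentation: spark.memory.fraction = 0.75 (later releases use 0.6), spark.memory.storageFraction = 0.5, and about 300 MB of reserved heap. The function name and the returned keys are hypothetical, and the real memory manager adjusts the boundary at runtime, so treat the numbers as starting points rather than hard limits.

```python
RESERVED_BYTES = 300 * 1024 * 1024  # assumed reserved system memory (about 300 MB)


def unified_memory_regions(heap_bytes, memory_fraction=0.75, storage_fraction=0.5):
    """Estimate the unified memory regions for a given executor heap.

    memory_fraction  -- spark.memory.fraction (assumed 1.6 default: 0.75)
    storage_fraction -- spark.memory.storageFraction (assumed default: 0.5)
    """
    usable = heap_bytes - RESERVED_BYTES
    unified = int(usable * memory_fraction)
    # Portion of the unified region in which cached blocks are protected from eviction.
    storage_protected = int(unified * storage_fraction)
    return {
        "unified": unified,
        "storage_protected": storage_protected,
        # Execution starts with the remainder but may borrow unused storage memory.
        "execution_initial": unified - storage_protected,
    }


if __name__ == "__main__":
    regions = unified_memory_regions(4 * 1024 ** 3)  # 4 GiB heap, for illustration
    for name, size in regions.items():
        print(f"{name}: {size / 1024 ** 2:.0f} MiB")
```

Under this model, execution (including shuffle) can grow well past the old 16% limit, which may be why an experiment sized against the legacy formula produced no spill.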
A similar question on Stack Overflow, "performance - Spark-1.6.0+: spark.shuffle.memoryFraction deprecated - when will spill happen?": https://stackoverflow.com/questions/37075721/