I know how to calculate executor cores and memory, but can someone explain what spark.driver.memory should be based on?
Best Answer
Operations on a Dataset such as collect and take move all of the data into the application's driver process, and on a very large dataset they can crash the driver process with an OutOfMemoryError. When you collect a large amount of data to the driver, increase spark.driver.memory.
According to High Performance Spark by Holden Karau and Rachel Warren (O'Reilly):
most of the computational work of a Spark query is performed by the executors, so increasing the size of the driver rarely speeds up a computation. However, jobs may fail if they collect too much data to the driver or perform large local computations. Thus, increasing the driver memory and correspondingly the value of spark.driver.maxResultSize may prevent the out-of-memory errors in the driver. A good heuristic for setting the Spark driver memory is simply the lowest possible value that does not lead to memory errors in the driver, i.e., which gives the maximum possible resources to the executors.
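As a minimal sketch of where these settings live, both values can be passed at submit time (the sizes, application name, and executor settings below are illustrative assumptions, not recommendations; per the heuristic above, start low and raise only if the driver runs out of memory):

```shell
# Illustrative spark-submit invocation. Raise --driver-memory and
# spark.driver.maxResultSize only if collecting results to the driver
# actually fails with an OutOfMemoryError; keep resources on the executors.
spark-submit \
  --driver-memory 2g \
  --conf spark.driver.maxResultSize=1g \
  --executor-memory 8g \
  --executor-cores 4 \
  my_app.py
```

The same properties can instead be set in spark-defaults.conf (spark.driver.memory, spark.driver.maxResultSize) or on the SparkConf before the SparkContext is created; they cannot be changed on a running driver.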
Regarding apache-spark - Spark Driver memory calculation, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53631853/