apache-spark - How Spark driver memory is calculated

Tags: apache-spark, memory, memory-management, driver, executor

I know how to size executor cores and memory, but can anyone explain what spark.driver.memory should be calculated from?

Best answer

Dataset actions such as collect and take move all of the data into the application's driver process, and running them on a very large dataset can crash the driver process with an OutOfMemoryError.
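To make that concrete, here is a minimal Scala sketch; the object name DriverMemoryDemo, the 100-million-row range, and the bucket column are illustrative assumptions, not part of the original answer. The point it shows is that aggregations stay on the executors, while collect() materializes every row in the driver JVM:

```scala
import org.apache.spark.sql.SparkSession

object DriverMemoryDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("driver-memory-demo")
      .getOrCreate()

    // Hypothetical large dataset: 100 million rows generated on the executors.
    val big = spark.range(0, 100000000L)
      .selectExpr("id", "id % 10 AS bucket")

    // Aggregations run on the executors; only a 10-row result reaches the driver.
    big.groupBy("bucket").count().show()

    // collect() materializes every row as an Array[Row] inside the driver JVM.
    // On a very large dataset this is the call that crashes the driver with an
    // OutOfMemoryError unless spark.driver.memory is large enough to hold it.
    // val allRows = big.collect()

    spark.stop()
  }
}
```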

When you collect large amounts of data to the driver, you need to increase spark.driver.memory accordingly.

According to

High Performance Spark by Holden Karau and Rachel Warren (O’Reilly)

most of the computational work of a Spark query is performed by the executors, so increasing the size of the driver rarely speeds up a computation. However, jobs may fail if they collect too much data to the driver or perform large local computations. Thus, increasing the driver memory and correspondingly the value of spark.driver.maxResultSize may prevent the out-of-memory errors in the driver.

A good heuristic for setting the Spark driver memory is simply the lowest possible value that does not lead to memory errors in the driver, i.e., which gives the maximum possible resources to the executors.
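In practice, spark.driver.memory sizes the driver JVM heap, so it has to be supplied when the application is launched (for example via spark-submit or spark-defaults.conf) rather than from inside the job. The sketch below assumes that setup; the 4g/2g figures, the ShowDriverConf name, and the jar/class names in the comment are placeholders, not recommendations:

```scala
import org.apache.spark.sql.SparkSession

// spark.driver.memory determines the driver JVM heap, so it must be set before
// the driver process starts -- for example via spark-submit:
//   spark-submit --driver-memory 4g \
//     --conf spark.driver.maxResultSize=2g \
//     --class com.example.MyJob my-job.jar
// (the 4g / 2g values and the job name are placeholders, not recommendations)

object ShowDriverConf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("show-driver-conf").getOrCreate()

    // Print the values the driver actually picked up, so the book's heuristic
    // ("the lowest value that avoids driver OOM") can be checked per job.
    println(spark.conf.get("spark.driver.memory", "not set"))
    println(spark.conf.get("spark.driver.maxResultSize", "not set"))

    spark.stop()
  }
}
```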

This answer is based on a similar question about Spark driver memory calculation on Stack Overflow: https://stackoverflow.com/questions/53631853/
