python - Spark submit error 'Cannot allocate memory'

Tags: python apache-spark

I'm running standalone Spark 1.6.0 on a 64-bit Ubuntu machine.

When another application is already running on Spark, submitting a new one raises the error below as soon as I create the default configuration with conf = SparkConf():

# Native memory allocation (mmap) failed to map 17896046592 bytes for committing reserved memory.

However, I am creating the context like this:

conf = SparkConf()
conf.setMaster(spark_master)
conf.set('spark.cores.max', 60)
conf.set('spark.executor.memory', '256m')
conf.set('spark.rpc.askTimeout', 240)
conf.set('spark.task.maxFailures', 1)
conf.set('spark.driver.memory', '128m')
conf.set('spark.dynamicAllocation.enabled', True)
conf.set('spark.shuffle.service.enabled', True)
ctxt = SparkContext(conf=conf)

So I can't figure out where the 17896046592 bytes (16.6 GB) are coming from.
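For reference, the figure does work out to roughly 16.6 GiB (a quick check in plain Python, nothing Spark-specific):

print(17896046592 / 1024.0 ** 3)  # bytes -> GiB; prints ~16.67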

Here is the output from the master:

Successfully imported Spark Modules
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000180000000, 17896046592, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 17896046592 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/spark/.pyenv/930221056_2001_st-dev/hs_err_pid631.log
Traceback (most recent call last):
  File "/home/spark/.pyenv/930221056_2001_st-dev/bin/run_backtes2.py", line 178, in <module>
    args.config_id, args.mnemonic, args.start_date, args.end_date, extra_backtest_args, cmd_line)
  File "/home/spark/.pyenv/930221056_2001_st-dev/bin/run_backtest2.py", line 26, in _main_notify
    args, extra_backtest_args, config_id, mnemonic, cmd_line, env_config, range)
  File "/home/spark/.pyenv/930221056_2001_st-dev/bin/run_backtest2.py", line 100, in run_backtest_main
    res = runner.run_and_log_backtest(backtest_range)
  File "/home/spark/.pyenv/930221056_2001_st-dev/local/lib/python2.7/site-packages/st/backtesting/backtest_runner.py", line 563, in run_and_log_backtest
    subranges_output = self._run_subranges_on_spark(subranges_to_run)
  File "/home/spark/.pyenv/930221056_2001_st-dev/local/lib/python2.7/site-packages/st/backtesting/backtest_runner.py", line 611, in _run_subranges_on_spark
    executor_memory='128m', max_failures=1, driver_memory='128m')
  File "/home/spark/.pyenv/930221056_2001_st-dev/local/lib/python2.7/site-packages/st/backtesting/backtest_runner.py", line 98, in __init__
    max_failures=max_failures, driver_memory=driver_memory)
  File "/home/spark/.pyenv/930221056_2001_st-dev/local/lib/python2.7/site-packages/st/backtesting/backtest_runner.py", line 66, in create_context
    conf = SparkConf()
  File "/home/spark/spark-1.6.0-bin-cdh4/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/home/spark/spark-1.6.0-bin-cdh4/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/home/spark/spark-1.6.0-bin-cdh4/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 17896046592 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2627), pid=19061, tid=0x00007fb15f814700
#
# JRE version:  (8.0_111-b14) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.111-b14 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#

This seems to happen only when other applications are already running on the Spark cluster and the master machine has about 10 GB of free memory. The other running applications all set conf.set('spark.driver.memory', '1g') as well.

Best Answer

Solution: the setting

spark.executor.memory 22g

in the Spark configuration file took precedence over the value set in code with

conf.set('spark.executor.memory', '256m')

Note that the traceback above shows the JVM gateway being launched from inside SparkConf() itself (SparkContext._ensure_initialized -> launch_gateway), so the JVM starts before any conf.set(...) call can run, and at launch time only the file-based configuration is visible.
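As a sketch of the fix (the conf path below is an assumption based on the spark-1.6.0-bin-cdh4 install seen in the traceback; adjust to your layout), remove or lower the offending line in spark-defaults.conf:

# /home/spark/spark-1.6.0-bin-cdh4/conf/spark-defaults.conf
# remove or lower this line so per-application settings of a few hundred MB can apply:
spark.executor.memory   22g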

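To confirm which value actually ends up in effect, a minimal check (a sketch, not from the original answer; it uses a local master as a stand-in for the real spark_master URL) is to print the merged configuration once a context can be created:

from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.setMaster('local')  # stand-in for the real spark_master URL
conf.set('spark.executor.memory', '256m')
sc = SparkContext(conf=conf)
# getConf() returns the configuration actually in effect: the merge of
# spark-defaults.conf, environment settings, and the conf.set(...) calls above.
print(sc.getConf().toDebugString())
sc.stop()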
Regarding python - Spark submit error 'Cannot allocate memory', we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44547393/
