I am running spark-shell on an HPC login node (a node shared by many users, so the administrators have set some per-user resource limits). When I launch spark-shell from the command line (I am using the bin/spark-shell that ships with pyspark 3.0.1, with no arguments), it fails. This is the error shown when running spark-shell -v:
Main class:
org.apache.spark.repl.Main
Arguments:
Spark config:
(spark.jars,)
(spark.app.name,Spark shell)
(spark.submit.pyFiles,)
(spark.ui.showConsoleProgress,true)
(spark.submit.deployMode,client)
(spark.master,local[*])
Classpath elements:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception in thread "main" java.lang.OutOfMemoryError: Metaspace
And I got this core dump file:
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 16 bytes for AllocateHeap
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (allocation.inline.hpp:61), pid=428290, tid=0x00002ac0965dd700
#
# JRE version: OpenJDK Runtime Environment (8.0_191-b12) (build 1.8.0_191-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /my/working/dir/tmp/core or core.428290
#
...
...
Heap:
PSYoungGen total 281600K, used 39809K [0x00000000eab00000, 0x0000000100000000, 0x0000000100000000)
eden space 262144K, 7% used [0x00000000eab00000,0x00000000ebf30088,0x00000000fab00000)
from space 19456K, 98% used [0x00000000fab00000,0x00000000fbdb0540,0x00000000fbe00000)
to space 67584K, 0% used [0x00000000fbe00000,0x00000000fbe00000,0x0000000100000000)
ParOldGen total 699392K, used 77181K [0x00000000c0000000, 0x00000000eab00000, 0x00000000eab00000)
object space 699392K, 11% used [0x00000000c0000000,0x00000000c4b5f460,0x00000000eab00000)
Metaspace used 72906K, capacity 78552K, committed 78848K, reserved 1118208K
class space used 9013K, capacity 9377K, committed 9472K, reserved 1048576K
The second part of PSYoungGen (the from space), or the class space, seems to be full? I am not familiar with Java (I don't develop in Java; I just want to get the application to start properly).
Here is my ulimit info:
$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 380195
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 16384
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 300
virtual memory (kbytes, -v) 8388608
file locks (-x) unlimited
And here is my memory info (cat /proc/meminfo):
MemTotal: 97353940 kB
MemFree: 4565860 kB
MemAvailable: 31438168 kB
Buffers: 527764 kB
Cached: 73469304 kB
SwapCached: 0 kB
Active: 50002012 kB
Inactive: 30665992 kB
Active(anon): 49118320 kB
Inactive(anon): 1973220 kB
Active(file): 883692 kB
Inactive(file): 28692772 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 14092 kB
Writeback: 0 kB
AnonPages: 6671608 kB
Mapped: 382268 kB
Shmem: 44418868 kB
Slab: 9548096 kB
SReclaimable: 752440 kB
SUnreclaim: 8795656 kB
KernelStack: 54816 kB
PageTables: 186136 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 48676968 kB
Committed_AS: 56368676 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1638408 kB
VmallocChunk: 34307690768 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1048576 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 505664 kB
DirectMap2M: 32661504 kB
DirectMap1G: 68157440 kB
Note that if I use one of the compute nodes (allocated to me alone, not shared with other users), spark-shell runs correctly.
I would like to understand why this fails even though there is plenty of memory, so that I can contact the administrators to adjust the limits. Or, if there are some Java options I should be setting, that would be great too.
Any advice would be appreciated!
Best Answer
You have a virtual memory limit set:
virtual memory (kbytes, -v) 8388608
This option limits the total size of the process's address-space reservations, and the JVM tends to reserve address space for most of its memory regions up front, long before it actually commits that memory. The crash log above already shows large reservations: Metaspace reserves 1118208K of address space (including 1048576K for the compressed class space), and the heap spans 0x00000000c0000000 to 0x0000000100000000, another 1 GiB. Together with the code cache, thread stacks, and mapped JARs and native libraries, such reservations can quickly run into the 8388608 KB (8 GiB) cap.
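One way to see those reservations, and how little of them is actually committed, is HotSpot's Native Memory Tracking. A minimal sketch using standard JVM/JDK tooling (<pid> is a placeholder for the shell's JVM process id):

# Start the shell with Native Memory Tracking enabled (adds a small overhead):
spark-shell --driver-java-options "-XX:NativeMemoryTracking=summary"

# From a second terminal on the same node, print reserved vs. committed sizes:
jcmd <pid> VM.native_memory summary

The "reserved" figures are what count against ulimit -v; "committed" is the memory the JVM is actually using.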
You can reduce the total amount of memory the JVM reserves by lowering the maximum sizes of the various non-heap spaces. Here are some suggestions:
-XX:MaxMetaspaceSize=256m
-XX:CompressedClassSpaceSize=128m
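With spark-shell in client mode the driver is the shell's own JVM, so these flags can be passed on the command line. A sketch of a possible invocation (the sizes above are starting points, not tuned values):

spark-shell --driver-java-options "-XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=128m"

# Or persistently, in conf/spark-defaults.conf:
# spark.driver.extraJavaOptions  -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=128m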
Raising your user's virtual memory limit is the better solution, though. The amount of virtual memory on a 64-bit system is "practically" unlimited, so larger address-space reservations cost nothing real. That said, other types of memory limits do not work well on Linux, so the administrators may have had no better option.
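For reference, this is how such a limit is typically raised on the administrators' side, assuming the common pam_limits setup ('alice' is a placeholder username):

# /etc/security/limits.d/99-hpc.conf  ('as' = address space, in KB)
alice  soft  as  unlimited
alice  hard  as  unlimited

# An unprivileged user can only lower the limit, or raise it up to the
# hard limit, in their own shell:
ulimit -v unlimited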
Regarding "java - What is wrong with the JVM GC when I start spark-shell on a shared login node?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64862134/