java - What is wrong with the JVM GC when I start spark-shell on a shared login node?

Tags: java apache-spark memory-management garbage-collection

I am running spark-shell on an HPC login node (a node shared by multiple users, for which the administrators have set per-user resource limits).

I start spark-shell from the command line (I use the bin/spark-shell that ships with pyspark 3.0.1, with no arguments), and it crashes.

This is the error shown when running spark-shell -v:

Main class:
org.apache.spark.repl.Main
Arguments:

Spark config:
(spark.jars,)
(spark.app.name,Spark shell)
(spark.submit.pyFiles,)
(spark.ui.showConsoleProgress,true)
(spark.submit.deployMode,client)
(spark.master,local[*])
Classpath elements:



Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception in thread "main" java.lang.OutOfMemoryError: Metaspace

And I got this crash-report file:

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 16 bytes for AllocateHeap
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (allocation.inline.hpp:61), pid=428290, tid=0x00002ac0965dd700
#
# JRE version: OpenJDK Runtime Environment (8.0_191-b12) (build 1.8.0_191-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /my/working/dir/tmp/core or core.428290
#

...
...

Heap:
 PSYoungGen      total 281600K, used 39809K [0x00000000eab00000, 0x0000000100000000, 0x0000000100000000)
  eden space 262144K, 7% used [0x00000000eab00000,0x00000000ebf30088,0x00000000fab00000)
  from space 19456K, 98% used [0x00000000fab00000,0x00000000fbdb0540,0x00000000fbe00000)
  to   space 67584K, 0% used [0x00000000fbe00000,0x00000000fbe00000,0x0000000100000000)
 ParOldGen       total 699392K, used 77181K [0x00000000c0000000, 0x00000000eab00000, 0x00000000eab00000)
  object space 699392K, 11% used [0x00000000c0000000,0x00000000c4b5f460,0x00000000eab00000)
 Metaspace       used 72906K, capacity 78552K, committed 78848K, reserved 1118208K
  class space    used 9013K, capacity 9377K, committed 9472K, reserved 1048576K

It looks like the second region of "PSYoungGen" (the "from" space, 98% used) or the class space is nearly full?

I'm not familiar with Java (I don't develop in Java; I just want to launch the application properly).

Here is my ulimit information:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 380195
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 16384
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 300
virtual memory          (kbytes, -v) 8388608
file locks                      (-x) unlimited

And here is my memory info (cat /proc/meminfo):

MemTotal:       97353940 kB
MemFree:         4565860 kB
MemAvailable:   31438168 kB
Buffers:          527764 kB
Cached:         73469304 kB
SwapCached:            0 kB
Active:         50002012 kB
Inactive:       30665992 kB
Active(anon):   49118320 kB
Inactive(anon):  1973220 kB
Active(file):     883692 kB
Inactive(file): 28692772 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:             14092 kB
Writeback:             0 kB
AnonPages:       6671608 kB
Mapped:           382268 kB
Shmem:          44418868 kB
Slab:            9548096 kB
SReclaimable:     752440 kB
SUnreclaim:      8795656 kB
KernelStack:       54816 kB
PageTables:       186136 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    48676968 kB
Committed_AS:   56368676 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1638408 kB
VmallocChunk:   34307690768 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1048576 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      505664 kB
DirectMap2M:    32661504 kB
DirectMap1G:    68157440 kB

Note that if I use one of the compute nodes (allocated to me exclusively rather than shared with other users), spark-shell runs correctly.

I'd like to understand why this fails even though I have enough memory, so that I can ask the administrators to adjust the limits. Or, if there are some Java options I should be setting, that would be great too.

Any suggestions would be appreciated!

Best answer

You have a virtual memory limit set:

virtual memory          (kbytes, -v) 8388608

This limit caps the total size of address-space reservations, and the JVM tends to reserve address space up front for most of its memory regions.
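As a rough sanity check (a sketch only: the heap and metaspace figures are read off the crash report above, while the code-cache value is the usual JDK 8 default and is an assumption), the reservations already visible in the dump can be tallied against the 8 GiB limit:

```shell
#!/bin/sh
# Back-of-the-envelope tally of JVM address-space reservations against
# the 8 GiB (8388608 kB) virtual-memory limit from `ulimit -v`.
vm_limit_kb=8388608      # ulimit -v
heap_kb=1048576          # 1 GiB heap range [0xc0000000, 0x100000000) in the dump
metaspace_kb=1118208     # "Metaspace ... reserved 1118208K" in the dump
code_cache_kb=245760     # common JDK 8 ReservedCodeCacheSize default (~240 MB), an assumption
reserved_kb=$((heap_kb + metaspace_kb + code_cache_kb))
echo "tallied reservations: ${reserved_kb} kB of ${vm_limit_kb} kB"
```

Thread stacks (Spark starts many threads, each reserving a full stack), memory-mapped jars, and native allocations come on top of this, which is how an 8 GiB address-space budget can run out even though the heap itself is barely used.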

You can reduce the total amount of memory the JVM reserves by lowering the maximum sizes of the various non-heap spaces. Here are some suggestions:

  • -XX:MaxMetaspaceSize=256m
  • -XX:CompressedClassSpaceSize=128m
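With spark-shell, these flags need to reach the driver JVM; one way to pass them (the values are the starting points suggested above, not tuned numbers) is --driver-java-options:

```shell
# Cap metaspace and the compressed class space for the driver JVM
# started by spark-shell. Values are illustrative, not tuned.
bin/spark-shell \
  --driver-java-options "-XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=128m"
```

Equivalently, the same string can be set as spark.driver.extraJavaOptions in spark-defaults.conf.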

Raising the user's virtual-memory limit is the better solution, though. The amount of virtual memory on a 64-bit system is "practically" unlimited. However, other types of memory limits do not work well on Linux, so the administrators may have had no alternative.
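If the administrators configured the hard limit higher than the soft limit, you can raise the soft limit yourself in the launching shell (16 GiB below is an illustrative value, not a recommendation):

```shell
# Raise the soft virtual-memory limit for the current shell only;
# this fails if the hard limit is also 8388608 kB.
ulimit -S -v 16777216    # 16 GiB, illustrative
ulimit -S -v             # verify the new soft limit
# ...then start bin/spark-shell from this same shell.
```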

Regarding "java - What is wrong with the JVM GC when I start spark-shell on a shared login node?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64862134/
