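java - Embedded neo4j crashes without a stack trace

I am running neo4j 2.3.0-RC1 embedded, using the Java API. It keeps crashing without warning, and I am trying to figure out why.

I previously used this code with 1.9.8, where it worked well. Upgrading to 2.0+ required adding transactions, changing some Cypher syntax, adjusting the Spring configuration at startup, and a limited number of other changes.

The vast majority of the code is unchanged and is functionally correct, as confirmed by unit and integration tests.

When the engine starts, it adds new nodes fairly steadily. The output below shows a mysterious crash after 290 minutes.

This seems to happen every time, sometimes after 2 hours, sometimes after 5. It never happened with 1.9.8.

The JVM is started with ./start-engine.sh > console.out 2>&1 &

The run line in start-engine.sh is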

$JAVA_HOME/bin/java -server $JAVA_OPTIONS $JPROFILER_OPTIONS -cp '.:lib/*' package.engine.Main $*

Below are the last few lines of console.out.

17437.902: RevokeBias                       [     112          6              5    ]      [    20     6    27    43    26    ]  1
17438.020: RevokeBias                       [     112          3              9    ]      [     5     0     5     0     0    ]  3
17438.338: GenCollectForAllocation          [     113          2              2    ]      [     1     0    11     4    32    ]  2
17438.857: BulkRevokeBias                   [     112          3             13    ]      [     0     0    28     6     2    ]  3
./start-engine.sh: line 17: 19647 Killed                  $JAVA_HOME/bin/java -server $JAVA_OPTIONS $JPROFILER_OPTIONS -cp '.:lib/*' package.engine.Main $*

There is no stack trace and no other error output.

These are the last few lines of messages.log in /mnt/engine-data

2015-10-30 18:07:44.457+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]:  Starting check pointing...
2015-10-30 18:07:44.458+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]:  Starting store flush...
2015-10-30 18:07:44.564+0000 INFO  [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 845664650 to [/mnt/engine-data/neostore.counts.db.b], from [/mnt/engine-data/neostore.counts.db.a].
2015-10-30 18:07:44.565+0000 INFO  [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 845664650 to [/mnt/engine-data/neostore.counts.db.b], from [/mnt/engine-data/neostore.counts.db.a].
2015-10-30 18:07:44.834+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]:  Store flush completed
2015-10-30 18:07:44.835+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]:  Starting appending check point entry into the tx log...
2015-10-30 18:07:44.836+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]:  Appending check point entry into the tx log completed
2015-10-30 18:07:44.836+0000 INFO  [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [845664646]:  Check pointing completed
2015-10-30 18:07:44.836+0000 INFO  [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [35826]:  Starting log pruning.
2015-10-30 18:07:44.844+0000 INFO  [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [35826]:  Log pruning complete.

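So everything looks fine right up to the moment of the crash, and the crash comes completely out of the blue.

There is a lot of other data in messages.log, but I don't know what I should be looking for.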

$ java -version
java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

$ uname -a
Linux 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Best answer

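You are probably seeing the effect of the Linux Out-of-Memory (OOM) Killer, which terminates processes when the system runs critically low on physical memory. The "Killed" line that the shell printed in console.out means the JVM was terminated by SIGKILL, which is consistent with this. A process killed that way gets no chance to write anything, which would explain why you found nothing in the logs.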
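
One way to check this hypothesis is to search the kernel log for the messages the OOM Killer leaves behind when it kills a process. The following is a minimal sketch, assuming a kernel that logs lines containing "Out of memory" or "Killed process" (the exact wording varies between kernel versions). The function name find_oom_kills is made up for this example.

```shell
# Print kernel-log lines that look like OOM Killer activity.
# Reads the files given as arguments, or standard input when none are given.
# Assumption: the kernel's messages contain "Out of memory" or
# "Killed process"; the exact wording differs between kernel versions.
find_oom_kills() {
    grep -i -E 'out of memory|killed process' "$@"
}
```

For example, run `dmesg | find_oom_kills`, or `find_oom_kills /var/log/kern.log` if your distribution writes the kernel log there. A match that names PID 19647 (the PID in the "Killed" line of console.out) around the time of the crash would support the diagnosis.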

To quote this excellent article:

Because many applications allocate their memory up front and often don't utilize the memory allocated, the kernel was designed with the ability to over-commit memory to make memory usage more efficient. ……… When too many applications start utilizing the memory they were allocated, the over-commit model sometimes becomes problematic and the kernel must start killing processes …

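The article quoted above is a great resource for learning about the OOM Killer. It contains a lot of information on how to troubleshoot and configure Linux to avoid the problem as far as possible.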
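
To get a rough idea of how far a machine is over-committed, you can look at the commit accounting the kernel exposes in /proc/meminfo. This is a small sketch, assuming a Linux /proc filesystem; show_commit_accounting is a hypothetical helper name, and its optional argument (an alternative meminfo file) exists only to make it easy to try out.

```shell
# Print the memory totals and commit accounting from a meminfo-style file.
# $1 is optional and defaults to /proc/meminfo (an assumption: Linux only).
show_commit_accounting() {
    meminfo=${1:-/proc/meminfo}
    grep -E '^(MemTotal|MemFree|CommitLimit|Committed_AS):' "$meminfo"
}
```

Committed_AS is the kernel's estimate of the memory it has currently promised to processes. CommitLimit is the ceiling, which is only enforced in strict mode. `cat /proc/sys/vm/overcommit_memory` prints the policy in effect: 0 is heuristic over-commit (the usual default), 1 always over-commits, and 2 refuses allocations beyond CommitLimit.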

And to quote an answer to this question:

The OOM Killer has to select the best process to kill. Best here refers to that process which will free up maximum memory upon killing and is also least important to the system.

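Since the neo4j process is very likely the most memory-intensive process on your system, it makes sense that it is the one killed when physical resources start to run out.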
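
The kernel publishes the score it uses for this choice in /proc/<pid>/oom_score; a higher value makes the process a more likely victim. Here is a sketch for reading it. The name oom_score_of and its optional second argument (an alternative proc root, mainly useful for trying the function out) are assumptions of this example.

```shell
# Print the kernel's current OOM score for a process.
# $1: process ID
# $2: optional proc root, defaults to /proc (hypothetical parameter)
oom_score_of() {
    proc_root=${2:-/proc}
    cat "$proc_root/$1/oom_score"
}
```

For example, `oom_score_of "$(pgrep -f package.engine.Main)"` prints the score of the engine's JVM, assuming exactly one process matches. Comparing it with the scores of other processes shows whether the engine really is the most likely target.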

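One way to avoid the OOM Killer is to keep other memory-intensive processes off the same system as far as you can. That should make over-commitment of memory much less likely. But you should at least read the first article above to understand the OOM Killer better; there is a lot to know.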
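
To see which other processes compete with the engine for memory, you can rank them by resident set size (RSS). This is a rough sketch, assuming a ps that supports POSIX-style -o output and reports RSS in kilobytes, as it does on Linux. top_rss is an illustrative name.

```shell
# Read "PID RSS COMMAND" lines on standard input and print the N lines
# with the largest RSS (second column), largest first. N defaults to 10.
top_rss() {
    sort -k2,2nr | head -n "${1:-10}"
}
```

For example, `ps -e -o pid= -o rss= -o comm= | top_rss 5` lists the five largest processes. Anything large near the top other than the engine's JVM is a candidate for moving to another machine.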

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/33474347/
