java - 奇怪的 Java 线程被卡住的问题

标签 java multithreading

Java应用程序的日志时间突然从22:20跳到第二天00:33。 22:20之前或00:33之后应用程序正常运行。该应用似乎在 22:20 到 00:33 之间被卡住。

我的应用程序中有一个线程池。我使用DelayQueue来处理超时业务任务(Runnables)。于是我向线程池提交一个任务来轮询超时业务任务并处理

delayFuture = pool.submit(() -> {
    logger.debug("delayed task execution thread: {}", Thread.currentThread().getName());
    while (true) {
        try {
            AbstractTask task = delayQueue.poll(5, TimeUnit.SECONDS);
            if (task != null && !task.isDone()) {
                task.setOvertime(true);
                task.getExecutingThread().interrupt();

                pool.submit(() -> {
                    task.fireTimeout();
                });
            }
        } catch (Exception e) {
            logger.error("delay task error", e);
        }
    }
});

所有业务任务都以固定延迟执行类似的工作。业务任务执行时,都是在线程池的一个线程中一一完成几个步骤。但日志显示某些任务在同一步骤中“被阻止”

LinkedBlockingQueue.poll(timeout, unit);

日志时间从22:20跳到00:33又恢复正常,真是奇怪。

2019-02-16 22:20:20,627 INFO  (biz_thread_64:com.some.company.task.AbstractTask.interruptProcess:31) - device some_device sent control cmd response
2019-02-16 22:20:20,647 INFO  (biz_thread_64:com.some.company.task.AbstractTask.interruptProcess:36) - user some_user received device some_device control cmd response

2019-02-16 22:20:25,809 INFO  (biz_thread_64:com.some.company.task.AbstractTask.interruptProcess:38) - device some_device disconnected, cancel device heartbeat: true
// the next line will be loged after LinkedBlockingQueue.poll(timeout, unit);
2019-02-17 00:33:18,751 INFO  (biz_thread_64:com.some.company.task.AbstractTask.interruptProcess:46) - user some_user received device some_device offline message

2019-02-17 00:33:18,747 INFO  (biz_thread_56:com.some.company.task.TaskManager.lambda$null$0:54) - fire task timeout operation: user get device offline notification, devMac=some_device, userToken=some_user
2019-02-17 00:33:18,951 INFO  (biz_thread_56:com.some.company.task.AbstractTask.resetState:174) - device some_device disconnected, cancel device heartbeat: false
2019-02-17 00:33:18,951 INFO  (biz_thread_56:com.some.company.task.AbstractTask.resetState:183) - user some_user disconnected, cancel user heartbeat: true

2019-02-17 00:33:18,962 INFO  (biz_thread_64:com.some.company.task.TaskManager.exec:170) - task is overtime: task=user get device offline notification, devMac=some_device, userToken=some_user

第 3 行和第 4 行之间有一个跳转。但通常两个步骤之间的时间间隔应小于 4 分钟。该应用似乎在 22:20 到 00:33 之间被卡住。

其他一些有用的信息:

  1. 线程池大小固定;

  2. 应用程序正在虚拟机中运行,但另一个 java 应用程序在 22:20 到 00:33 之间有日志,因此虚拟机在该时间范围内没有 hibernate ;

  3. 还有另一个固定延迟任务打印每个业务任务状态,但它在 22:20 到 00:33 之间也没有打印任何内容;

  4. 有一个基于Vertx的http服务器,提供RESTful服务。当问题发生时,RESTful 服务在请求时无响应,即向 http 服务器发出 http GET 请求时浏览器被阻止。但 telnet 可以连接到 http 服务器。

如有任何建议,我们将不胜感激。

更新(2019-02-26 18:13)

又出事了!我的应用程序在2019-02-26 17:30:17,576卡住,并在2019-02-26 19:40:29,655恢复,大约2小时10分12秒。该问题最后一次出现的时间为 2 小时 12 分 53 秒。

问题发生时,jstack和jmap命令都执行失败:

[cloud-user@extdev-platform-ug bin]$ ~/jdk1.8.0_192/bin/jstack -F 10914
Attaching to process ID 10914, please wait...
Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 10914: Operation not permitted
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 10914: Operation not permitted
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:163)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach(LinuxDebuggerLocal.java:278)
    at sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:671)
    at sun.jvm.hotspot.HotSpotAgent.setupDebuggerLinux(HotSpotAgent.java:611)
    at sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:337)
    at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:304)
    at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:140)
    at sun.jvm.hotspot.tools.Tool.start(Tool.java:185)
    at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
    at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
    at sun.tools.jstack.JStack.main(JStack.java:106)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 10914: Operation not permitted
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach0(Native Method)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.access$100(LinuxDebuggerLocal.java:62)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$1AttachTask.doit(LinuxDebuggerLocal.java:269)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.run(LinuxDebuggerLocal.java:138)

[cloud-user@extdev-platform-ug bin]$ ~/jdk1.8.0_192/bin/jmap -heap 10914
Attaching to process ID 10914, please wait...
Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 10914: Operation not permitted
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 10914: Operation not permitted
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:163)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach(LinuxDebuggerLocal.java:278)
    at sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:671)
    at sun.jvm.hotspot.HotSpotAgent.setupDebuggerLinux(HotSpotAgent.java:611)
    at sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:337)
    at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:304)
    at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:140)
    at sun.jvm.hotspot.tools.Tool.start(Tool.java:185)
    at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
    at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:49)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.tools.jmap.JMap.runTool(JMap.java:201)
    at sun.tools.jmap.JMap.main(JMap.java:130)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: Can't attach to the process: ptrace(PTRACE_ATTACH, ..) failed for 10914: Operation not permitted
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach0(Native Method)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.access$100(LinuxDebuggerLocal.java:62)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$1AttachTask.doit(LinuxDebuggerLocal.java:269)
    at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.run(LinuxDebuggerLocal.java:138)

出现问题前的最后一次GC日志

[GC pause (G1 Evacuation Pause) (young), 0.0388874 secs]
   [Parallel Time: 26.9 ms, GC Workers: 2]
      [GC Worker Start (ms): Min: 22597871.6, Avg: 22597871.6, Max: 22597871.7, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 3.8, Avg: 3.9, Max: 4.0, Diff: 0.2, Sum: 7.9]
      [Update RS (ms): Min: 9.8, Avg: 9.8, Max: 9.8, Diff: 0.1, Sum: 19.6]
         [Processed Buffers: Min: 245, Avg: 266.0, Max: 287, Diff: 42, Sum: 532]
      [Scan RS (ms): Min: 2.3, Avg: 2.3, Max: 2.4, Diff: 0.2, Sum: 4.7]
      [Code Root Scanning (ms): Min: 0.2, Avg: 0.7, Max: 1.2, Diff: 1.0, Sum: 1.4]
      [Object Copy (ms): Min: 9.4, Avg: 9.9, Max: 10.5, Diff: 1.1, Sum: 19.8]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 26.7, Avg: 26.7, Max: 26.7, Diff: 0.0, Sum: 53.5]
      [GC Worker End (ms): Min: 22597898.4, Avg: 22597898.4, Max: 22597898.4, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.9 ms]
   [Other: 11.1 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 8.4 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.7 ms]
   [Eden: 604.0M(604.0M)->0.0B(604.0M) Survivors: 10.0M->10.0M Heap: 707.4M(1024.0M)->103.4M(1024.0M)]
 [Times: user=0.06 sys=0.00, real=0.04 secs] 

我的应用程序使用选项运行

-Xms1g -Xmx1g -server -XX:+PrintGCDetails -verbose:gc -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+HeapDumpOnOutOfMemoryError

top 命令给出

top -H -p 10914

Tasks: 545 total,   0 running,   0 sleeping, 545 stopped,   0 zombie
Cpu0  :  0.7%us,  1.0%sy,  0.0%ni, 97.3%id,  0.0%wa,  0.0%hi,  0.0%si,  1.0%st
Cpu1  :  4.0%us,  0.7%sy,  0.0%ni, 95.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   5991196k total,  5858276k used,   132920k free,   225448k buffers
Swap:  6291448k total,   132044k used,  6159404k free,  3820256k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                 
10914 cloud-us  20   0 4151m 1.0g  15m T  0.0 17.3   0:00.10 java                                                                     
10915 cloud-us  20   0 4151m 1.0g  15m T  0.0 17.3   0:03.17 java                                                                     
10916 cloud-us  20   0 4151m 1.0g  15m T  0.0 17.3   0:00.68 java                                                                     
10917 cloud-us  20   0 4151m 1.0g  15m T  0.0 17.3   0:00.71 java                                                                     
10918 cloud-us  20   0 4151m 1.0g  15m T  0.0 17.3   0:20.87 java                                                                     
10919 cloud-us  20   0 4151m 1.0g  15m T  0.0 17.3   0:00.07 java                                                                     
10920 cloud-us  20   0 4151m 1.0g  15m T  0.0 17.3   0:00.13 java                                                                     
10921 cloud-us  20   0 4151m 1.0g  15m T  0.0 17.3   0:00.05 java                                                                     
10922 cloud-us  20   0 4151m 1.0g  15m T  0.0 17.3   0:00.16 java

netstat命令显示有16个处于CLOSE_WAIT或ESTABLISHED状态的TCP连接,并且根据日志,16也是未正常完成的链式任务的数量。

我的应用程序定期打印业务线程池中线程状态的日志。问题前最后一张

    "biz_thread_0":"TIMED_WAITING",
    "biz_thread_1":"TIMED_WAITING",
    "biz_thread_2":"TIMED_WAITING",
    "biz_thread_3":"TIMED_WAITING",
    "biz_thread_4":"TIMED_WAITING",
    "biz_thread_5":"TIMED_WAITING",
    "biz_thread_6":"WAITING",
    "biz_thread_7":"TIMED_WAITING",
    "biz_thread_8":"TIMED_WAITING",
    "biz_thread_9":"TIMED_WAITING",
    "biz_thread_10":"WAITING",
    "biz_thread_11":"TIMED_WAITING",
    "biz_thread_12":"WAITING",
    "biz_thread_13":"TIMED_WAITING",
    "biz_thread_14":"TIMED_WAITING",
    "biz_thread_15":"TIMED_WAITING",
    "biz_thread_16":"TIMED_WAITING",
    "biz_thread_17":"TIMED_WAITING",
    "biz_thread_18":"TIMED_WAITING",
    "biz_thread_19":"TIMED_WAITING",
    "biz_thread_20":"TIMED_WAITING",
    "biz_thread_21":"TIMED_WAITING",
    "biz_thread_22":"TIMED_WAITING",
    "biz_thread_23":"TIMED_WAITING",
    "biz_thread_24":"TIMED_WAITING",
    "biz_thread_25":"TIMED_WAITING",
    "biz_thread_26":"WAITING",
    "biz_thread_27":"WAITING",
    "biz_thread_28":"WAITING",
    "biz_thread_29":"WAITING",
    "biz_thread_30":"WAITING",
    "biz_thread_31":"TIMED_WAITING",
    "biz_thread_32":"TIMED_WAITING",
    "biz_thread_33":"TIMED_WAITING",
    "biz_thread_34":"RUNNABLE",
    "biz_thread_35":"TIMED_WAITING",
    "biz_thread_36":"TIMED_WAITING",
    "biz_thread_37":"WAITING",
    "biz_thread_38":"TIMED_WAITING",
    "biz_thread_39":"TIMED_WAITING",
    "biz_thread_40":"TIMED_WAITING",
    "biz_thread_41":"TIMED_WAITING",
    "biz_thread_42":"TIMED_WAITING",
    "biz_thread_43":"TIMED_WAITING",
    "biz_thread_44":"WAITING",
    "biz_thread_45":"WAITING",
    "biz_thread_46":"TIMED_WAITING",
    "biz_thread_47":"TIMED_WAITING",
    "biz_thread_48":"TIMED_WAITING",
    "biz_thread_49":"TIMED_WAITING",
    "biz_thread_50":"TIMED_WAITING",
    "biz_thread_51":"WAITING",
    "biz_thread_52":"TIMED_WAITING",
    "biz_thread_53":"TIMED_WAITING",
    "biz_thread_54":"TIMED_WAITING",
    "biz_thread_55":"TIMED_WAITING",
    "biz_thread_56":"WAITING",
    "biz_thread_57":"TIMED_WAITING",
    "biz_thread_58":"TIMED_WAITING",
    "biz_thread_59":"TIMED_WAITING",
    "biz_thread_60":"TIMED_WAITING",
    "biz_thread_61":"TIMED_WAITING",
    "biz_thread_62":"TIMED_WAITING",
    "biz_thread_63":"TIMED_WAITING",
    "biz_thread_64":"TIMED_WAITING",
    "biz_thread_65":"WAITING",
    "biz_thread_66":"TIMED_WAITING",
    "biz_thread_67":"WAITING",
    "biz_thread_68":"WAITING",
    "biz_thread_69":"TIMED_WAITING",
    "biz_thread_70":"TIMED_WAITING",
    "biz_thread_71":"TIMED_WAITING",
    "biz_thread_72":"WAITING",
    "biz_thread_73":"TIMED_WAITING",
    "biz_thread_74":"TIMED_WAITING",
    "biz_thread_75":"TIMED_WAITING",
    "biz_thread_76":"WAITING",
    "biz_thread_77":"TIMED_WAITING",
    "biz_thread_78":"TIMED_WAITING",
    "biz_thread_79":"TIMED_WAITING",
    "biz_thread_80":"TIMED_WAITING",
    "biz_thread_81":"TIMED_WAITING",
    "biz_thread_82":"WAITING",
    "biz_thread_83":"TIMED_WAITING",
    "biz_thread_84":"TIMED_WAITING",
    "biz_thread_85":"TIMED_WAITING",
    "biz_thread_86":"TIMED_WAITING",
    "biz_thread_87":"TIMED_WAITING",
    "biz_thread_88":"TIMED_WAITING",
    "biz_thread_89":"WAITING",
    "biz_thread_90":"TIMED_WAITING",
    "biz_thread_91":"TIMED_WAITING",
    "biz_thread_92":"TIMED_WAITING",
    "biz_thread_93":"TIMED_WAITING",
    "biz_thread_94":"WAITING",
    "biz_thread_95":"TIMED_WAITING",
    "biz_thread_96":"TIMED_WAITING",
    "biz_thread_97":"TIMED_WAITING",
    "biz_thread_98":"TIMED_WAITING",
    "biz_thread_99":"TIMED_WAITING",
    "biz_thread_100":"TIMED_WAITING",
    "biz_thread_101":"TIMED_WAITING",
    "biz_thread_102":"TIMED_WAITING",
    "biz_thread_103":"WAITING",
    "biz_thread_114":"WAITING",
    "biz_thread_113":"TIMED_WAITING",
    "biz_thread_112":"WAITING",
    "biz_thread_111":"TIMED_WAITING",
    "biz_thread_110":"TIMED_WAITING",
    "biz_thread_109":"TIMED_WAITING",
    "biz_thread_108":"TIMED_WAITING",
    "biz_thread_107":"TIMED_WAITING",
    "biz_thread_106":"TIMED_WAITING",
    "biz_thread_105":"WAITING",
    "biz_thread_104":"TIMED_WAITING",
    "biz_thread_115":"TIMED_WAITING",
    "biz_thread_116":"TIMED_WAITING",
    "biz_thread_122":"TIMED_WAITING",
    "biz_thread_121":"TIMED_WAITING",
    "biz_thread_120":"TIMED_WAITING",
    "biz_thread_119":"TIMED_WAITING",
    "biz_thread_118":"TIMED_WAITING",
    "biz_thread_117":"TIMED_WAITING",
    "biz_thread_126":"TIMED_WAITING",
    "biz_thread_125":"TIMED_WAITING",
    "biz_thread_124":"WAITING",
    "biz_thread_123":"WAITING",
    "biz_thread_127":"WAITING"

最佳答案

终于!

我不知道哪个java选项或其他东西有帮助,但是nohup stdout重定向文件中有完整的线程转储(第一次发生时没有线程转储)。

2019-02-26 19:40:29
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.192-b12 mixed mode):

"biz_thread_87" #104 prio=5 os_prio=0 tid=0x00007f93e76fd800 nid=0x2b24 waiting for monitor entry [0x00007f938bfbe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.logging.log4j.core.appender.rolling.RollingFileManager.checkRollover(RollingFileManager.java:217)
    - waiting to lock <0x00000000c06725d8> (a org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager)
    at org.apache.logging.log4j.core.appender.RollingRandomAccessFileAppender.append(RollingRandomAccessFileAppender.java:207)
    at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
    at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
    at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
    at org.apache.logging.log4j.core.config.LoggerConfig.logParent(LoggerConfig.java:439)
    at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:434)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:403)
    at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
    at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2091)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:1999)
    at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1868)
    at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:194)
    at com.some.company.task.impl.StartTask.interruptProcess(StartTask.java:17)
    at com.some.company.common.task.AbstractTask.process(AbstractTask.java:88)
    at com.some.company.common.task.TaskManager.exec(TaskManager.java:174)
    at com.some.company.common.task.Context.lambda$resubmit$0(Context.java:426)
    at com.some.company.common.task.Context$$Lambda$97/519533636.run(Unknown Source)
    at com.some.company.common.task.TaskManager.lambda$delayExec$2(TaskManager.java:121)
    at com.some.company.common.task.TaskManager$$Lambda$37/873634936.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"nioEventLoopGroup-173-1" #315 prio=10 os_prio=0 tid=0x00007f939420e000 nid=0x2ced runnable [0x00007f937b3e8000]
   java.lang.Thread.State: RUNNABLE
    at java.io.RandomAccessFile.writeBytes(Native Method)
    at java.io.RandomAccessFile.write(RandomAccessFile.java:525)
    at org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager.writeToDestination(RollingRandomAccessFileManager.java:109)
    - eliminated <0x00000000c06725d8> (a org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager)
    at org.apache.logging.log4j.core.appender.OutputStreamManager.flushBuffer(OutputStreamManager.java:288)
    - eliminated <0x00000000c06725d8> (a org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager)
    at org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager.flush(RollingRandomAccessFileManager.java:128)
    - locked <0x00000000c06725d8> (a org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager)
    at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.directEncodeEvent(AbstractOutputStreamAppender.java:179)
    at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.tryAppend(AbstractOutputStreamAppender.java:170)
    at org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:161)
    at org.apache.logging.log4j.core.appender.RollingRandomAccessFileAppender.append(RollingRandomAccessFileAppender.java:218)
    at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
    at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
    at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
    at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
    at org.apache.logging.log4j.core.config.LoggerConfig.logParent(LoggerConfig.java:439)
    at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:434)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
    at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:403)
    at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
    at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2091)
    at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2005)
    at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1876)
    at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:184)
    at com.some.company.UserWifiClient.sendHeartbeat(UserWifiEppClient.java:39)
    at com.some.company.AbstractNettyClient.lambda$scheduleHeartbeat$0(AbstractNettyClient.java:146)
    at com.some.company.AbstractNettyClient$$Lambda$73/2134853806.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:133)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:495)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

有 101 个被阻止的线程和 94 个线程“等待锁定 <0x00000000c06725d8>”。

所以我认为这一定是log4j Logger的误用。在 AbstractTask 中,“记录器”的创建如下

protected final static Logger logger = LoggerFactory.getLogger(AbstractTask.class);

所有 AbstractTask 实现都只使用这个“记录器”,但是以这种方式创建的记录器在多线程尝试使用它时会引入争用。现在我在 AbstractTask 中以这种方式创建记录器

protected Logger logger = LoggerFactory.getLogger(getClass());

然后每个实际任务应该有自己的记录器对象来使用,并且不会面临竞争条件问题。

请参阅:Should logger be private static or not

多么重要的一课!

顺便问一下,JVM 有时会自动将线程转储到 stdout 吗?我在 nohup stdout 重定向文件中找到了上面的“完整线程转储...”。

这种线程阻塞问题是死锁吗?大约 2 小时后该应用程序恢复。如果是死锁的话,怎么恢复呢?

更新(2019-03-11)

实际上,在我修改了 log4j logger 的使用方式后,即使在应用程序顺利运行一周后,我还是无意中再次遇到了这个问题。我意识到可能是jmap造成的。我上周五在下面的命令下运行了几个小时

while [ true ]; do jmap -heap 340; echo "=================================================="; sleep 30; done

然后在17:00左右,应用程序再次卡住了!因此,如果您的应用程序运行正确,请尽可能少地针对应用程序 pid 运行 jmapjstack!这些命令将挂起 JVM。当它们执行得太频繁时,JVM可能会被卡住。

关于java - 奇怪的 Java 线程被卡住的问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54745035/

相关文章:

java - NoNodeAvailableException : None of the configured nodes are available

c - 为什么以及在何种意义上 pthread_t 是不透明类型?

c++ - Pthreads问题和几个问题

c# - 如何使用多线程 C# 执行控制台应用程序

java异常

java - Gradle:将一堆 jar 安装到本地 Maven 存储库

java - 更改 .png 以使用 JpegImagesToMovie.java

java - SSL 握手异常 : Received fatal alert: handshake_failure after Java 6 -> 8 upgrade

java - 如何使用带有 Java 的 Selenium WebDriver 查找损坏的链接

在用户空间实现的线程库可以支持超线程吗?