docker - 由于 'accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0..',Ignite TcpDiscoverySpi 失败并出现 SocketTimeout 的严重系统错误

标签 docker ignite gridgain

使用 Ignite 2.7.6,当尝试在 docker 桥接网络上使用简单配置启动嵌入式 ignite 服务器节点(在 Spring Boot 应用程序中)时,服务器启动失败并出现以下错误,

[10:16:16] Ignite node started OK (id=e7276b83)
[10:16:16] >>> Ignite cluster is not active (limited functionality available). Use control.(sh|bat) script or IgniteCluster interface to activate.
[10:16:16] Topology snapshot [ver=1, locNode=e7276b83, servers=1, clients=0, state=INACTIVE, CPUs=1, offheap=0.1GB, heap=0.4GB]
mediation-service - [INFO ] 10:16:16.981 [main] com.**.**.perfmon.common.spring.EmbeddedIgnite    - ====>>> Activating Ignite Cluster
mediation-service - [WARN ] 10:16:17.383 [exchange-worker-#49] org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager     - Started write-ahead log manager in NONE mode, persisted data may be lost in a case of unexpected node failure. Make sure to deactivate the cluster before shutdown.
[10:16:17] Started write-ahead log manager in NONE mode, persisted data may be lost in a case of unexpected node failure. Make sure to deactivate the cluster before shutdown.
mediation-service - [ERROR] 10:16:21.982 [tcp-disco-srvr-#3] org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi        - Failed to accept TCP connection.
java.net.SocketTimeoutException: Accept timed out
        at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
        at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
        at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5845)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:5763)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
mediation-service - [WARN ] 10:16:21.982 [RMI TCP Accept-19887] sun.rmi.transport.tcp   - RMI TCP Accept-19887: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=19887] throws
java.net.SocketTimeoutException: Accept timed out
        at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
        at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
        at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
        at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
        at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
        at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [WARN ] 10:16:21.982 [RMI TCP Accept-0] sun.rmi.transport.tcp       - RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=33254] throws
java.net.SocketTimeoutException: Accept timed out
        at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
        at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
        at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
        at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
        at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
        at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [ERROR] 10:16:21.984 [tcp-disco-srvr-#3]    - Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.net.SocketTimeoutException: Accept timed out]]

以下是相关配置,

点燃配置 xml 片段:

....
....
<property name="discoverySpi">
            <bean
                class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder"/>
                </property>
            </bean>
</property>
....
....

docker-compose 片段:

services:
  ***-mediation-service:
    image: ***/mediation-service:latest
    build: .
    environment:
    - PERCENTAGE_OF_RAM_FOR_HEAP=80.0
    - SERVICE_NAME=mediation-service
    - SERVICE_PORT=9887
    - IGNITE_TCP_DISCOVERY_ADDRESSES=localhost
    - JAVA_TOOL_OPTIONS=-Dcom.sun.management.jmxremote=true
  -Dcom.sun.management.jmxremote.rmi.port=19887
  -Dcom.sun.management.jmxremote.port=19887
  -Dcom.sun.management.jmxremote.local.only=false
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false
  -Djava.rmi.server.hostname=$HOST_IP
  -Djava.net.preferIPv4Stack=true
  -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=29887
    ...
    ...
    networks:
      - something-mediation-network

networks:
  something-mediation-network:
    driver: bridge
    ipam:
      driver: default
      config:
      - subnet: 186.30.240.0/24

有人知道这是怎么回事吗?

谢谢 穆图

更新(2020年11月13日):我按照@alamar的建议尝试了与2.9.0相同的操作,但结果相同..请参见下文

mediation-service - [ERROR] 01:03:16.871 [tcp-disco-srvr-[:47500]-#3-#50] org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi   - Failed to accept TCP connection.
java.net.SocketTimeoutException: Accept timed out
    at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
    at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
    at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:6620)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:6543)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
mediation-service - [WARN ] 01:03:16.871 [RMI TCP Accept-19887] sun.rmi.transport.tcp   - RMI TCP Accept-19887: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=19887] throws
java.net.SocketTimeoutException: Accept timed out
    at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
    at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
    at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
    at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
    at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
    at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [WARN ] 01:03:16.871 [RMI TCP Accept-0] sun.rmi.transport.tcp   - RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=33351] throws
java.net.SocketTimeoutException: Accept timed out
    at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
    at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
    at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
    at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:394)
    at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:366)
    at java.base/java.lang.Thread.run(Thread.java:834)
mediation-service - [ERROR] 01:03:16.876 [tcp-disco-srvr-[:47500]-#3-#50]   - Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.net.SocketTimeoutException: Accept timed out]]
java.net.SocketTimeoutException: Accept timed out
    at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.base/java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:458)
    at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:565)
    at java.base/java.net.ServerSocket.accept(ServerSocket.java:533)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:6620)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServerThread.body(ServerImpl.java:6543)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
mediation-service - [WARN ] 01:03:17.271 [tcp-disco-srvr-[:47500]-#3-#50] org.apache.ignite.internal.processors.cache.CacheDiagnosticManager    - Page locks dump:

更新(2020 年 11 月 18 日):

我有另一个更新,如果我使用 Java 8 而不是 Java 11,我在集群激活期间不会看到此问题并且一切正常。

所以我怀疑这与底层 java 库的使用/依赖关系有关..

最佳答案

该错误表示套接字设置了超时,并且在超时期间没有收到传入消息。

有趣的是,Ignite 创建的套接字没有超时!这表明某处存在错误...

...这次是 Java 语言:JDK-8237858 。 bug 描述表明,accept 可以被信号中断(这是预期的),这会导致 Java 抛出错误(这就是 bug)。

根据 OpenJDK Jira,这不会影响 Java 8。在 Java 16 中已修复,并且也不会影响默认设置的 Java 13。

不过,我没有看到 Java 11 维护版本中提到修复。

更新:有一个 fix 2.12 中对此进行了说明。基本上,Ignite 必须在自己的代码中嵌入该错误的解决方法。

关于docker - 由于 'accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0..',Ignite TcpDiscoverySpi 失败并出现 SocketTimeout 的严重系统错误,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64819519/

相关文章:

ubuntu - 运行已编译的 golang 脚本时出错

docker - 在没有root用户访问权限的系统上构建Docker镜像

docker - 远程 Docker 和卷绑定(bind)

sql - Hazelcast 是否支持像 Apache Ignite 这样的 SQL 查询?

java - Gridgain:java.lang.OutOfMemoryError:超出 GC 开销限制

scala - 比一台机器上的多线程应用程序慢的 GridGain 应用程序

node.js - 当挂载卷中的文件在主机上更改时,文件系统事件不会在 docker 容器中触发

c# - Apache 点燃 2.1 : I get errors when setting the PersistentStoreConfiguration property

c++ - 如何在 Apache Ignite C++ 中从缓存中获取所有 key

java - 构建或购买计算网格平台更好?