java - Flyway 经常挂起(MariaDB 连接器、RDS)

标签 java mysql flyway rds amazon-ecs

在迁移步骤中,我经常看到部署挂起。为 Tomcat 打包在 WAR 中的 Java/Scala 应用程序。数据库是使用 MariaDB 连接器 ( https://downloads.mariadb.org/connector-java/ ) 的 RDS Aurora。

可能与 Flyway 无关,但这是获取连接的一般问题。

迁移是从容器中的 shell 运行的:

java -cp `echo WEB-INF/lib/*|tr ' ' :` foo.Migrate

迁移代码如下:

def main(args: Array[String]): Unit = {
  Environment.dbFlywayPassword.foreach { pass =>
    val flyway = new Flyway
    flyway.setDataSource(Environment.jdbcUrl, "flyway", pass)
    flyway.migrate
  }
}

连接字符串:

jdbc:mysql:aurora://%RDS_HOST%/xxx?serverSslCert=/rds-ca-2015-root.pem&useSSL=true&connectTimeout=10000

我试过在 Flyway 中提高日志记录级别,但此行之后没有任何记录:

15:57:35.115 [main] INFO  o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse

所以我得到了一个线程转储,看起来像这样:

15:57:35.115 [main] INFO  o.f.c.internal.util.VersionPrinter - Flyway 4.2.0 by Boxfuse
2017-06-08 15:57:56
Full thread dump OpenJDK 64-Bit Server VM (25.121-b13 mixed mode):

"MariaDb-failover-1" #8 daemon prio=5 os_prio=0 tid=0x00005555f80ae000 nid=0x14 waiting on condition [0x00007fc330b8f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000f5c59b10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00005555f70bf000 nid=0x12 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00005555f7063000 nid=0x11 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00005555f7060800 nid=0x10 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00005555f705e800 nid=0xf waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00005555f702f000 nid=0xe in Object.wait() [0x00007fc331616000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x00000000f5a30c58> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00005555f702c000 nid=0xd in Object.wait() [0x00007fc331717000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x00000000f5a30e10> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00005555f6f85000 nid=0xb runnable [0x00007fc34341d000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
    - locked <0x00000000f0a66090> (a java.lang.Object)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    - locked <0x00000000f0a81eb0> (a sun.security.ssl.AppInputStream)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0x00000000f06b5118> (a java.io.BufferedInputStream)
    at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacketArray(StandardPacketInputStream.java:125)
    at org.mariadb.jdbc.internal.io.input.StandardPacketInputStream.getPacket(StandardPacketInputStream.java:95)
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1002)
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:982)
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readRequestSessionVariables(AbstractConnectProtocol.java:498)
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.readPipelineAdditionalData(AbstractConnectProtocol.java:544)
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:410)
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:357)
    at org.mariadb.jdbc.internal.protocol.AuroraProtocol.loop(AuroraProtocol.java:149)
    at org.mariadb.jdbc.internal.failover.impl.AuroraListener.reconnectFailedConnection(AuroraListener.java:179)
    at org.mariadb.jdbc.internal.failover.impl.MastersSlavesListener.initializeConnection(MastersSlavesListener.java:154)
    at org.mariadb.jdbc.internal.failover.FailoverProxy.<init>(FailoverProxy.java:94)
    at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:464)
    at org.mariadb.jdbc.Driver.connect(Driver.java:103)
    at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnectionFromDriver(DriverDataSource.java:416)
    at org.flywaydb.core.internal.util.jdbc.DriverDataSource.getConnection(DriverDataSource.java:381)
    at org.flywaydb.core.internal.util.jdbc.JdbcUtils.openConnection(JdbcUtils.java:51)
    at org.flywaydb.core.Flyway.execute(Flyway.java:1418)
    at org.flywaydb.core.Flyway.migrate(Flyway.java:971)
    at tgam.service.data.Migrate$.$anonfun$main$1(Migrate.scala:11)
    at foo.Migrate$.$anonfun$main$1$adapted(Migrate.scala:8)
    at foo.Migrate$$$Lambda$4/458209687.apply(Unknown Source)
    at scala.Option.foreach(Option.scala:257)
    at foo.Migrate$.main(Migrate.scala:8)
    at foo.Migrate.main(Migrate.scala)

"VM Thread" os_prio=0 tid=0x00005555f7022000 nid=0xc runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00005555f70da800 nid=0x13 waiting on condition

JNI global references: 232

Heap
 def new generation   total 4928K, used 1611K [0x00000000f0600000, 0x00000000f0b50000, 0x00000000f5950000)
  eden space 4416K,  24% used [0x00000000f0600000, 0x00000000f0712c10, 0x00000000f0a50000)
  from space 512K, 100% used [0x00000000f0a50000, 0x00000000f0ad0000, 0x00000000f0ad0000)
  to   space 512K,   0% used [0x00000000f0ad0000, 0x00000000f0ad0000, 0x00000000f0b50000)
 tenured generation   total 10944K, used 4187K [0x00000000f5950000, 0x00000000f6400000, 0x0000000100000000)
   the space 10944K,  38% used [0x00000000f5950000, 0x00000000f5d66e78, 0x00000000f5d67000, 0x00000000f6400000)
 Metaspace       used 13061K, capacity 13306K, committed 13568K, reserved 1060864K
  class space    used 1381K, capacity 1449K, committed 1536K, reserved 1048576K

看起来像 org.mariadb.jdbc.Driver.connect 中的 I/O 挂起,但我确实设置了 connectTimeout(10 秒)。这个超时似乎没有效果(我需要一个 socketTimeout per https://github.com/brettwooldridge/HikariCP/issues/754 吗?)

这已经发生了一段时间。当我使用 Tomcat 的 contextInitialized Hook 进行迁移时,同样的事情发生了。我决定在启动 Tomcat 之前重构为一个单独的调用,这在一般情况下看起来是个更好的主意,但它并没有影响这种行为。

通常会发生的情况是代码将挂起,2-3 分钟后 ECS 将超时,并触发重新部署。经过一定次数的重试(例如最多 10 次)后,Flyway 将成功运行并且服务将启动。

最佳答案

好吧,这似乎是 RDS Aurora 中的一个已知错误 - MariaDB 连接器文档中提到了它(当然它应该是一个运行时警告!)

https://mariadb.com/kb/en/mariadb/about-mariadb-connector-j/#infrequently-used

usePipelineAuth Not compatible with aurora During connection, different queries are executed. When option is active those queries are send using pipeline (all queries are send, then only all results are reads), permitting faster connection creation. Default: true. Since 1.6.0

还要感谢 Freenode #maria 中的 wlad_,他为我指明了正确的方向。

关于java - Flyway 经常挂起(MariaDB 连接器、RDS),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44440787/

相关文章:

java - 安卓 : How can I update a variable after using in for loop

Java 还是 Scala 用于新的绿地项目?

java - 是否有带有自定义比较器的 retainAll() 的实现?

java - 将对象类型转换为枚举不会给出枚举或错误

java - Spring Boot 3 未找到 Flyway 迁移

mysql - 无法从 Spring Boot 应用程序运行 flyway 脚本,数据库作为 docker 容器中的 Mysql

php - 使用 USBWebserver 将本地数据库与 Android 应用程序连接

php - 服务器更新后 MySQL insert null 不起作用

php - 从结果计算日期的最佳方法?

java - 我的飞行路线脚本未滚动