ignite - Apache Ignite continuous queries and cache transactions

Tags: ignite gridgain apacheignite

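We use continuous queries to distribute data to all client nodes. However, our grid scales up and down, so we often run into a problem where data nodes keep trying to connect to clients that have already been scaled down in order to send them continuous query data. This stalls the system, because the PME (partition map exchange) operation cannot acquire its lock, and so the topology is never updated.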

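To work around this, I would like to use the TxTimeoutOnPartitionMapExchange parameter, which should allow PME to proceed. To make use of this parameter, do I need to change the cache's atomicityMode to TRANSACTIONAL? If so, does a data node's attempt to send continuous query data count as a transaction?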

In short, I am trying to determine whether the TxTimeoutOnPartitionMapExchange parameter would help in my continuous query scenario, and what steps are needed to enable it.

Edit: stack traces for the problem I am trying to solve:

  • A thread constantly trying to reserve the client. I believe this is effectively a global lock that blocks cache updates and checkpointing:

Deadlock: false
    Completed: 1999706
Thread [name="sys-stripe-6-#7%pv-ib-valuation%", id=42, state=WAITING, blockCnt=52537, waitCnt=734400]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
        at o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3229)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:3013)
        at o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2960)
        at o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:2100)
        at o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:2365)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1964)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1935)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1917)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:1324)
        at o.a.i.i.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:1261)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:1059)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler.access$600(CacheContinuousQueryHandler.java:90)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onEntryUpdated(CacheContinuousQueryHandler.java:459)
        at o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:447)
        at o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2495)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2657)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2118)

  • This starts to appear after reserveClient is called; the thread fails to acquire the lock:

>>> Possible starvation in striped pool.
    Thread name: sys-stripe-4-#5%pv-ib-valuation%
    Queue: []
    Deadlock: false
    Completed: 6328076
Thread [name="sys-stripe-4-#5%pv-ib-valuation%", id=40, state=WAITING, blockCnt=111790, waitCnt=2018248]
    Lock [object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@66d8e343, ownerName=null, ownerId=-1]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at o.a.i.i.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1663)
        at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2715)
        at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2679)
        at o.a.i.i.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1051)
        at o.a.i.i.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:243)
        at o.a.i.i.processors.cache.GridCacheUtils.unwindEvicts(GridCacheUtils.java:873)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessageProcessed(GridCacheIoManager.java:1189)

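My overall analysis so far is that if a client disappears, the continuous query keeps trying to connect to it while holding a lock that blocks everything else.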

  • A sample page locks dump. The dump looks similar every time: all threads appear to be waiting and none of them holds a page lock:

Page locks dump:

Thread=[name=checkpoint-runner-#94%pv-ib-valuation%, id=162], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#94%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=checkpoint-runner-#95%pv-ib-valuation%, id=163], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#95%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=checkpoint-runner-#96%pv-ib-valuation%, id=164], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#96%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=checkpoint-runner-#97%pv-ib-valuation%, id=165], state=WAITING
Locked pages = []
Locked pages log: name=checkpoint-runner-#97%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-0-#15%pv-ib-valuation%, id=50], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-0-#15%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-1-#16%pv-ib-valuation%, id=51], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-1-#16%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-10-#25%pv-ib-valuation%, id=60], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-10-#25%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-11-#26%pv-ib-valuation%, id=61], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-11-#26%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-12-#27%pv-ib-valuation%, id=62], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-12-#27%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-13-#28%pv-ib-valuation%, id=63], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-13-#28%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-14-#29%pv-ib-valuation%, id=64], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-14-#29%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-15-#30%pv-ib-valuation%, id=65], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-15-#30%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-2-#17%pv-ib-valuation%, id=52], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-2-#17%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-3-#18%pv-ib-valuation%, id=53], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-3-#18%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-4-#19%pv-ib-valuation%, id=54], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-4-#19%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-5-#20%pv-ib-valuation%, id=55], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-5-#20%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-6-#21%pv-ib-valuation%, id=56], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-6-#21%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-7-#22%pv-ib-valuation%, id=57], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-7-#22%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-8-#23%pv-ib-valuation%, id=58], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-8-#23%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=data-streamer-stripe-9-#24%pv-ib-valuation%, id=59], state=WAITING
Locked pages = []
Locked pages log: name=data-streamer-stripe-9-#24%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=db-checkpoint-thread-#93%pv-ib-valuation%, id=161], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=db-checkpoint-thread-#93%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=dms-writer-thread-#77%pv-ib-valuation%, id=145], state=WAITING
Locked pages = []
Locked pages log: name=dms-writer-thread-#77%pv-ib-valuation% time=(1674196038673, 2023-01-20 06:27:18.673)


Thread=[name=exchange-worker-#71%pv-ib-valuation%, id=139], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=exchange-worker-#71%pv-ib-valuation% time=(1674196038673, 2023-01-20 06:27:18.673)


Thread=[name=lock-cleanup-0, id=278], state=WAITING
Locked pages = []
Locked pages log: name=lock-cleanup-0 time=(1674196038673, 2023-01-20 06:27:18.673)


Thread=[name=lock-cleanup-scheduled-0, id=171], state=WAITING
Locked pages = []
Locked pages log: name=lock-cleanup-scheduled-0 time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=main, id=1], state=WAITING
Locked pages = []
Locked pages log: name=main time=(1674196038673, 2023-01-20 06:27:18.673)


Thread=[name=query-#5729%pv-ib-valuation%, id=6455], state=WAITING
Locked pages = []
Locked pages log: name=query-#5729%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=query-#5730%pv-ib-valuation%, id=6456], state=WAITING
Locked pages = []
Locked pages log: name=query-#5730%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=query-#5735%pv-ib-valuation%, id=6461], state=WAITING
Locked pages = []
Locked pages log: name=query-#5735%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=query-#5736%pv-ib-valuation%, id=6462], state=WAITING
Locked pages = []
Locked pages log: name=query-#5736%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-0-#1%pv-ib-valuation%, id=36], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-0-#1%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-1-#2%pv-ib-valuation%, id=37], state=RUNNABLE
Locked pages = []
Locked pages log: name=sys-stripe-1-#2%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-10-#11%pv-ib-valuation%, id=46], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-10-#11%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-11-#12%pv-ib-valuation%, id=47], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-11-#12%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-12-#13%pv-ib-valuation%, id=48], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-12-#13%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-13-#14%pv-ib-valuation%, id=49], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-13-#14%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-2-#3%pv-ib-valuation%, id=38], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-2-#3%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-3-#4%pv-ib-valuation%, id=39], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-3-#4%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-4-#5%pv-ib-valuation%, id=40], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-4-#5%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-5-#6%pv-ib-valuation%, id=41], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-5-#6%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-6-#7%pv-ib-valuation%, id=42], state=RUNNABLE
Locked pages = []
Locked pages log: name=sys-stripe-6-#7%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-7-#8%pv-ib-valuation%, id=43], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-7-#8%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-8-#9%pv-ib-valuation%, id=44], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-8-#9%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=sys-stripe-9-#10%pv-ib-valuation%, id=45], state=WAITING
Locked pages = []
Locked pages log: name=sys-stripe-9-#10%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)


Thread=[name=ttl-cleanup-worker-#62%pv-ib-valuation%, id=127], state=TIMED_WAITING
Locked pages = []
Locked pages log: name=ttl-cleanup-worker-#62%pv-ib-valuation% time=(1674196038674, 2023-01-20 06:27:18.674)

Best answer

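TxTimeoutOnPartitionMapExchange is designed to roll back active transactions in order to unblock the PME process. It will not magically unblock every PME that may be stuck for some other reason.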

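That said, this setting is worth configuring in any case. To enable it, adjust the configuration of your server nodes and set this property to some value, for example 30 seconds. The answer links to an example of the XML change.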
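
The same property can also be set programmatically instead of through XML. The following is a minimal sketch, assuming a server node started from Java code; the class name `ServerNodeStartup` is hypothetical, and the 30-second value is simply the example figure mentioned above, not a tuned recommendation.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;

// Hypothetical startup class for a server node (name chosen for illustration).
public class ServerNodeStartup {
    public static void main(String[] args) {
        // Transactions still running when a partition map exchange starts
        // are rolled back once this timeout (in milliseconds) expires.
        TransactionConfiguration txCfg = new TransactionConfiguration();
        txCfg.setTxTimeoutOnPartitionMapExchange(30_000L); // 30 seconds

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setTransactionConfiguration(txCfg);

        // Start the node with the adjusted transaction settings.
        Ignite ignite = Ignition.start(cfg);
        System.out.println("Started node: " + ignite.cluster().localNode().id());
    }
}
```

The setting only affects transactions, so it would not by itself release a thread that is blocked in the communication layer, as in the `reserveClient` stack trace from the question.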

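As for the original problem of CQ clients disconnecting, I would expect Ignite to handle that automatically without any trouble. In other words, I think the hanging PME is not caused by the continuous queries themselves but by something else, for example, yes, an active TX with no timeout.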

You do not need to change the cache's atomicity mode. Transactions cannot be applied to non-transactional (ATOMIC) caches.
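
For reference, the atomicity mode is chosen per cache in its `CacheConfiguration`. The sketch below only illustrates the two modes discussed here; the cache names `valuations` and `orders` and the key/value types are hypothetical, not taken from the question.

```java
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;

// Hypothetical example class; cache names and types are illustrative only.
public class CacheModes {
    public static void main(String[] args) {
        // ATOMIC is the default mode; updates to such a cache are not
        // transactional, so transaction timeouts do not apply to them.
        CacheConfiguration<Long, String> atomicCfg = new CacheConfiguration<>("valuations");
        atomicCfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);

        // TRANSACTIONAL caches can take part in Ignite transactions, which
        // is where TxTimeoutOnPartitionMapExchange comes into play.
        CacheConfiguration<Long, String> txCacheCfg = new CacheConfiguration<>("orders");
        txCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);

        System.out.println(atomicCfg.getName() + ": " + atomicCfg.getAtomicityMode());
        System.out.println(txCacheCfg.getName() + ": " + txCacheCfg.getAtomicityMode());
    }
}
```

The `GridDhtAtomicCache` frames in the first stack trace suggest the cache in question is already ATOMIC, which is consistent with leaving the mode unchanged.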

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/75187715/
