We have a long-standing deadlock problem in our database. The conflicting queries are almost always these two:
UPDATE db.tbl
SET pid = 68111
WHERE pid IS NULL
AND processed IS NULL
AND `time` <= now()
LIMIT 10
and
UPDATE `db`.`tbl`
SET `processed` = -1,
`updated_on` = NOW()
WHERE (`tbl`.`id` = 108588129)
The pid, id and processed values vary, but the queries are otherwise the same.
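I don't understand what could cause the deadlock, because the second query updates one specific record by ID. It should need only a single lock, yet a deadlock requires at least two. There are also no transactions that lock other rows: every query runs on its own.
Why does this happen, and how can it be avoided?
Anonymized and shortened output of SHOW ENGINE INNODB STATUS: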
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-06-08 08:16:27 0x7f70f3ef1700
*** (1) TRANSACTION:
TRANSACTION 2857804352, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 2480 lock struct(s), heap size 286928, 9581 row lock(s)
MySQL thread id 25966272, OS thread handle 140144661681920, query id 847014117 x.x.x.x db_user updating
UPDATE db.tbl SET pid = ''68111'' WHERE pid IS NULL AND processed IS NULL AND `time` <= now() LIMIT 10
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 2714 page no 86552 n bits 1552 index processed of table `db`.`tbl` trx id 2857804352 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: SQL NULL;
1: len 4; hex 8678cf9b; asc x ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: SQL NULL;
1: len 4; hex 8678cfad; asc x ;;
<Snip a lot of Record Locks just like the one above>
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2714 page no 113955 n bits 160 index PRIMARY of table `db`.`tbl` trx id 2857804352 lock_mode X locks rec but not gap waiting
Record lock, heap no 92 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
0: len 4; hex 8678ec61; asc x a;;
1: len 6; hex 0000aa56a25d; asc V ];;
2: len 7; hex 0100002bc01686; asc + ;;
3: len 4; hex 803ff583; asc ? ;;
4: len 4; hex 73746f70; asc stop;;
5: len 4; hex 80000002; asc ;;
6: len 1; hex 83; asc ;;
7: len 4; hex 80005737; asc W7;;
8: len 6; hex 4b4a38363431; asc KJ8641;;
9: len 8; hex 80000008a515e59b; asc ;;
10: len 4; hex 800001be; asc ;;
11: len 5; hex 99b050b41a; asc P ;;
12: len 4; hex 84ad8892; asc ;;
13: len 3; hex 736d73; asc sms;;
14: SQL NULL;
15: SQL NULL;
16: len 1; hex 7f; asc ;;
17: SQL NULL;
18: SQL NULL;
19: SQL NULL;
20: len 4; hex 53746f70; asc Stop;;
21: len 5; hex 99b050b41b; asc P ;;
22: len 5; hex 99b050b41b; asc P ;;
*** (2) TRANSACTION:
TRANSACTION 2857804381, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 25966282, OS thread handle 140122483259136, query id 847014368 x.x.x.x other_db_user updating
UPDATE `db`.`tbl` SET `processed` = ''-1'', `updated_on` = NOW() WHERE (`tbl`.`id` = 108588129)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2714 page no 113955 n bits 160 index PRIMARY of table `db`.`tbl` trx id 2857804381 lock_mode X locks rec but not gap
Record lock, heap no 92 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
0: len 4; hex 8678ec61; asc x a;;
1: len 6; hex 0000aa56a25d; asc V ];;
2: len 7; hex 0100002bc01686; asc + ;;
3: len 4; hex 803ff583; asc ? ;;
4: len 4; hex 73746f70; asc stop;;
5: len 4; hex 80000002; asc ;;
6: len 1; hex 83; asc ;;
7: len 4; hex 80005737; asc W7;;
8: len 6; hex 4b4a38363431; asc KJ8641;;
9: len 8; hex 80000008a515e59b; asc ;;
10: len 4; hex 800001be; asc ;;
11: len 5; hex 99b050b41a; asc P ;;
12: len 4; hex 84ad8892; asc ;;
13: len 3; hex 736d73; asc sms;;
14: SQL NULL;
15: SQL NULL;
16: len 1; hex 7f; asc ;;
17: SQL NULL;
18: SQL NULL;
19: SQL NULL;
20: len 4; hex 53746f70; asc Stop;;
21: len 5; hex 99b050b41b; asc P ;;
22: len 5; hex 99b050b41b; asc P ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2714 page no 86552 n bits 1552 index processed of table `db`.`tbl` trx id 2857804381 lock_mode X locks rec but not gap waiting
Record lock, heap no 1462 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: SQL NULL;
1: len 4; hex 8678ec61; asc x a;;
*** WE ROLL BACK TRANSACTION (2)
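The hex values in the log can be decoded to confirm which row both transactions are fighting over. The sketch below assumes the usual InnoDB storage format for a signed INT (big-endian with the sign bit flipped), so subtracting 2^31 from the unsigned value recovers the id. The value 8678ec61 is field 0 of the PRIMARY record and field 1 of the `processed` index record in the output above.

```sql
-- Decode the 4-byte key 8678ec61 from the deadlock log.
-- Assumption: InnoDB stores a signed INT with its sign bit inverted,
-- so the stored value is (id + 2^31).
SELECT CAST(CONV('8678ec61', 16, 10) AS UNSIGNED) - 2147483648 AS id;
-- id: 108588129
```

That is the same id as in the second query's WHERE clause, so transaction (1) waits for the clustered index record of that row while transaction (2) waits for the row's entry in the `processed` index.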
Here is also a simplified table definition:
CREATE TABLE `tbl` (
`id` int NOT NULL AUTO_INCREMENT,
`pid` int DEFAULT NULL,
`processed` tinyint DEFAULT NULL,
`time` datetime DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`other` varchar(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `ix_other` (`other`),
KEY `ix_processed` (`processed`),
KEY `ix_time` (`time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
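I should also note that there are many rows (currently over 5000, but the number changes throughout the day) with processed=null and pid=null but with time in the future.
Added: more background, based on the comments:
The table currently holds slightly more than 5 million rows. It is a kind of queue, and we get about 1.4 million rows per month. The oldest rows are from February. When time is in the future, it is almost always later the same day.
pid and processed are queue-related. When a row (a job) is inserted, pid and processed are both null. time is the time at which the job should be executed (some jobs only need to run after a specified point in time; others need to run as soon as possible).
When one of the processors is ready to run some jobs, it first runs the first query and "reserves" a batch of 10 jobs for itself by setting its PID (Linux process ID) on those rows. It then selects those rows and processes them one by one, setting the processed flag when each is done (the second query).
Sometimes, however, these two queries collide and we end up in a deadlock.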
Best answer
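My understanding of locking and deadlocks is very limited, so this is an oversimplification. In particular, I am not sure exactly which lock the main UPDATE query requests when (the initial secondary index scan, then PRIMARY, then the other secondary indexes affected by the update?):
1. Session 1 sends the initial UPDATE query, picks the ix_processed index as the most selective one, and starts locking index records until it has collected enough rows to satisfy the query.
2. Session 2 sends the UPDATE on PK query, which immediately acquires an X lock on the row in the clustered index.
3. Session 2 tries to apply the update to the processed column, which needs an X lock on ix_processed, but that record is already locked by session 1, so it waits.
4. Session 1 tries to acquire X locks on the clustered index for the rows collected in step 1, but the row being updated is already locked by session 2 (step 2), so it waits. 💣 Deadlock 💣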
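Your current UPDATE needs far more locks than you might think.
With the default REPEATABLE READ isolation level (select @@session.transaction_isolation, or @@session.tx_isolation on versions before MySQL 8.0), exclusive locks are placed on every row the UPDATE examines, not only on the rows it actually updates.
You can test this with: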
START TRANSACTION;
UPDATE tbl
SET pid = 68111
WHERE pid IS NULL
AND processed IS NULL
AND `time` <= now()
LIMIT 10;
SELECT dl.object_name, dl.index_name, dl.lock_type, dl.lock_mode, dl.lock_status, dl.lock_data, tbl.*
FROM performance_schema.data_locks dl
JOIN performance_schema.threads t ON dl.THREAD_ID = t.THREAD_ID
JOIN tbl ON tbl.id = dl.lock_data
WHERE dl.object_name = 'tbl'
AND dl.index_name = 'PRIMARY'
AND t.PROCESSLIST_ID = CONNECTION_ID();
ROLLBACK;
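To get a per-index summary instead of one line per locked row, the same test can be aggregated. This is a sketch under the same assumptions as the query above: MySQL 8.0 (where performance_schema.data_locks exists) and a session that is allowed to read it.

```sql
-- Count the locks held by the current session, grouped by index.
START TRANSACTION;

UPDATE tbl
SET pid = 68111
WHERE pid IS NULL
  AND processed IS NULL
  AND `time` <= NOW()
LIMIT 10;

-- Table-level intention locks appear with index_name = NULL.
SELECT dl.index_name, dl.lock_type, dl.lock_mode, COUNT(*) AS locks
FROM performance_schema.data_locks dl
JOIN performance_schema.threads t ON dl.thread_id = t.thread_id
WHERE dl.object_name = 'tbl'
  AND t.processlist_id = CONNECTION_ID()
GROUP BY dl.index_name, dl.lock_type, dl.lock_mode
ORDER BY locks DESC;

-- Undo the test update so no jobs are actually claimed.
ROLLBACK;
```

Running it before and after any schema change gives a direct comparison of how many records each index has locked.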
Adding a composite index on (processed, pid, time) reduces the number of locks the UPDATE statement needs to just 10.
ALTER TABLE tbl ADD INDEX ix_processed_pid_time (processed, pid, time);
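If you don't want to add an index, you could instead have the UPDATE set the processed column to a different value (perhaps -2), as shown below, which means the row is immediately excluded from subsequent updates.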
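A minimal sketch of that variant, reusing the question's first query. The value -2 is only the marker suggested here, and the sketch assumes nothing else in the application relies on processed IS NULL for jobs that have already been claimed.

```sql
-- Claim a batch and mark it as "claimed" in the same statement.
UPDATE db.tbl
SET pid = 68111,
    processed = -2  -- hypothetical "claimed" marker, not from the original schema
WHERE pid IS NULL
  AND processed IS NULL
  AND `time` <= NOW()
LIMIT 10;
```

Claimed rows then leave the processed IS NULL range of ix_processed, so later batch scans should no longer examine them, and the completion query moves processed from -2 to -1 on an index record outside that range.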
If you are on MySQL 8, another option might be SELECT ... FOR UPDATE SKIP LOCKED:
START TRANSACTION;
SELECT id FROM tbl
WHERE pid IS NULL AND processed IS NULL AND `time` <= NOW()
LIMIT 10
FOR UPDATE SKIP LOCKED;
UPDATE tbl
SET pid = 68111,
processed = -2
WHERE id IN ( ... );
COMMIT;
This Q&A about understanding a MySQL deadlock comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/76430515/