考虑 PostgreSQL 中的以下场景(10+ 的任何版本):
CREATE TABLE users(
id serial primary key,
name text not null unique,
last_seen timestamp
);
INSERT INTO users(name, last_seen)
VALUES ('Alice', '2019-05-01'),
('Bob', '2019-04-29'),
('Dorian', '2019-05-11');
CREATE TABLE inactive_users(
user_id int primary key references users(id),
last_seen timestamp not null);
INSERT INTO inactive_users(user_id, last_seen)
SELECT id as user_id, last_seen FROM users
WHERE users.last_seen < '2019-05-04'
ON CONFLICT (user_id) DO UPDATE SET last_seen = excluded.last_seen;
现在假设我想不时地多次插入相同的值(执行最后一条语句)。实际上,从数据库的角度来看,在冲突值的情况下,90% 的时间 last_seen
列将更新为它已有的相同值。行的值保持不变,所以没有理由进行 I/O 写入,对吧?但是真的是这样吗,还是postgres会在实际值没有变化的情况下进行相应的更新呢?
在我的例子中,目标表有数千万行,但只有几百/几千行会在每次插入调用时真正发生变化。
最佳答案
对一行的任何 UPDATE
实际上都会创建一个新行(将旧行标记为已删除/脏行),而不管之前/之后的值如何:
[root@497ba0eaf137 /]# psql
psql (12.1)
Type "help" for help.
postgres=# create table foo (id int, name text);
CREATE TABLE
postgres=# insert into foo values (1,'a');
INSERT 0 1
postgres=# select ctid,* from foo;
ctid | id | name
-------+----+------
(0,1) | 1 | a
(1 row)
postgres=# update foo set name = 'a' where id = 1;
UPDATE 1
postgres=# select ctid,* from foo;
ctid | id | name
-------+----+------
(0,2) | 1 | a
(1 row)
postgres=# update foo set id = 1 where id = 1;
UPDATE 1
postgres=# select ctid,* from foo;
ctid | id | name
-------+----+------
(0,3) | 1 | a
(1 row)
postgres=# select * from pg_stat_user_tables where relname = 'foo';
-[ RECORD 1 ]-------+-------
relid | 16384
schemaname | public
relname | foo
seq_scan | 5
seq_tup_read | 5
idx_scan |
idx_tup_fetch |
n_tup_ins | 1
n_tup_upd | 2
n_tup_del | 0
n_tup_hot_upd | 2
n_live_tup | 1
n_dead_tup | 2
<...>
根据你的例子:
postgres=# select ctid,* FROM inactive_users ;
ctid | user_id | last_seen
-------+---------+---------------------
(0,1) | 1 | 2019-05-01 00:00:00
(0,2) | 2 | 2019-04-29 00:00:00
(2 rows)
postgres=# INSERT INTO inactive_users(user_id, last_seen)
postgres-# SELECT id as user_id, last_seen FROM users
postgres-# WHERE users.last_seen < '2019-05-04'
postgres-# ON CONFLICT (user_id) DO UPDATE SET last_seen = excluded.last_seen;
INSERT 0 2
postgres=# select ctid,* FROM inactive_users ;
ctid | user_id | last_seen
-------+---------+---------------------
(0,3) | 1 | 2019-05-01 00:00:00
(0,4) | 2 | 2019-04-29 00:00:00
(2 rows)
Postgres 不会对列值进行任何数据验证——如果您希望防止不必要的写入事件,您将需要像外科手术一样设计您的 WHERE
子句。
披露:我为 EnterpriseDB (EDB) 工作
关于postgresql - Postgres : does updating column value to the same value marks page as dirty?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59061430/