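My problem is a very slow UPDATE query on a table with 14 million rows. I have tried various ways of tuning my server, and they improved performance in general, but not for this update query.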
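I have two tables:
- T1 has 4 columns and 3 indexes (530 rows)
- T2 has 15 columns and 3 indexes (14 million rows)
- I want to update the integer field vid in T2 with the corresponding vid value from T1, joining the two tables by matching the text field stxt2 (in T2) against stxt1 (in T1).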
Here is my query and its output:
explain analyse
update T2
set vid=T1.vid
from T1
where stxt2 ~ stxt1 and T2.vid = 0;
Update on T2  (cost=0.00..9037530.59 rows=2814247 width=131) (actual time=25141785.741..25141785.741 rows=0 loops=1)
  ->  Nested Loop  (cost=0.00..9037530.59 rows=2814247 width=131) (actual time=32.636..25035782.995 rows=679354 loops=1)
        Join Filter: ((T2.stxt2)::text ~ (T1.stxt1)::text)
        ->  Seq Scan on T2  (cost=0.00..594772.96 rows=1061980 width=121) (actual time=0.067..5402.614 rows=1037809 loops=1)
              Filter: (vid= 1)
        ->  Materialize  (cost=0.00..17.95 rows=530 width=34) (actual time=0.000..0.069 rows=530 loops=1037809)
              ->  Seq Scan on T1  (cost=0.00..15.30 rows=530 width=34) (actual time=0.019..0.397 rows=530 loops=1)
Total runtime: 25141785.904 ms
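As you can see, the query took about 25141 seconds (~7 hours). If I understand correctly, the planner estimated the execution time at 9037 seconds (~2.5 hours). Am I missing something here?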
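A rough breakdown using only the figures reported in the plan above. The Nested Loop tests the regex join filter once for every pair of a T2 row (1,037,809 rows out of the Seq Scan on T2) and a T1 row (530 rows out of Materialize, which runs with loops=1037809). Almost all of the runtime is spent there: the Seq Scan on T2 itself finishes in about 5.4 s. Note also that planner costs are in arbitrary units (conventionally scaled so that one sequential page fetch, seq_page_cost, costs 1.0), not milliseconds, so cost=9037530.59 cannot be read directly as a time estimate.

```latex
\begin{aligned}
\text{join-filter evaluations} &= 1\,037\,809 \times 530 = 550\,038\,770 \\
\text{time per evaluation} &\approx \frac{25\,035\,783\ \text{ms}}{550\,038\,770} \approx 0.0455\ \text{ms} \approx 45.5\ \mu\text{s} \\
\text{Update node above the join} &\approx 25\,141\,786\ \text{ms} - 25\,035\,783\ \text{ms} \approx 106\,003\ \text{ms} \approx 106\ \text{s}
\end{aligned}
```

On these numbers, writing the 679,354 updated rows accounts for under two minutes; the remaining ~7 hours come from the sheer number of regex comparisons.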
Here is some information about my server configuration:
- CentOS 5.8, 20GB RAM
- shared_buffers = 12GB
- work_mem = 64MB
- maintenance_work_mem = 64MB
- bgwriter_lru_maxpages = 500
- checkpoint_segments = 64
- checkpoint_completion_target = 0.9
- effective_cache_size = 10GB
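I have run VACUUM FULL and ANALYZE on table T2 several times, but this still hasn't improved the situation much.
PS: If I set full_page_writes to off, the update query becomes much faster, but I don't want to risk data loss. Do you have any suggestions?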
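To confirm which values the running server actually uses for the settings listed above (for example after editing postgresql.conf, since some settings such as shared_buffers only take effect after a restart), the standard pg_settings system view can be queried. This is a minimal sketch; on PostgreSQL 9.5 and later checkpoint_segments no longer exists and simply drops out of the result.

```sql
-- Show the effective values of the settings discussed in the question.
-- 'setting' is expressed in the unit given by 'unit'
-- (e.g. shared_buffers is reported in 8kB pages, work_mem in kB).
SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                'bgwriter_lru_maxpages', 'checkpoint_segments',
                'checkpoint_completion_target', 'effective_cache_size',
                'full_page_writes')
ORDER  BY name;
```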
Best answer
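This is not a solution, but a data-modelling workaround:
- Break the URLs up into {protocol, hostname, pathname} components.
- Now you can join on the hostname part with an exact match, avoiding the leading wildcard (%) in the pattern match.
- The view is there to show that full_url can be reconstructed whenever it is needed.
The update may take a few minutes.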
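The script below splits each URL with split_part() and strpos(). As a quick check of those expressions, here they are applied to a single literal (row 5 of the example data); the values in the comments match row 5 of the output. One caveat: if a URL has no '/' after the host, strpos() returns 0, so substr(h, 1) returns the whole string and pathname would repeat the hostname instead of being empty.

```sql
-- Apply the script's parsing expressions to one literal URL.
SELECT split_part(u, '://', 1)             AS proto     -- 'http'
     , split_part(rest, '/', 1)            AS hostname  -- 'www.myhost.com'
     , substr(rest, 1 + strpos(rest, '/')) AS pathname  -- 'subdir/index.php'
FROM (
    -- rest = 'www.myhost.com/subdir/index.php'
    SELECT u, split_part(u, '://', 2) AS rest
    FROM (SELECT 'http://www.myhost.com/subdir/index.php'::text AS u) AS lit
) AS s;

-- Edge case: no path after the host. strpos() returns 0, so substr() starts
-- at position 1 and returns the whole string rather than ''.
SELECT substr('www.myhost.com', 1 + strpos('www.myhost.com', '/'));  -- 'www.myhost.com'
```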
-- assumes a scratch schema named tmp already exists
SET search_path='tmp';
DROP TABLE IF EXISTS urls CASCADE;
CREATE TABLE urls
( id SERIAL NOT NULL PRIMARY KEY
, full_url varchar
, proto varchar
, hostname varchar
, pathname varchar
);
INSERT INTO urls(full_url) VALUES
( 'ftp://www.myhost.com/secret.tgz' )
,( 'http://www.myhost.com/robots.txt' )
,( 'http://www.myhost.com/index.php' )
,( 'https://www.myhost.com/index.php' )
,( 'http://www.myhost.com/subdir/index.php' )
,( 'https://www.myhost.com/subdir/index.php' )
,( 'http://www.hishost.com/index.php' )
,( 'https://www.hishost.com/index.php' )
,( 'http://www.herhost.com/index.php' )
,( 'https://www.herhost.com/index.php' )
;
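-- step 1: protocol goes before '://'; everything after it goes into hostname for now
UPDATE urls
SET proto = split_part(full_url, '://' , 1)
  , hostname = split_part(full_url, '://' , 2)
;
-- step 2: split 'host/path' at the first '/'.
-- All SET expressions read the old row, so pathname still sees 'host/path'.
-- If there is no '/', strpos() returns 0: store an empty pathname instead.
UPDATE urls
SET pathname = CASE WHEN strpos(hostname, '/') > 0
                    THEN substr(hostname, 1+strpos(hostname, '/'))
                    ELSE '' END
  , hostname = split_part(hostname, '/' , 1)
;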
-- the full_url field is now redundant: we can drop it
ALTER TABLE urls
DROP column full_url
;
-- and we could always reconstruct the full_url from its components.
CREATE VIEW vurls AS (
SELECT id
, proto || '://' || hostname || '/' || pathname AS full_url
, proto
, hostname
, pathname
FROM urls
);
SELECT * FROM urls;
SELECT * FROM vurls;
Output:
INSERT 0 10
UPDATE 10
UPDATE 10
ALTER TABLE
CREATE VIEW
id | proto | hostname | pathname
----+-------+-----------------+------------------
1 | ftp | www.myhost.com | secret.tgz
2 | http | www.myhost.com | robots.txt
3 | http | www.myhost.com | index.php
4 | https | www.myhost.com | index.php
5 | http | www.myhost.com | subdir/index.php
6 | https | www.myhost.com | subdir/index.php
7 | http | www.hishost.com | index.php
8 | https | www.hishost.com | index.php
9 | http | www.herhost.com | index.php
10 | https | www.herhost.com | index.php
(10 rows)
id | full_url | proto | hostname | pathname
----+-----------------------------------------+-------+-----------------+------------------
1 | ftp://www.myhost.com/secret.tgz | ftp | www.myhost.com | secret.tgz
2 | http://www.myhost.com/robots.txt | http | www.myhost.com | robots.txt
3 | http://www.myhost.com/index.php | http | www.myhost.com | index.php
4 | https://www.myhost.com/index.php | https | www.myhost.com | index.php
5 | http://www.myhost.com/subdir/index.php | http | www.myhost.com | subdir/index.php
6 | https://www.myhost.com/subdir/index.php | https | www.myhost.com | subdir/index.php
7 | http://www.hishost.com/index.php | http | www.hishost.com | index.php
8 | https://www.hishost.com/index.php | https | www.hishost.com | index.php
9 | http://www.herhost.com/index.php | http | www.herhost.com | index.php
10 | https://www.herhost.com/index.php | https | www.herhost.com | index.php
(10 rows)
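Applied to the question's tables, the idea is to replace the per-pair regex test with an equality join, which the planner can run as a hash join or via an index instead of testing all 530 patterns against every T2 row. This is a sketch under assumptions: it uses hypothetical temporary stand-ins t1_demo and t2_demo (for T1 and T2) with hypothetical columns hostname1 and hostname2, and made-up sample rows reusing the hosts from the example above. It is only equivalent to the regex update if each stxt1 value really is meant to match on the host name, which depends on what the real patterns contain.

```sql
-- Temporary stand-ins for T1 and T2; nothing outside this session is touched.
-- Sample rows are purely illustrative.
CREATE TEMP TABLE t1_demo (vid integer, stxt1 varchar, hostname1 varchar);
CREATE TEMP TABLE t2_demo (vid integer, stxt2 varchar, hostname2 varchar);

INSERT INTO t1_demo (vid, stxt1) VALUES
  (1, 'http://www.myhost.com/')
, (2, 'http://www.hishost.com/');

INSERT INTO t2_demo (vid, stxt2) VALUES
  (0, 'https://www.myhost.com/index.php')
, (0, 'http://www.hishost.com/index.php')
, (0, 'http://www.herhost.com/index.php');

-- Decompose once, the same way the script derives urls.hostname.
UPDATE t1_demo SET hostname1 = split_part(split_part(stxt1, '://', 2), '/', 1);
UPDATE t2_demo SET hostname2 = split_part(split_part(stxt2, '://', 2), '/', 1);

-- An index on the join key, plus fresh statistics after the bulk updates.
CREATE INDEX ON t2_demo (hostname2);
ANALYZE t1_demo;
ANALYZE t2_demo;

-- Exact-match join replaces the per-pair regex test (stxt2 ~ stxt1).
UPDATE t2_demo
SET    vid = t1_demo.vid
FROM   t1_demo
WHERE  t2_demo.hostname2 = t1_demo.hostname1
AND    t2_demo.vid = 0;

SELECT vid, hostname2 FROM t2_demo ORDER BY hostname2;
```

As with any UPDATE ... FROM (including the original regex version), if several T1 rows match the same T2 row, only one of them is used and which one is not predictable.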
Regarding "regex - Update query too slow on Postgres 9.1", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11381908/