postgresql - Joining a large table and a small table in postgres using indexes

Tags: postgresql join indexing

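We have a database with 96 million mortgage loans. For each loan we store the original house price at loan origination. We want to update these house prices using a very simple house price index, which we pulled from the internet as a csv file and imported into a table in the same database as the mortgage loans. I can already join the tables, but it is very slow. I think I am not using the indexes correctly. This is what the tables look like. Mortgage loans: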

CREATE TABLE mydb.mortgageloans
(
  pkrmbloan bigint NOT NULL,
  fkdeal bigint NOT NULL,
  edcode character varying(50) NOT NULL,
  poolcutoffdate character varying(50) NOT NULL,
  recno integer NOT NULL,
  submissiontimestamp timestamp without time zone NOT NULL,
  col1 character varying(10),
  col2 character varying(100),
  country character varying(10),
  col......
  col199 character varying(25),
  CONSTRAINT rmb_loan_pkey PRIMARY KEY (pkrmbloan),
  CONSTRAINT fk_rmbloan2deal FOREIGN KEY (fkdeal)
      REFERENCES mydb_data.deal (pkdeal) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
  OIDS=FALSE
);
ALTER TABLE mydb.mortgageloans
  OWNER TO mydb_admin;
GRANT ALL ON TABLE mydb.mortgageloans TO mydb_admin;
GRANT SELECT ON TABLE mydb.mortgageloans TO mydb_addon;

CREATE INDEX idx_rmbloan_edcode_poolcod
  ON mydb.mortgageloans
  USING btree
  (edcode COLLATE pg_catalog."default", poolcutoffdate COLLATE pg_catalog."default");

CREATE INDEX idx_rmbloan_fkdeal
  ON mydb.mortgageloans
  USING btree
  (fkdeal);

CREATE INDEX idx_rmbloan_recno
  ON mydb.mortgageloans
  USING btree
  (recno);

The house price index table, which I imported myself:

CREATE TABLE mydb.hpi
(
  period character varying(100),
  au character varying(100),
  be character varying(100),
  ca character varying(100),
  ch character varying(100),
  de character varying(100),
  dk character varying(100),
  es character varying(100),
  fi character varying(100),
  fr character varying(100),
  uk character varying(100),
  ie character varying(100),
  it character varying(100),
  jp character varying(100),
  nl character varying(100),
  no character varying(100),
  nz character varying(100),
  us character varying(100),
  pt character varying(100)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE mydb.hpi
  OWNER TO mydb_admin;

And the query that adds the original house price index based on the loan origination date (col55):

ALTER TABLE mydb.mortgageloans ADD COLUMN OriginalHPI varchar(130);
UPDATE mydb.mortgageloans set OriginalHPI = test.rv
FROM
(
select 
CASE
            WHEN a.country = 'NL'::text THEN c.nl::numeric
            WHEN a.country = 'BE'::text THEN c.be::numeric
            WHEN a.country = 'ES'::text THEN c.es::numeric
            WHEN a.country = 'FR'::text THEN c.fr::numeric
            WHEN a.country = 'IT'::text THEN c.IT::numeric
            WHEN a.country = 'DE'::text THEN c.de::numeric
            WHEN a.country = 'IE'::text THEN c.ie::numeric
            else NULL::numeric
            END AS rv,
 a.pkrmbloan
FROM mydb.mortgageloans a
LEFT JOIN mydb.hpi c on a.col55 = c.period
)
 as test
 where test.pkrmbloan = mydb.mortgageloans.pkrmbloan

Any help would be greatly appreciated!

Best regards,

Tim

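Edit: added EXPLAIN output

  • The names below differ slightly from those above, because I initially wanted to anonymize them

The actual query: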

EXPLAIN
UPDATE edp_data.rmb_loan set OriginalHPI = test.rv
FROM
(
select 
CASE
            WHEN "substring"(a.edcode::text, 5, 2)::text = 'NL'::text THEN c.nl::numeric
            WHEN "substring"(a.edcode::text, 5, 2)::text = 'BE'::text THEN c.be::numeric
            WHEN "substring"(a.edcode::text, 5, 2)::text = 'ES'::text THEN c.es::numeric
            WHEN "substring"(a.edcode::text, 5, 2)::text = 'FR'::text THEN c.fr::numeric
            WHEN "substring"(a.edcode::text, 5, 2)::text = 'IT'::text THEN c.IT::numeric
            WHEN "substring"(a.edcode::text, 5, 2)::text = 'DE'::text THEN c.de::numeric
            WHEN "substring"(a.edcode::text, 5, 2)::text = 'IE'::text THEN c.ie::numeric
            else 12::numeric
            END AS rv,
 a.pkrmbloan, a.fkdeal
FROM edp_data.rmb_loan a
LEFT JOIN edp_data.hpi c on a.ar55 = c.period
)
 as test
 where test.pkrmbloan = edp_data.rmb_loan.pkrmbloan and test.fkdeal = edp_data.rmb_loan.fkdeal;

Output:

"Update on rmb_loan  (cost=22.11..60667621.09 rows=342266 width=4090)"
"  ->  Hash Left Join  (cost=22.11..60667621.09 rows=342266 width=4090)"
"        Hash Cond: ((a.ar55)::text = (c.period)::text)"
"        ->  Merge Join  (cost=0.00..60635941.00 rows=341941 width=4049)"
"              Merge Cond: (rmb_loan.pkrmbloan = a.pkrmbloan)"
"              Join Filter: (rmb_loan.fkdeal = a.fkdeal)"
"              ->  Index Scan using rmb_loan_pkey on rmb_loan  (cost=0.00..28746023.33 rows=179651105 width=4014)"
"              ->  Index Scan using rmb_loan_pkey on rmb_loan a  (cost=0.00..28746023.33 rows=179651105 width=51)"
"        ->  Hash  (cost=15.38..15.38 rows=538 width=56)"
"              ->  Seq Scan on hpi c  (cost=0.00..15.38 rows=538 width=56)"

Best answer

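I think the reason you complicated the FROM clause is that you want the column to default to 12. To avoid that, just declare the default value when adding the column; the update then only needs a plain join to the index table: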

alter table mydb.mortgageloans
add column OriginalHPI varchar(130) default '12';

update edp_data.rmb_loan a
set OriginalHPI = (
    case substring(a.edcode::text, 5, 2)
        when 'NL' then c.nl
        when 'BE' then c.be
        when 'ES' then c.es
        when 'FR' then c.fr
        when 'IT' then c.IT
        when 'DE' then c.de
        when 'IE' then c.ie
        else '12' -- quoted: a CASE cannot mix varchar and integer branches
    end)::numeric
from edp_data.hpi c
where a.ar55 = c.period;
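
To check whether the simplified statement really drops the join of rmb_loan against itself that appears in the plan from the question (the Merge Join over two Index Scans on rmb_loan_pkey), one option is to run it through plain EXPLAIN first. This is only a sketch: it reuses the table and column names from the question's actual query, assumes the OriginalHPI column has already been added, and makes no claim about the plan your server will choose.

```sql
-- Plain EXPLAIN only plans the statement; it does not modify any rows.
-- Assumption: edp_data.rmb_loan already has the originalhpi column.
EXPLAIN
UPDATE edp_data.rmb_loan a
SET originalhpi = (
    CASE substring(a.edcode::text, 5, 2)
        WHEN 'NL' THEN c.nl
        WHEN 'BE' THEN c.be
        WHEN 'ES' THEN c.es
        WHEN 'FR' THEN c.fr
        WHEN 'IT' THEN c.it
        WHEN 'DE' THEN c.de
        WHEN 'IE' THEN c.ie
        ELSE '12'  -- string literal so all branches share one type
    END)::numeric
FROM edp_data.hpi c
WHERE a.ar55 = c.period;
```

What to look for is rmb_loan appearing once in the plan, joined to the small hpi table, instead of twice. Note that EXPLAIN ANALYZE, unlike plain EXPLAIN, actually executes the update.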

Why cast the CASE result to numeric just to store it in a varchar column?
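
One way to act on that remark, if the column has not been added yet, is to declare it as numeric so the value is stored in the type it is computed in. The following is a sketch under assumptions, not the asker's actual schema: it uses the names from the question's actual query, keeps 12 as the default taken from the question's ELSE branch, and assumes every index value in edp_data.hpi is text that parses as a number.

```sql
-- Assumption: originalhpi does not exist yet; ADD COLUMN fails if it does.
ALTER TABLE edp_data.rmb_loan
    ADD COLUMN originalhpi numeric DEFAULT 12;

-- Rows with no matching period in hpi are not touched by the join
-- and therefore keep the default of 12.
UPDATE edp_data.rmb_loan a
SET originalhpi = (
    CASE substring(a.edcode::text, 5, 2)
        WHEN 'NL' THEN c.nl
        WHEN 'BE' THEN c.be
        WHEN 'ES' THEN c.es
        WHEN 'FR' THEN c.fr
        WHEN 'IT' THEN c.it
        WHEN 'DE' THEN c.de
        WHEN 'IE' THEN c.ie
        ELSE '12'  -- string literal so all branches share one type
    END)::numeric  -- cast fails if an hpi value is not numeric text
FROM edp_data.hpi c
WHERE a.ar55 = c.period;
```

With a numeric column the final cast does useful work, since the hpi columns are all character varying(100) in the table definition above.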

Source: the original question on Stack Overflow, "postgresql - Joining a large table and a small table in postgres using indexes": https://stackoverflow.com/questions/25929145/
