I have a large table in PostgreSQL (~6M rows, 41 columns) that looks like this:
id | answer1 | answer2 | answer3 | ... | answer40
1 | xxx | yyy | null | ... | null
2 | xxx | null | null | ... | null
3 | xxx | null | zzz | ... | aaa
Note that every row has many null columns, and I only want the ones that contain data.
I want to normalize it to get this:
id | answers
1 | xxx
1 | yyy
2 | xxx
3 | xxx
3 | zzz
...
3 | aaa
The question is, which is more efficient/faster: several inserts, or a single insert with multiple unions?
Option 1
create table new_table as
select id, answer1 from my_table where answer1 is not null
union
select id, answer2 from my_table where answer2 is not null
union
select id, answer3 from my_table where answer3 is not null
union ...
Option 2
create table new_table as select id, answer1 from my_table where answer1 is not null;
insert into new_table select id, answer2 from my_table where answer2 is not null;
insert into new_table select id, answer3 from my_table where answer3 is not null;
...
Option 3: Is there a better way?
Accepted answer
Option 2 should be faster.
Wrap all the statements in a begin-commit block to save the time spent on individual commits.
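As a rough sketch of what that wrapping might look like, here is Option 2 inside a single transaction. The table and column names come from the question. The alias answers is taken from the desired output shown there, and the sketch assumes all answerN columns share one data type.

```sql
-- Sketch only: my_table, new_table and the answerN columns come from the question.
-- Assumption: every answerN column has the same type, so the inserts line up.
begin;

-- The first statement creates the table and fixes its column names.
create table new_table as
select id, answer1 as answers from my_table where answer1 is not null;

insert into new_table
select id, answer2 from my_table where answer2 is not null;

insert into new_table
select id, answer3 from my_table where answer3 is not null;

-- ... repeat for the remaining columns through answer40 ...

commit;
```

PostgreSQL runs create table inside a transaction like any other statement, so if one insert fails the whole batch rolls back. Note that these plain inserts keep duplicate (id, answers) pairs, whereas union in Option 1 removes them. The single-statement form that keeps duplicates is union all.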
To speed up the selects, make sure the filtered columns (e.g. answer1 in where answer1 is not null) are indexed.
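One way to index such a filter, as a sketch, is a partial index whose predicate matches the where clause. The index name below is hypothetical. Whether the planner actually uses the index depends on how sparse the column is, so it is worth checking the plan before building one per column.

```sql
-- Sketch only: the index name is made up for illustration.
-- A partial index contains only the rows that satisfy its predicate.
create index my_table_answer2_present_idx
    on my_table (id)
    where answer2 is not null;

-- Inspect the plan to see whether the index is chosen over a sequential scan.
explain (analyze, buffers)
select id, answer2 from my_table where answer2 is not null;
```

Building an index also reads the whole table, so for a one-off migration the index may cost more than it saves unless the non-null rows are a small fraction of the table.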
Source: the Stack Overflow question "performance - What is more efficient: several insert vs single insert with union": https://stackoverflow.com/questions/30081220/