到目前为止,以下查询需要 5 个小时才能运行:
INSERT $LINEITEM_PUBLIC SELECT *
FROM LINEITEM
WHERE L_PARTKEY IN ( SELECT P_PARTKEY FROM $PART_PUBLIC )
AND L_SUPPKEY IN ( SELECT S_SUPPKEY FROM $SUPPLIER_PUBLIC )
AND L_ORDERKEY IN ( SELECT O_ORDERKEY FROM $ORDERS_PUBLIC );
我添加了所有必需的索引,但似乎没有任何帮助。查询解释计划打印以下内容:
+----+-------------+------------------+------------+--------+--------------------------------+-------------+---------+--------------------------------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------+-------------+---------+--------------------------------+----------+----------+-------------+
| 1 | INSERT | $LINEITEM_PUBLIC | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 1 | SIMPLE | $ORDERS_PUBLIC | NULL | index | PRIMARY | O_ORDERDATE | 3 | NULL | 12826617 | 100.00 | Using index |
| 1 | SIMPLE | LINEITEM | NULL | ref | PRIMARY,LINEITEM_FK2,L_SUPPKEY | PRIMARY | 4 | TPCH.$ORDERS_PUBLIC.O_ORDERKEY | 3 | 100.00 | NULL |
| 1 | SIMPLE | $SUPPLIER_PUBLIC | NULL | eq_ref | PRIMARY | PRIMARY | 4 | TPCH.LINEITEM.L_SUPPKEY | 1 | 100.00 | Using index |
| 1 | SIMPLE | $PART_PUBLIC | NULL | eq_ref | PRIMARY | PRIMARY | 4 | TPCH.LINEITEM.L_PARTKEY | 1 | 100.00 | Using index |
+----+-------------+------------------+------------+--------+--------------------------------+-------------+---------+--------------------------------+----------+----------+-------------+
关于如何优化此查询有什么建议吗?
更新: 上一个查询中表的大小如下:
- LINEITEM:6000 万条记录
- $ORDERS_PUBLIC:1300 万条记录
- $SUPPLIER_PUBLIC:92K 条记录
- $PART_PUBLIC:200 万条记录
最佳答案
确保存在以 O_ORDERKEY
开头的索引。
IN (SELECT ...)
可能优化不佳(取决于版本);试试这个:
INSERT $LINEITEM_PUBLIC
SELECT l.*
FROM LINEITEM AS l
WHERE EXISTS( SELECT * FROM $PART_PUBLIC WHERE P_PARTKEY = L_PARTKEY )
AND EXISTS( SELECT * FROM $SUPPLIER_PUBLIC WHERE S_SUPPKEY = L_SUPPKEY )
AND EXISTS( SELECT * FROM $ORDERS_PUBLIC WHERE O_ORDERKEY = L_ORDERKEY );
关于mysql - TPCH 查询优化,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/43690778/