SQL query on 2 tables, null results

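I was asked this question in a job interview: from the 2 tables below, write a query to pull the customers who have no sales orders. How many ways are there to write this query, and which one performs best?

Table 1: Customer - CustomerID
Table 2: SalesOrder - OrderID, CustomerID, OrderDate

Query: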

SELECT *
FROM Customer C
  RIGHT OUTER JOIN SalesOrder SO ON C.CustomerID = SO.CustomerID
WHERE SO.OrderID = NULL

Is my query correct? Are there other ways to write the query and get the same result?

Best answer

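I'm answering for MySQL rather than SQL Server, because you only tagged the question with SQL Server later; since this was an interview question, I assumed the specific DBMS wouldn't matter much to you. Note, though, that the queries I wrote are standard SQL and should run in every RDBMS. How each RDBMS handles them is another matter.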

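I wrote this little procedure for you as a test case. It creates the customers and orders tables as you specified, and I added primary and foreign keys, as one usually would. I added no other indexes, because every column worth indexing here is already a primary key (InnoDB additionally creates an index on orders.customerId for the foreign key, which shows up as fk_customer in the dump and the execution plans). 250 customers are created, 100 of whom placed an order (for convenience, none of them more than once). A dump of the data follows; I'm posting the script as well in case you want to play around a little by increasing the numbers.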

delimiter $$
create procedure fill_table()
begin
  -- customers: 250 rows with ids 1..250
  create table customers(customerId int primary key) engine=innodb;
  set @x = 1;
  while (@x <= 250) do
    insert into customers values(@x);
    set @x := @x + 1;
  end while;

  -- orders: every order references an existing customer
  create table orders(
    orderId int auto_increment primary key,
    customerId int,
    orderDate timestamp,
    foreign key fk_customer (customerId) references customers(customerId)
  ) engine=innodb;

  -- pick 100 random customers and give each of them exactly one order
  insert into orders(customerId, orderDate)
  select
    customerId,
    now() - interval customerId day
  from
    customers
  order by rand()
  limit 100;
end $$
delimiter ;

call fill_table();

For me, this resulted in the following:

CREATE TABLE `customers` (
  `customerId` int(11) NOT NULL,
  PRIMARY KEY (`customerId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `customers` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250);

CREATE TABLE `orders` (
  `orderId` int(11) NOT NULL AUTO_INCREMENT,
  `customerId` int(11) DEFAULT NULL,
  `orderDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`orderId`),
  KEY `fk_customer` (`customerId`),
  CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`customerId`) REFERENCES `customers` (`customerId`)
) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8;

INSERT INTO `orders` VALUES (1,247,'2013-06-24 19:50:07'),(2,217,'2013-07-24 19:50:07'),(3,8,'2014-02-18 20:50:07'),(4,40,'2014-01-17 20:50:07'),(5,52,'2014-01-05 20:50:07'),(6,80,'2013-12-08 20:50:07'),(7,169,'2013-09-10 19:50:07'),(8,135,'2013-10-14 19:50:07'),(9,115,'2013-11-03 20:50:07'),(10,225,'2013-07-16 19:50:07'),(11,112,'2013-11-06 20:50:07'),(12,243,'2013-06-28 19:50:07'),(13,158,'2013-09-21 19:50:07'),(14,24,'2014-02-02 20:50:07'),(15,214,'2013-07-27 19:50:07'),(16,25,'2014-02-01 20:50:07'),(17,245,'2013-06-26 19:50:07'),(18,182,'2013-08-28 19:50:07'),(19,166,'2013-09-13 19:50:07'),(20,69,'2013-12-19 20:50:07'),(21,85,'2013-12-03 20:50:07'),(22,44,'2014-01-13 20:50:07'),(23,103,'2013-11-15 20:50:07'),(24,19,'2014-02-07 20:50:07'),(25,33,'2014-01-24 20:50:07'),(26,102,'2013-11-16 20:50:07'),(27,41,'2014-01-16 20:50:07'),(28,94,'2013-11-24 20:50:07'),(29,43,'2014-01-14 20:50:07'),(30,150,'2013-09-29 19:50:07'),(31,218,'2013-07-23 19:50:07'),(32,131,'2013-10-18 19:50:07'),(33,77,'2013-12-11 20:50:07'),(34,2,'2014-02-24 20:50:07'),(35,45,'2014-01-12 20:50:07'),(36,230,'2013-07-11 19:50:07'),(37,101,'2013-11-17 20:50:07'),(38,31,'2014-01-26 20:50:07'),(39,56,'2014-01-01 20:50:07'),(40,176,'2013-09-03 19:50:07'),(41,223,'2013-07-18 19:50:07'),(42,145,'2013-10-04 19:50:07'),(43,26,'2014-01-31 20:50:07'),(44,62,'2013-12-26 20:50:07'),(45,195,'2013-08-15 19:50:07'),(46,153,'2013-09-26 19:50:07'),(47,179,'2013-08-31 19:50:07'),(48,104,'2013-11-14 20:50:07'),(49,7,'2014-02-19 20:50:07'),(50,209,'2013-08-01 19:50:07'),(51,86,'2013-12-02 20:50:07'),(52,110,'2013-11-08 20:50:07'),(53,204,'2013-08-06 19:50:07'),(54,187,'2013-08-23 19:50:07'),(55,114,'2013-11-04 20:50:07'),(56,38,'2014-01-19 20:50:07'),(57,236,'2013-07-05 19:50:07'),(58,79,'2013-12-09 20:50:07'),(59,96,'2013-11-22 20:50:07'),(60,37,'2014-01-20 20:50:07'),(61,207,'2013-08-03 19:50:07'),(62,22,'2014-02-04 20:50:07'),(63,120,'2013-10-29 20:50:07'),(64,200,'2013-08-10 19:50:07'),(65,51,'2014-01-06 20:50:07'),(66,181,'2013-08-29 19:50:07'),(67,4,'2014-02-22 20:50:07'),(68,123,'2013-10-26 19:50:07'),(69,108,'2013-11-10 20:50:07'),(70,55,'2014-01-02 20:50:07'),(71,76,'2013-12-12 20:50:07'),(72,6,'2014-02-20 20:50:07'),(73,18,'2014-02-08 20:50:07'),(74,211,'2013-07-30 19:50:07'),(75,53,'2014-01-04 20:50:07'),(76,216,'2013-07-25 19:50:07'),(77,32,'2014-01-25 20:50:07'),(78,74,'2013-12-14 20:50:07'),(79,138,'2013-10-11 19:50:07'),(80,197,'2013-08-13 19:50:07'),(81,221,'2013-07-20 19:50:07'),(82,118,'2013-10-31 20:50:07'),(83,61,'2013-12-27 20:50:07'),(84,28,'2014-01-29 20:50:07'),(85,16,'2014-02-10 20:50:07'),(86,39,'2014-01-18 20:50:07'),(87,3,'2014-02-23 20:50:07'),(88,46,'2014-01-11 20:50:07'),(89,189,'2013-08-21 19:50:07'),(90,59,'2013-12-29 20:50:07'),(91,249,'2013-06-22 19:50:07'),(92,127,'2013-10-22 19:50:07'),(93,47,'2014-01-10 20:50:07'),(94,178,'2013-09-01 19:50:07'),(95,141,'2013-10-08 19:50:07'),(96,188,'2013-08-22 19:50:07'),(97,220,'2013-07-21 19:50:07'),(98,15,'2014-02-11 20:50:07'),(99,175,'2013-09-04 19:50:07'),(100,206,'2013-08-04 19:50:07');
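For playing around with larger numbers, the fill procedure could be parameterized along the following lines. This is only a sketch under a few assumptions: the name fill_tables and its two parameters are hypothetical, the tables are dropped and recreated on each call, and the order date is wrapped with mod 365 so that large customer ids do not push the value outside the range a timestamp column can hold.

```sql
delimiter $$
-- Hypothetical variant of fill_table(); n_customers and n_orders are
-- illustrative parameters, not part of the original script.
create procedure fill_tables(in n_customers int, in n_orders int)
begin
  declare i int default 1;

  -- start from a clean slate; orders goes first because of the foreign key
  drop table if exists orders;
  drop table if exists customers;

  create table customers(customerId int primary key) engine=innodb;
  while i <= n_customers do
    insert into customers values (i);
    set i = i + 1;
  end while;

  create table orders(
    orderId int auto_increment primary key,
    customerId int,
    orderDate timestamp,
    foreign key fk_customer (customerId) references customers(customerId)
  ) engine=innodb;

  -- one order each for n_orders randomly chosen customers;
  -- mod 365 keeps every date within the last year
  insert into orders(customerId, orderDate)
  select
    customerId,
    now() - interval (customerId mod 365) day
  from
    customers
  order by rand()
  limit n_orders;
end $$
delimiter ;

call fill_tables(10000, 4000);
```

Inside a stored procedure MySQL accepts an integer routine parameter in the LIMIT clause, which is what makes limit n_orders possible here.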

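Okay, now to the queries. Three ways came to mind. I omitted the right join that MDiesel used, because it's really just another way of writing a left join. It was invented for lazy SQL developers who don't want to swap the table names and would rather change a single word.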
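As a minimal sketch of that point, here is a right join form written against the test tables above. It is an illustration, not MDiesel's exact query. Two details matter: customers has to be the table on the right-hand side so that its rows are the ones preserved, and the test for missing orders has to use is null, because a comparison with = null is never true in standard SQL.

```sql
-- Same result as a left join from customers to orders: customers is
-- still the preserved table, it is just written on the right-hand side.
select
  c.*
from
  orders o
  right join customers c on c.customerId = o.customerId
where o.customerId is null;

-- By contrast, "= null" evaluates to NULL (unknown) for every row,
-- so a filter like the following returns no rows, whatever the data:
--   where o.customerId = null
```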

Anyway, the first query:

select
c.*
from
customers c
left join orders o on c.customerId = o.customerId
where o.customerId is null;

This results in the following execution plan:

+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key         | key_len | ref              | rows | Extra                    |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
|  1 | SIMPLE      | c     | index | NULL          | PRIMARY     | 4       | NULL             |  250 | Using index              |
|  1 | SIMPLE      | o     | ref   | fk_customer   | fk_customer | 5       | wtf.c.customerId |    1 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+

The second query:

select
c.*
from
customers c
where c.customerId not in (select distinct customerId from orders);

This results in the following execution plan:

+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
| id | select_type        | table  | type           | possible_keys | key         | key_len | ref  | rows | Extra                    |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+
|  1 | PRIMARY            | c      | index          | NULL          | PRIMARY     | 4       | NULL |  250 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | orders | index_subquery | fk_customer   | fk_customer | 5       | func |    2 | Using index              |
+----+--------------------+--------+----------------+---------------+-------------+---------+------+------+--------------------------+

The third query:

select
c.*
from
customers c
where not exists (select 1 from orders o where o.customerId = c.customerId);

This results in the following execution plan:

+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
| id | select_type        | table | type  | possible_keys | key         | key_len | ref              | rows | Extra                    |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+
|  1 | PRIMARY            | c     | index | NULL          | PRIMARY     | 4       | NULL             |  250 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | o     | ref   | fk_customer   | fk_customer | 5       | wtf.c.customerId |    1 | Using where; Using index |
+----+--------------------+-------+-------+---------------+-------------+---------+------------------+------+--------------------------+

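In all three execution plans we can see that the customers table is read in its entirety, but from the index (the implicit one, since the only column is the primary key). This may change when you select other columns from the table that are not in an index.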
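One way to check this is to add a column that is not indexed and run EXPLAIN again. The name column below is a hypothetical addition, not part of the original test case, and no plan output is shown for it here; compare the type and Extra values for c with the plans above.

```sql
-- Hypothetical extra column that is not itself indexed.
alter table customers add column name varchar(50);

-- c.* now has to return name as well, so reading customerId from an
-- index alone is no longer enough to answer the query.
explain
select
  c.*
from
  customers c
  left join orders o on c.customerId = o.customerId
where o.customerId is null;

-- Undo the change so the remaining queries run against the original schema.
alter table customers drop column name;
```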

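The first one seems to be the best. For each row in customers, only one row in orders is read. The id column suggests that MySQL can do this in one step, since only indexes are involved.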

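The second query seems to be the worst (though none of the 3 queries should perform badly). For each row in customers, the subquery is executed (the select_type column tells us this).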
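Apart from performance, the not in form has a correctness pitfall worth knowing about. In the dump above, orders.customerId is nullable (DEFAULT NULL). The sketch below assumes a hypothetical order row without a customer, which the test data does not contain: as soon as the subquery returns a NULL, not in can no longer evaluate to true for any customer. The left join and not exists forms are not affected by such a row.

```sql
-- Hypothetical row: an order with no customer (allowed by the schema,
-- since a foreign key column may be NULL).
insert into orders(customerId, orderDate) values (null, now());

-- "x not in (..., NULL)" is never TRUE, so this now returns no rows at all.
select
  c.*
from
  customers c
where c.customerId not in (select customerId from orders);

-- Filtering out the NULLs in the subquery restores the intended result.
select
  c.*
from
  customers c
where c.customerId not in (
  select customerId from orders where customerId is not null
);

-- Remove the hypothetical row again.
delete from orders where customerId is null;
```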

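The third query is not much different in that it also uses a dependent subquery, but it should perform better than the second query. Explaining the small differences would lead too far here. If you're interested, this manual page explains what each column and its values mean: EXPLAIN output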

Finally: I'd say the first query will perform best but, as always, in the end you have to measure, measure, and measure again.
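One way to do that measuring, sketched under the assumption of MySQL 8.0.18 or later (older servers do not support explain analyze): the statement actually runs the query and reports real timings and row counts per step, so the three variants can be compared on the same data. The numbers will only be meaningful with far more rows than the 250 customers used here.

```sql
-- Requires MySQL 8.0.18+; each statement executes the query and prints
-- the plan annotated with actual times and row counts.
explain analyze
select c.*
from customers c
left join orders o on c.customerId = o.customerId
where o.customerId is null;

explain analyze
select c.*
from customers c
where c.customerId not in (select distinct customerId from orders);

explain analyze
select c.*
from customers c
where not exists (select 1 from orders o where o.customerId = c.customerId);
```

Newer MySQL versions may rewrite all three forms into similar plans, so the differences described above for the older plans need not carry over.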

For this SQL query on 2 tables with null results, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22052074/
