I'm trying to find the pairs of businesses that share the most customers, using MySQL.
The table looks like this:
+------------+------------+
| BusinessID | CustomerID |
+------------+------------+
| A | 1 |
| A | 2 |
| A | 3 |
| B | 4 |
| B | 1 |
| B | 3 |
| B | 2 |
| C | 3 |
| C | 4 |
| C | 5 |
+------------+------------+
I want the output to list each pair of businesses with its number of common customers, like this:
+-------------+-------------+------------------------+
| BusinessID | BusinessID | Common Customers Count |
+-------------+-------------+------------------------+
| A | B | 3 |
| A | C | 1 |
| B | C | 2 |
+-------------+-------------+------------------------+
Here is the query I wrote:
SELECT a.BusinessID,b.BusinessID,COUNT(*) AS ncom
FROM (SELECT BusinessID, CustomerID FROM MYTABLE) AS a JOIN
(SELECT BusinessID,CustomerID FROM MYTABLE) AS b
ON a.BusinessID < b.BusinessID AND a.CustomerID = b.CustomerID
GROUP BY a.BusinessID, b.BusinessID
ORDER BY ncom
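The problem is that my data set has about 5 million rows, and the query seems far too inefficient at that scale. I tested it on smaller subsets by limiting the data: 10k rows take 8 seconds and 20k rows take 30 seconds, so the query will not finish on 5 million rows. How else can I write it to make it faster?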
Best answer
Don't use subqueries just to select columns from the table; doing so may prevent MySQL from using an index.
SELECT a.BusinessID, b.BusinessID, COUNT(*) as ncom
FROM MYTABLE AS a
JOIN MYTABLE AS b ON a.BusinessID < b.BusinessID AND a.CustomerID = b.CustomerID
GROUP BY a.BusinessID, b.BusinessID
ORDER BY ncom
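To sanity-check the rewritten query, the sample data from the question can be loaded into a scratch table. This is only a minimal sketch: the column types are my assumption (the question does not state them), and I have added column aliases and `DESC` so the pair with the most common customers comes first.

```sql
-- Scratch table mirroring the question's sample data.
-- Column types are assumed; the question does not specify them.
CREATE TABLE MYTABLE (
    BusinessID CHAR(1) NOT NULL,
    CustomerID INT     NOT NULL
);

INSERT INTO MYTABLE (BusinessID, CustomerID) VALUES
    ('A', 1), ('A', 2), ('A', 3),
    ('B', 4), ('B', 1), ('B', 3), ('B', 2),
    ('C', 3), ('C', 4), ('C', 5);

-- Self-join on CustomerID; a.BusinessID < b.BusinessID keeps each
-- unordered pair exactly once and excludes a business paired with itself.
SELECT a.BusinessID AS business_a,
       b.BusinessID AS business_b,
       COUNT(*)     AS ncom
FROM MYTABLE AS a
JOIN MYTABLE AS b ON a.BusinessID < b.BusinessID AND a.CustomerID = b.CustomerID
GROUP BY a.BusinessID, b.BusinessID
ORDER BY ncom DESC;
```

On the sample rows this gives (A, B, 3), (B, C, 2), (A, C, 1), which matches the desired output in the question. Note that `COUNT(*)` assumes each (BusinessID, CustomerID) combination appears only once; if the real table can contain duplicates, `COUNT(DISTINCT a.CustomerID)` would be the safer choice.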
In addition, create the following index on the table:
CREATE INDEX ix_cust_bus ON MYTABLE (CustomerID, BusinessID);
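Whether the optimizer actually picks up this index can be checked with `EXPLAIN`. The sketch below assumes the index above has already been created; the exact plan will depend on the MySQL version and the data distribution, so treat it as a diagnostic step rather than a guarantee.

```sql
-- Show the execution plan for the rewritten query without running it.
EXPLAIN
SELECT a.BusinessID, b.BusinessID, COUNT(*) AS ncom
FROM MYTABLE AS a
JOIN MYTABLE AS b ON a.BusinessID < b.BusinessID AND a.CustomerID = b.CustomerID
GROUP BY a.BusinessID, b.BusinessID
ORDER BY ncom;
```

In the result, the `key` column names the index chosen for each table reference; seeing `ix_cust_bus` there, ideally with `Using index` in the `Extra` column, suggests the join on CustomerID is being resolved from the index alone rather than by scanning the table.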
Regarding mysql - Identifying pairs of IDs in a column with the most matches in SQL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42590370/