I have a table with 433,333 rows in a MySQL database on Google Cloud that looks like this:
+----------+-----------+---------------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+---------+
| Album_ID | Track_Len | Track_Name                            | Ft_LName1 | Ft_FName1 | Ft_LName2 | Ft_FName2 | Ft_LName3 | Ft_FName3 | Row_Num |
+----------+-----------+---------------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+---------+
| N40781   |      5.19 | Tumbala (Da Lata Remix)               | NULL      | Novalima  | NULL      | NULL      | NULL      | NULL      |       1 |
| N40781   |      5.01 | Ruperta (Zeb Remix)                   | NULL      | Novalima  | NULL      | NULL      | NULL      | NULL      |       2 |
| N40781   |      6.35 | Coba Guarango (Toni Economides Remix) | NULL      | Novalima  | NULL      | NULL      | NULL      | NULL      |       3 |
| B15033   |      6.02 | II-V-P                                | Quartet   | ARC       | NULL      | NULL      | NULL      | NULL      |       4 |
| N32395   |      4.47 | My Babe                               | Stigers   | Curtis    | NULL      | NULL      | NULL      | NULL      |       5 |
| N32395   |      5.13 | Thats All Right                       | Stigers   | Curtis    | NULL      | NULL      | NULL      | NULL      |       6 |
+----------+-----------+---------------------------------------+-----------+-----------+-----------+-----------+-----------+-----------+---------+
Note that the primary key should be (Album_ID, Track_Name). There are a lot of duplicates, so I'm running the following statement to try to remove them:
delete from Track where (Album_ID, Track_Name, Row_Num) IN (
    select Album_ID, Track_Name, MAX(Row_Num)
    from (select Album_ID, Track_Name, Row_Num from Track) as x
    where (Album_ID, Track_Name) IN (
        select Album_ID, Track_Name
        from (select Album_ID, Track_Name from Track) as y
        group by Album_ID, Track_Name
        having count(*) > 1
    )
    group by Album_ID, Track_Name);
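But this takes far too long, and it doesn't remove all of the duplicates in one run. Any suggestions for optimizing this query?
The table definition: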
+------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+----------------+
| Album_ID | varchar(6) | YES | | NULL | |
| Track_Len | decimal(4,2) | YES | | NULL | |
| Track_Name | varchar(100) | YES | | NULL | |
| Ft_LName1 | varchar(40) | YES | | NULL | |
| Ft_FName1 | varchar(40) | YES | | NULL | |
| Ft_LName2 | varchar(40) | YES | | NULL | |
| Ft_FName2 | varchar(40) | YES | | NULL | |
| Ft_LName3 | varchar(40) | YES | | NULL | |
| Ft_FName3 | varchar(40) | YES | | NULL | |
| Row_Num | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
+------------+---------------------+------+-----+---------+----------------+
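Why one run leaves duplicates behind: the inner query returns only MAX(Row_Num) for each duplicated (Album_ID, Track_Name) group, so each run deletes exactly one row per group, and a track stored n times needs n - 1 runs. The sketch below reproduces that behavior using Python's built-in sqlite3 module with an in-memory table as a stand-in for MySQL, keeping only the three relevant columns. The sample rows and the helper passes_until_clean are hypothetical. Because Row_Num is unique, the sketch matches on Row_Num alone instead of the (Album_ID, Track_Name, Row_Num) tuple, and it drops the derived-table wrappers that MySQL needs in order to read from the table it is deleting from.

```python
import sqlite3

# Hypothetical sample: one track stored three times, one stored twice, one unique.
ROWS = [
    ("N40781", "Tumbala (Da Lata Remix)"),
    ("N40781", "Tumbala (Da Lata Remix)"),
    ("N40781", "Tumbala (Da Lata Remix)"),
    ("N32395", "My Babe"),
    ("N32395", "My Babe"),
    ("B15033", "II-V-P"),
]

# Same idea as the question's query: delete only the highest Row_Num of each
# duplicated group. Row_Num is unique, so matching on it alone is equivalent.
DELETE_MAX_PER_GROUP = """
    DELETE FROM Track WHERE Row_Num IN (
        SELECT MAX(Row_Num) FROM Track
        GROUP BY Album_ID, Track_Name
        HAVING COUNT(*) > 1
    )
"""


def passes_until_clean(rows):
    """Repeat the MAX(Row_Num) delete until it removes nothing; return the number of passes that deleted rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Track (Row_Num INTEGER PRIMARY KEY AUTOINCREMENT,"
        " Album_ID TEXT, Track_Name TEXT)"
    )
    conn.executemany("INSERT INTO Track (Album_ID, Track_Name) VALUES (?, ?)", rows)
    passes = 0
    # rowcount is the number of rows the DELETE removed in this pass.
    while conn.execute(DELETE_MAX_PER_GROUP).rowcount > 0:
        passes += 1
    conn.close()
    return passes


print(passes_until_clean(ROWS))  # 2: the triplicated track needs two passes
```

On a real table, each of those passes re-scans and re-groups all 433,333 rows, which is why deleting everything except one row per group in a single statement is the better target.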
Best answer
The traditional way to do this in MySQL is with a JOIN and a GROUP BY:
delete t
from Track t left join
     (select tt.Album_ID, tt.Track_Name, min(tt.Row_Num) as min_row_num
      from Track tt
      group by tt.Album_ID, tt.Track_Name
     ) tt  -- the smallest Row_Num of each (Album_ID, Track_Name) group
     on t.Row_Num = tt.min_row_num
where tt.min_row_num is null;  -- no match: t is not the row kept for its group
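This relies on Row_Num being unique across the whole table (the table definition shows it is the primary key). You can also write it as: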
delete t
from Track t join
     (select tt.Album_ID, tt.Track_Name, min(tt.Row_Num) as min_row_num
      from Track tt
      group by tt.Album_ID, tt.Track_Name
     ) tt
     on tt.Album_ID = t.Album_ID and
        tt.Track_Name = t.Track_Name and
        t.Row_Num > tt.min_row_num;  -- every copy after the first in its group
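A minimal sketch for checking both forms on sample data before running them against the real table. It uses Python's built-in sqlite3 module and an in-memory table as a stand-in for MySQL; SQLite has no multi-table DELETE ... JOIN, so the first form is rewritten as a NOT IN over the per-group minimum and the second as a correlated subquery, each keeping the same rows as its MySQL counterpart in a single statement. The sample rows and the helper surviving_rows are hypothetical. The sample also shows one real difference between the forms, since Album_ID and Track_Name are both nullable: GROUP BY treats NULLs as one group, so the first form collapses rows with a NULL Track_Name, while the equality join in the second form never matches NULL and leaves them alone.

```python
import sqlite3

# Hypothetical sample rows: duplicates plus two rows whose Track_Name is NULL.
ROWS = [
    ("N40781", "Tumbala (Da Lata Remix)"),
    ("N40781", "Tumbala (Da Lata Remix)"),
    ("N40781", "Tumbala (Da Lata Remix)"),
    ("N32395", "My Babe"),
    ("N32395", "My Babe"),
    ("B15033", "II-V-P"),
    ("N40781", None),
    ("N40781", None),
]

# Form 1: delete every row that is not the MIN(Row_Num) of its group
# (stand-in for the LEFT JOIN ... WHERE tt.min_row_num IS NULL version).
KEEP_GROUP_MIN = """
    DELETE FROM Track WHERE Row_Num NOT IN (
        SELECT MIN(Row_Num) FROM Track GROUP BY Album_ID, Track_Name
    )
"""

# Form 2: delete rows whose Row_Num is above the minimum of rows with an equal
# (Album_ID, Track_Name) (stand-in for the JOIN ... t.Row_Num > tt.min_row_num version).
DELETE_ABOVE_MIN = """
    DELETE FROM Track WHERE Row_Num > (
        SELECT MIN(t2.Row_Num) FROM Track AS t2
        WHERE t2.Album_ID = Track.Album_ID AND t2.Track_Name = Track.Track_Name
    )
"""


def surviving_rows(delete_sql, rows=ROWS):
    """Load rows into an in-memory Track table, run delete_sql once, return the kept Row_Nums."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Track (Row_Num INTEGER PRIMARY KEY AUTOINCREMENT,"
        " Album_ID TEXT, Track_Name TEXT)"
    )
    conn.executemany("INSERT INTO Track (Album_ID, Track_Name) VALUES (?, ?)", rows)
    conn.execute(delete_sql)
    kept = [r[0] for r in conn.execute("SELECT Row_Num FROM Track ORDER BY Row_Num")]
    conn.close()
    return kept


print(surviving_rows(KEEP_GROUP_MIN))    # [1, 4, 6, 7]: the NULL-name rows collapse to one
print(surviving_rows(DELETE_ABOVE_MIN))  # [1, 4, 6, 7, 8]: NULL never equals NULL, both stay
```

Both forms keep the lowest Row_Num, that is the earliest-inserted copy; using max instead of min would keep the latest copy.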
Regarding "mysql - How to optimize a query to delete duplicates in MySQL?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55082276/