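I have been collecting data from an API to build a history. Initially, I saved all values every five minutes. Later, I changed my program to save only the data that had changed.
Now I want to clean up my old data and delete every row whose count has not changed compared with the previous record for the same account and id.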
account id count time
42 12147 492 2015-09-20 11:31:14.0
42 12147 492 2015-09-20 11:36:19.0 // delete
13 12147 246 2015-09-20 11:31:14.0
2 12253 183 2015-09-20 11:36:19.0
2 19684 805 2015-09-20 12:00:41.0 // note in next comment
2 19684 810 2015-09-20 12:05:41.0
2 19684 805 2015-09-20 12:10:41.0 // we had this combination, but don't delete this record because the previous value was different
2 19684 805 2015-09-20 12:15:41.0 // delete
2 19684 805 2015-09-20 12:20:41.0 // delete
2 19684 806 2015-09-20 12:25:41.0
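I tried to solve this with a GROUP BY on account, id, and count. With that approach, however, non-consecutive duplicates are deleted as well: if a record returns to the same value after some time, it falls into the same group.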
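To make the problem with grouping concrete, here is a minimal Python sketch that emulates "keep only the earliest row per (account, id, count) group" on the sample rows from the question. The function name `rows_removed_by_grouping` is hypothetical and exists only for illustration; the rows are copied from the table above.

```python
# Sample rows from the question as (account, id, count, time).
ROWS = [
    (42, 12147, 492, "2015-09-20 11:31:14"),
    (42, 12147, 492, "2015-09-20 11:36:19"),
    (13, 12147, 246, "2015-09-20 11:31:14"),
    (2, 12253, 183, "2015-09-20 11:36:19"),
    (2, 19684, 805, "2015-09-20 12:00:41"),
    (2, 19684, 810, "2015-09-20 12:05:41"),
    (2, 19684, 805, "2015-09-20 12:10:41"),  # must be kept: previous count was 810
    (2, 19684, 805, "2015-09-20 12:15:41"),
    (2, 19684, 805, "2015-09-20 12:20:41"),
    (2, 19684, 806, "2015-09-20 12:25:41"),
]


def rows_removed_by_grouping(rows):
    """Return indexes of rows that are not the earliest in their (account, id, count) group."""
    earliest = {}
    for account, id_, count, time in rows:
        key = (account, id_, count)
        # Timestamps in this fixed-width format compare correctly as strings.
        if key not in earliest or time < earliest[key]:
            earliest[key] = time
    return [
        index
        for index, (account, id_, count, time) in enumerate(rows)
        if time > earliest[(account, id_, count)]
    ]


removed = rows_removed_by_grouping(ROWS)
print(removed)  # index 6 (the 12:10:41 row) is in the list although it should be kept
```

The 12:10:41 row shares its group with the 12:00:41 row, so grouping cannot tell that a different value (810) was recorded in between.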
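I also considered writing a small script that iterates over all the data and deletes the current row if its account, id, and count are the same as in the previous record, but I am curious whether this can be done with a single SQL statement.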
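For comparison, the script-based idea could look roughly like the following Python sketch. It is only an illustration under the assumption that all rows fit in memory; `consecutive_duplicates` is a hypothetical helper name, and the rows are the sample data from the question.

```python
# Sample rows from the question as (account, id, count, time).
ROWS = [
    (42, 12147, 492, "2015-09-20 11:31:14"),
    (42, 12147, 492, "2015-09-20 11:36:19"),
    (13, 12147, 246, "2015-09-20 11:31:14"),
    (2, 12253, 183, "2015-09-20 11:36:19"),
    (2, 19684, 805, "2015-09-20 12:00:41"),
    (2, 19684, 810, "2015-09-20 12:05:41"),
    (2, 19684, 805, "2015-09-20 12:10:41"),
    (2, 19684, 805, "2015-09-20 12:15:41"),
    (2, 19684, 805, "2015-09-20 12:20:41"),
    (2, 19684, 806, "2015-09-20 12:25:41"),
]


def consecutive_duplicates(rows):
    """Return indexes of rows whose count equals the previous count for the same (account, id)."""
    last_count = {}
    duplicates = []
    # Visit rows in time order; sorted() is stable, so equal timestamps keep input order.
    for index in sorted(range(len(rows)), key=lambda i: rows[i][3]):
        account, id_, count, _time = rows[index]
        key = (account, id_)
        if key in last_count and last_count[key] == count:
            duplicates.append(index)
        # Remember the most recent count seen for this (account, id).
        last_count[key] = count
    return sorted(duplicates)


to_delete = consecutive_duplicates(ROWS)
print(to_delete)
```

The indexes returned correspond to the three rows marked "// delete" in the table, and the 12:10:41 row is left alone.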
Best answer
You can use the following query:
DELETE history
FROM history
INNER JOIN (SELECT MIN(time) AS minTime, account, id, count
            FROM history
            GROUP BY account, id, count) AS h
  ON history.account = h.account AND history.id = h.id AND history.count = h.count
WHERE history.time > h.minTime
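Edit:
After the OP's edit, I think there is still something wrong with the sample data (the time field should be in ascending order).
With the additional assumption that the table has a PK, you can use the following query: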
SELECT pk
FROM history AS h1
WHERE account = (SELECT account
                 FROM history AS h2
                 WHERE h1.account = h2.account AND
                       h1.id = h2.id AND
                       h2.time < h1.time
                 ORDER BY time DESC
                 LIMIT 1)
      AND
      id = (SELECT id
            FROM history AS h2
            WHERE h1.account = h2.account AND
                  h1.id = h2.id AND
                  h2.time < h1.time
            ORDER BY time DESC
            LIMIT 1)
      AND
      count = (SELECT count
               FROM history AS h2
               WHERE h1.account = h2.account AND
                     h1.id = h2.id AND
                     h2.time < h1.time
               ORDER BY time DESC
               LIMIT 1)
to identify the to-be-deleted records (see this demo).
Now you can easily delete the unwanted rows using the IN operator:
DELETE FROM history
WHERE pk IN (
   SELECT x.pk
   FROM (
      SELECT pk
      FROM history AS h1
      WHERE
         account = (SELECT account
                    FROM history AS h2
                    WHERE h1.account = h2.account AND
                          h1.id = h2.id AND
                          h2.time < h1.time
                    ORDER BY time DESC
                    LIMIT 1)
         AND
         id = (SELECT id
               FROM history AS h2
               WHERE h1.account = h2.account AND
                     h1.id = h2.id AND
                     h2.time < h1.time
               ORDER BY time DESC
               LIMIT 1)
         AND
         count = (SELECT count
                  FROM history AS h2
                  WHERE h1.account = h2.account AND
                        h1.id = h2.id AND
                        h2.time < h1.time
                  ORDER BY time DESC
                  LIMIT 1)) AS x)
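As a rough way to check the DELETE above against the sample data, the following sketch runs a reduced form of it on an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here, not MySQL, and the table definition (an integer pk assigned in insertion order, time stored as text) is an assumption. Only the count subquery is kept: the account and id subqueries return h2.account and h2.id, which the inner WHERE already forces to equal h1.account and h1.id whenever a previous row exists.

```python
import sqlite3

# Sample rows from the question as (account, id, count, time).
ROWS = [
    (42, 12147, 492, "2015-09-20 11:31:14"),
    (42, 12147, 492, "2015-09-20 11:36:19"),
    (13, 12147, 246, "2015-09-20 11:31:14"),
    (2, 12253, 183, "2015-09-20 11:36:19"),
    (2, 19684, 805, "2015-09-20 12:00:41"),
    (2, 19684, 810, "2015-09-20 12:05:41"),
    (2, 19684, 805, "2015-09-20 12:10:41"),
    (2, 19684, 805, "2015-09-20 12:15:41"),
    (2, 19684, 805, "2015-09-20 12:20:41"),
    (2, 19684, 806, "2015-09-20 12:25:41"),
]

# Delete rows whose count equals the count of the latest earlier row
# with the same account and id.
DELETE_SQL = """
DELETE FROM history
WHERE pk IN (
    SELECT x.pk
    FROM (
        SELECT h1.pk
        FROM history AS h1
        WHERE h1.count = (SELECT h2.count
                          FROM history AS h2
                          WHERE h2.account = h1.account AND
                                h2.id = h1.id AND
                                h2.time < h1.time
                          ORDER BY h2.time DESC
                          LIMIT 1)) AS x)
"""

conn = sqlite3.connect(":memory:")
# Assumed schema: pk is filled in automatically as 1, 2, 3, ... in insertion order.
conn.execute(
    "CREATE TABLE history ("
    "pk INTEGER PRIMARY KEY, account INTEGER, id INTEGER, count INTEGER, time TEXT)"
)
conn.executemany(
    "INSERT INTO history (account, id, count, time) VALUES (?, ?, ?, ?)", ROWS
)
conn.execute(DELETE_SQL)

remaining = [pk for (pk,) in conn.execute("SELECT pk FROM history ORDER BY pk")]
print(remaining)
```

The rows that disappear are the ones with pk 2, 8, and 9, which are the three rows marked "// delete" in the question. The extra derived table x is there because MySQL rejects a DELETE whose subquery reads directly from the table being modified; SQLite does not need it, but it is harmless.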
Edit 2:
Using variables to locate the pk values to delete may result in a much faster query:
SELECT pk
FROM (
   SELECT pk, account, id, count, time,
          @rn := IF (account = @acc AND id = @id AND count = @count,
                     @rn + 1, 1) AS rn,
          @acc := account,
          @id := id,
          @count := count
   FROM history
   CROSS JOIN (SELECT @rn := 0, @acc := 0, @id := 0, @count := 0) AS vars
   ORDER BY account, id, time, count ) AS t
WHERE t.rn > 1
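The logic of the variable-based scan can be sketched in Python as follows. This is an emulation of the intended single ordered pass, not of how MySQL actually evaluates user variables (the MySQL manual does not guarantee evaluation order for expressions that assign them). The function name `rows_with_rn_above_one` and the pk values 1 to 10, assigned in the order of the question's table, are assumptions for illustration.

```python
# Sample rows as (pk, account, id, count, time); pk values are assumed.
ROWS = [
    (1, 42, 12147, 492, "2015-09-20 11:31:14"),
    (2, 42, 12147, 492, "2015-09-20 11:36:19"),
    (3, 13, 12147, 246, "2015-09-20 11:31:14"),
    (4, 2, 12253, 183, "2015-09-20 11:36:19"),
    (5, 2, 19684, 805, "2015-09-20 12:00:41"),
    (6, 2, 19684, 810, "2015-09-20 12:05:41"),
    (7, 2, 19684, 805, "2015-09-20 12:10:41"),
    (8, 2, 19684, 805, "2015-09-20 12:15:41"),
    (9, 2, 19684, 805, "2015-09-20 12:20:41"),
    (10, 2, 19684, 806, "2015-09-20 12:25:41"),
]


def rows_with_rn_above_one(rows):
    """Return pk values whose running number rn exceeds 1 in an ordered scan."""
    flagged = []
    prev_key = None  # plays the role of @acc, @id, @count
    rn = 0           # plays the role of @rn
    # Same ordering as the query: account, id, time, count.
    for pk, account, id_, count, _time in sorted(
        rows, key=lambda r: (r[1], r[2], r[4], r[3])
    ):
        key = (account, id_, count)
        # rn grows while consecutive rows repeat the same key, else restarts at 1.
        rn = rn + 1 if key == prev_key else 1
        if rn > 1:
            flagged.append(pk)
        prev_key = key
    return sorted(flagged)


pks_to_delete = rows_with_rn_above_one(ROWS)
print(pks_to_delete)
```

Because each row is compared only with its immediate predecessor in the sorted order, a value that reappears after a different one (pk 7 here) restarts at rn = 1 and is not flagged.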
Regarding "mysql - delete only consecutive duplicate rows", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32682656/