mysql - Delete only consecutive duplicate rows

Tags: mysql sql sql-delete

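I have been collecting data from an API to build a history. Initially, I saved all values every five minutes. Later, I changed my program to save only the data that had changed.

Now I want to clean up my old data and delete every row whose count is unchanged compared with the previous record for the same account and id.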

account  id     count  time
42       12147  492    2015-09-20 11:31:14.0
42       12147  492    2015-09-20 11:36:19.0  // delete
13       12147  246    2015-09-20 11:31:14.0
2        12253  183    2015-09-20 11:36:19.0
2        19684  805    2015-09-20 12:00:41.0  // note in next comment
2        19684  810    2015-09-20 12:05:41.0
2        19684  805    2015-09-20 12:10:41.0  // we had this combination, but don't delete this record because the previous value was different
2        19684  805    2015-09-20 12:15:41.0  // delete
2        19684  805    2015-09-20 12:20:41.0  // delete
2        19684  806    2015-09-20 12:25:41.0

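I tried to solve this with a GROUP BY on account, id, count. With that approach, however, non-consecutive duplicates are deleted as well: if a record takes the same value again some time later, it falls into the same group.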
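To make the problem with grouping concrete, here is a minimal Python sketch that mimics a "keep the earliest row per (account, id, count) group" cleanup on the rows for id 19684 from the table above. The function name `group_by_victims` is hypothetical and the logic is only an illustration of the grouping idea, not the exact statement that was tried.

```python
from collections import defaultdict

# Rows for account 2 / id 19684 from the question: (account, id, count, time)
rows = [
    (2, 19684, 805, "2015-09-20 12:00:41"),
    (2, 19684, 810, "2015-09-20 12:05:41"),
    (2, 19684, 805, "2015-09-20 12:10:41"),
    (2, 19684, 805, "2015-09-20 12:15:41"),
    (2, 19684, 805, "2015-09-20 12:20:41"),
    (2, 19684, 806, "2015-09-20 12:25:41"),
]


def group_by_victims(rows):
    """Return the times a cleanup grouped on (account, id, count) would delete."""
    groups = defaultdict(list)
    for account, id_, count, time in rows:
        groups[(account, id_, count)].append(time)
    victims = []
    for times in groups.values():
        # Keep the earliest row of each group and drop every later one.
        victims.extend(sorted(times)[1:])
    return sorted(victims)


print(group_by_victims(rows))
```

The 12:10:41 row ends up in the result even though the value before it was 810, which is exactly the row the question wants to keep.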

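I also considered writing a small script that iterates over all the data and deletes the current row if account, id, count match the previous record, but I'm curious whether this can be done with a single SQL statement.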
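For comparison, the scripted approach could look roughly like the sketch below. It assumes the rows have already been loaded into memory as (account, id, count, time) tuples with sortable ISO-style timestamps; the function name `consecutive_duplicates` is made up for illustration, and actually issuing the deletes is left out.

```python
# Sample rows from the question: (account, id, count, time)
rows = [
    (42, 12147, 492, "2015-09-20 11:31:14"),
    (42, 12147, 492, "2015-09-20 11:36:19"),
    (13, 12147, 246, "2015-09-20 11:31:14"),
    (2, 12253, 183, "2015-09-20 11:36:19"),
    (2, 19684, 805, "2015-09-20 12:00:41"),
    (2, 19684, 810, "2015-09-20 12:05:41"),
    (2, 19684, 805, "2015-09-20 12:10:41"),
    (2, 19684, 805, "2015-09-20 12:15:41"),
    (2, 19684, 805, "2015-09-20 12:20:41"),
    (2, 19684, 806, "2015-09-20 12:25:41"),
]


def consecutive_duplicates(rows):
    """Return rows whose count equals the previous row of the same (account, id)."""
    last_count = {}  # (account, id) -> count of the most recent row seen
    doomed = []
    # Walk the rows chronologically so "previous" means the prior timestamp.
    for account, id_, count, time in sorted(rows, key=lambda r: r[3]):
        key = (account, id_)
        if last_count.get(key) == count:
            doomed.append((account, id_, count, time))
        last_count[key] = count
    return doomed


for row in consecutive_duplicates(rows):
    print(row)
```

On the sample data this flags only the three rows marked `// delete` in the table and leaves the 12:10:41 row alone.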

Best answer

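You can use the following query, which keeps the earliest row of each account, id, count combination and deletes the later ones: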

DELETE history 
FROM history 
INNER JOIN (SELECT MIN(time) AS minTime, account, id, count
            FROM history
            GROUP BY account, id, count) AS h
ON history.account = h.account AND history.id = h.id AND history.count = h.count
WHERE history.time > h.minTime

Demo here

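Edit:

Following the edit to the question, I think the OP's sample data still contains some errors (the time field should be in ascending order).

With the additional assumption that the table has a primary key (PK), you can use the following query: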

SELECT pk
FROM history AS h1
WHERE account = (SELECT account 
                 FROM history AS h2
                 WHERE h1.account = h2.account AND
                       h1.id = h2.id AND                       
                       h2.time < h1.time
                 ORDER BY time DESC 
                 LIMIT 1)
      AND
      id = (SELECT id 
            FROM history AS h2
            WHERE h1.account = h2.account AND
                  h1.id = h2.id AND                  
                  h2.time < h1.time
            ORDER BY time DESC 
            LIMIT 1)
      AND
      count = (SELECT count
               FROM history AS h2
               WHERE h1.account = h2.account AND
                     h1.id = h2.id AND                     
                     h2.time < h1.time
               ORDER BY time DESC 
               LIMIT 1)

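This identifies the records to be deleted (see this demo).

Now you can easily delete the unwanted rows using the IN operator. The extra derived table x is needed because MySQL does not let a DELETE select from its own target table in a direct subquery: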

DELETE FROM history 
WHERE pk IN (
SELECT x.pk
FROM (             
  SELECT pk
  FROM history AS h1
  WHERE 
     account = (SELECT account 
                FROM history AS h2
                WHERE h1.account = h2.account AND
                      h1.id = h2.id AND                       
                      h2.time < h1.time
                      ORDER BY time DESC 
                      LIMIT 1)

     AND

     id = (SELECT id 
           FROM history AS h2
           WHERE h1.account = h2.account AND
                 h1.id = h2.id AND                  
                 h2.time < h1.time
           ORDER BY time DESC 
           LIMIT 1)

     AND

     count = (SELECT count
              FROM history AS h2
              WHERE h1.account = h2.account AND
                    h1.id = h2.id AND                     
                    h2.time < h1.time
              ORDER BY time DESC 
              LIMIT 1)) AS x)

Demo here

Edit 2:

Using variables to locate the pk values to delete can make the query considerably faster:

SELECT pk
FROM (
  SELECT pk, account, id, count, time,
         @rn := IF (account = @acc AND id = @id AND count = @count,
                    @rn + 1, 1) AS rn,
         @acc := account,
         @id := id,
         @count := count
  FROM history
  CROSS JOIN (SELECT @rn := 0, @acc := 0, @id := 0, @count := 0) AS vars
  ORDER BY account, id, time, count ) AS t
WHERE t.rn > 1

Demo here
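As a way to check the "same as the previous row" logic end to end, the sketch below runs the cleanup against an in-memory SQLite database from Python. This is a swapped-in technique, not the answer's method: it uses the LAG() window function instead of user variables, it assumes SQLite 3.25 or later (MySQL has offered window functions since 8.0), and the auto-generated `pk` column stands in for the primary key the answer assumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    "pk INTEGER PRIMARY KEY, account INT, id INT, count INT, time TEXT)"
)
conn.executemany(
    "INSERT INTO history (account, id, count, time) VALUES (?, ?, ?, ?)",
    [
        (42, 12147, 492, "2015-09-20 11:31:14"),
        (42, 12147, 492, "2015-09-20 11:36:19"),
        (13, 12147, 246, "2015-09-20 11:31:14"),
        (2, 12253, 183, "2015-09-20 11:36:19"),
        (2, 19684, 805, "2015-09-20 12:00:41"),
        (2, 19684, 810, "2015-09-20 12:05:41"),
        (2, 19684, 805, "2015-09-20 12:10:41"),
        (2, 19684, 805, "2015-09-20 12:15:41"),
        (2, 19684, 805, "2015-09-20 12:20:41"),
        (2, 19684, 806, "2015-09-20 12:25:41"),
    ],
)

# LAG() reads the count of the previous row within the same (account, id),
# ordered by time; a row whose count equals it is a consecutive duplicate.
conn.execute(
    """
    DELETE FROM history
    WHERE pk IN (
        SELECT pk
        FROM (SELECT pk,
                     count,
                     LAG(count) OVER (PARTITION BY account, id
                                      ORDER BY time) AS prev_count
              FROM history) AS t
        WHERE count = prev_count)
    """
)

remaining = [
    row for row in conn.execute(
        "SELECT account, id, count, time FROM history ORDER BY pk"
    )
]
for row in remaining:
    print(row)
```

The first row of each (account, id) partition has a NULL `prev_count`, so the comparison is never true for it and it is always kept.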

Regarding "mysql - Delete only consecutive duplicate rows", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32682656/
