mysql - How to check every row in MySQL (timestamp - 60 seconds) to determine whether duplicate data exists?

Tags: mysql sql laravel duplicates

I have a table like this:

[Image: sample table data]

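The rows marked with a red cross are the result I want. I want to move those rows to an error log table, because they are duplicate data.

A row is a duplicate when both of the following hold:

  1. Another row exists within the 60 seconds before this row's timestamp
  2. That row has the same advertiser_id, offer_id, commission_id, commission_tier_id, creative_id, publisher_id, publisher_asset_id and source_id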

Example:

1545981655
1545981657 x -> will be marked as a duplicate because 1545981657 - 60 = 1545981597. Search for the first row > 1545981597, excluding this row. 1545981655 is returned.
1545981660 x -> will be marked as a duplicate because 1545981660 - 60 = 1545981600. Search for the first row > 1545981600, excluding this row. 1545981655 is returned.
1545981662 x -> will be marked as a duplicate because 1545981662 - 60 = 1545981602. Search for the first row > 1545981602, excluding this row. 1545981655 is returned.
1545981707  -> won't be marked as a duplicate because 1545981707 - 60 = 1545981647. Search for the first row > 1545981647, excluding this row. 1545981655 is not returned because its publisher_asset_id is different.
1545981710 x -> will be marked as a duplicate because 1545981710 - 60 = 1545981650. Search for the first row > 1545981650, excluding this row. 1545981707 is returned.
1545981712 x -> will be marked as a duplicate because 1545981712 - 60 = 1545981652. Search for the first row > 1545981652, excluding this row. 1545981707 is returned.
1545981714 x -> will be marked as a duplicate because 1545981714 - 60 = 1545981654. Search for the first row > 1545981654, excluding this row. 1545981707 is returned.
1545981718  -> won't be marked as a duplicate because 1545981718 - 60 = 1545981658. Search for the first row > 1545981658, excluding this row. No row is returned, because the publisher_asset_id is different.
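
The rule can be checked against these sample timestamps with a small Python sketch. Two things here are assumptions, not part of the question: the publisher_asset_id values (1, 2, 3) are picked only to match the explanation above, since the table image is not available, and the other seven key columns are treated as equal and left out. The sketch also reads "the 60 seconds before" as strictly earlier than the current row. `is_duplicate` is a hypothetical helper name.

```python
# Sample rows as (impression_timestamp, publisher_asset_id).
# The asset ids are assumed values chosen to match the worked example.
rows = [
    (1545981655, 1),
    (1545981657, 1),
    (1545981660, 1),
    (1545981662, 1),
    (1545981707, 2),
    (1545981710, 2),
    (1545981712, 2),
    (1545981714, 2),
    (1545981718, 3),
]

WINDOW_SECONDS = 60


def is_duplicate(index, rows):
    """True if another row with the same key is earlier and > timestamp - 60."""
    ts, key = rows[index]
    return any(
        other_key == key and ts - WINDOW_SECONDS < other_ts < ts
        for i, (other_ts, other_key) in enumerate(rows)
        if i != index
    )


duplicates = [ts for i, (ts, _) in enumerate(rows) if is_duplicate(i, rows)]
print(duplicates)
```

Under these assumptions the flagged timestamps are exactly the six rows marked with x above. This brute-force loop is only a reference for checking a SQL query's output, not a replacement for doing the work in the database.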

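How can I achieve this in a MySQL query, instead of looping over all the data?

I want to achieve a result like this:

[Image: the result table I want to achieve]

I need your help. Thank you very much.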

Best answer

Replace T with the name of your table and try the following:

SELECT *
FROM (
    SELECT id, advertiser_id, offer_id, commission_id, commission_tier_id, creative_id, publisher_id, publisher_asset_id, source_id, impression_timestamp,
           -- number of rows in the same group with a timestamp in [current - 60, current], current row included
           COUNT(*) OVER (
               PARTITION BY advertiser_id, offer_id, commission_id, commission_tier_id, creative_id, publisher_id, publisher_asset_id, source_id
               ORDER BY impression_timestamp
               RANGE 60 PRECEDING
           ) AS DuplicateFlag
    FROM T
) DetectDuplicate
WHERE DuplicateFlag > 1
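
To make the frame's behaviour concrete, here is a plain-Python sketch of what `COUNT(*) OVER (PARTITION BY ... ORDER BY impression_timestamp RANGE 60 PRECEDING)` counts for each row. The function name `frame_counts`, the single-column key, and the sample values are all made up for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def frame_counts(rows, window=60):
    """For each (key, ts) row, count rows of the same key with a
    timestamp in [ts - window, ts]. Counts are returned in input order."""
    by_key = defaultdict(list)
    for key, ts in rows:
        by_key[key].append(ts)
    for timestamps in by_key.values():
        timestamps.sort()

    counts = []
    for key, ts in rows:
        timestamps = by_key[key]
        lo = bisect_left(timestamps, ts - window)  # inclusive lower bound
        hi = bisect_right(timestamps, ts)          # current row and its peers included
        counts.append(hi - lo)
    return counts


# Illustrative data only: key "a" has four rows, key "b" has one.
rows = [("a", 100), ("a", 130), ("a", 160), ("a", 221), ("b", 140)]
print(frame_counts(rows))  # [1, 2, 3, 1, 1]
```

The count always includes the current row, which is why the outer filter is `DuplicateFlag > 1`. Note that row ("a", 160) also counts the row at 100, exactly 60 seconds earlier, because the frame bound is inclusive. If the strict "> timestamp - 60" reading from the question is wanted and the timestamps are whole seconds, `RANGE 59 PRECEDING` would be the matching frame.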

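Edit: before MySQL 8 the query above cannot be run, because window functions are not available, so it has to be replaced by a query with a JOIN (unfortunately somewhat slower):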

SELECT DISTINCT T2.*
FROM T T1
JOIN T T2
  ON T1.id                   <> T2.id
 AND T1.advertiser_id         = T2.advertiser_id
 AND T1.offer_id              = T2.offer_id
 AND T1.commission_id         = T2.commission_id
 AND T1.commission_tier_id    = T2.commission_tier_id
 AND T1.creative_id           = T2.creative_id
 AND T1.publisher_id          = T2.publisher_id
 AND T1.publisher_asset_id    = T2.publisher_asset_id
 AND T1.source_id             = T2.source_id
 AND T1.impression_timestamp <= T2.impression_timestamp       -- T1 is not later than T2
 AND T1.impression_timestamp >= T2.impression_timestamp - 60  -- and at most 60 seconds earlier

At least one other syntax is possible, for example:

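SELECT *
FROM T Main
WHERE EXISTS (
    SELECT 1
    FROM T Earlier
   WHERE Earlier.id                   <> Main.id
     AND Earlier.advertiser_id         = Main.advertiser_id
     AND Earlier.offer_id              = Main.offer_id
     AND Earlier.commission_id         = Main.commission_id
     AND Earlier.commission_tier_id    = Main.commission_tier_id
     AND Earlier.creative_id           = Main.creative_id
     AND Earlier.publisher_id          = Main.publisher_id
     AND Earlier.publisher_asset_id    = Main.publisher_asset_id
     AND Earlier.source_id             = Main.source_id
     AND Earlier.impression_timestamp <= Main.impression_timestamp       -- not later than the row being tested
     AND Earlier.impression_timestamp >= Main.impression_timestamp - 60  -- and at most 60 seconds earlier
)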
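
Since the goal in the question is to move the duplicates into an error log table, the following sketch shows one possible two-step approach: copy the flagged rows with `INSERT ... SELECT`, then delete them by id. It runs against an in-memory SQLite database through Python's `sqlite3` module purely so the example is self-contained; it is not tested on MySQL. The table name `impression_error_log`, the reduced column set (only publisher_asset_id stands in for the eight key columns), and the sample rows are assumptions. The subquery bounds the timestamp on both sides so that only rows up to 60 seconds earlier match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (
        id INTEGER PRIMARY KEY,
        publisher_asset_id INTEGER,
        impression_timestamp INTEGER
    );
    -- Hypothetical error log table with the same columns as T.
    CREATE TABLE impression_error_log (
        id INTEGER PRIMARY KEY,
        publisher_asset_id INTEGER,
        impression_timestamp INTEGER
    );
""")
conn.executemany(
    "INSERT INTO T (publisher_asset_id, impression_timestamp) VALUES (?, ?)",
    [(1, 1545981655), (1, 1545981657), (1, 1545981660),
     (2, 1545981707), (2, 1545981710), (3, 1545981718)],
)

# Step 1: copy every row that has a matching row up to 60 seconds earlier.
conn.execute("""
    INSERT INTO impression_error_log
    SELECT *
    FROM T AS Main
    WHERE EXISTS (
        SELECT 1
        FROM T AS Earlier
        WHERE Earlier.id <> Main.id
          AND Earlier.publisher_asset_id = Main.publisher_asset_id
          AND Earlier.impression_timestamp <= Main.impression_timestamp
          AND Earlier.impression_timestamp >= Main.impression_timestamp - 60
    )
""")

# Step 2: remove the copied rows from the source table, matching on id.
conn.execute("DELETE FROM T WHERE id IN (SELECT id FROM impression_error_log)")
conn.commit()

kept = [r[0] for r in conn.execute(
    "SELECT impression_timestamp FROM T ORDER BY id")]
logged = [r[0] for r in conn.execute(
    "SELECT impression_timestamp FROM impression_error_log ORDER BY id")]
print(kept)    # [1545981655, 1545981707, 1545981718]
print(logged)  # [1545981657, 1545981660, 1545981710]
```

Deleting by the ids already stored in the log table keeps the DELETE from having to re-evaluate the duplicate condition against a table that is changing underneath it. On a live table both statements would normally run inside a single transaction.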

Regarding "mysql - How to check every row in MySQL (timestamp - 60 seconds) to determine whether duplicate data exists?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54068375/
