I have a table named properties:
source | address      | price  | size | created_at | duplicate
file1  | Fleet St., 1 | 230.00 | 13   | 2019-12-01 | 0
file2  | Help St.43   | 90.00  | 4    | 2018-5-5   | 0
file1  | Fleet St., 1 | 230.00 | 13   | 2019-10-01 | 0
file1  | Fleet St., 1 | 230.00 | 13   | 2017-10-01 | 0
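I need to find duplicates based on source, address, price and size, and mark every row except the most recent one as a duplicate.
Below is the desired output.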
source | address      | price  | size | created_at | duplicate
file1  | Fleet St., 1 | 230.00 | 13   | 2019-12-01 | 0
file2  | Help St.43   | 90.00  | 4    | 2018-5-5   | 0
file1  | Fleet St., 1 | 230.00 | 13   | 2019-10-01 | 1
file1  | Fleet St., 1 | 230.00 | 13   | 2017-10-01 | 1
I came up with the following query to identify the duplicates, but I don't know how to proceed from here.
SELECT
    source,
    address,
    COUNT(address),
    price,
    COUNT(price),
    size,
    COUNT(size),
    MAX(created_at)
FROM properties
GROUP BY
    source,
    address,
    price,
    size
HAVING
    COUNT(address) > 1 AND
    COUNT(price) > 1 AND
    COUNT(size) > 1 AND
    COUNT(source) > 1
Any help would be appreciated.
Best answer
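If you want to change values, use update. In this case, use a join with an aggregation query that finds the latest created_at in each group: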
-- The flag column is named duplicate in the question's table.
update properties p join
       (select source, address, price, size,
               max(created_at) as max_created_at
        from properties
        group by source, address, price, size
       ) pp
       using (source, address, price, size)
    set p.duplicate = 1
    where p.created_at < pp.max_created_at;
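Note that this only sets older rows to 1; it does not set the most recent row in each group to 0. If your data starts out with NULL values in that column, drop the where clause and assign the comparison result directly, so that every row gets 0 or 1: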
-- The comparison evaluates to 1 for older rows and 0 for the newest row.
update properties p join
       (select source, address, price, size,
               max(created_at) as max_created_at
        from properties
        group by source, address, price, size
       ) pp
       using (source, address, price, size)
    set p.duplicate = (p.created_at < pp.max_created_at);
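To check the logic against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. It is not the MySQL statement above: SQLite has no update ... join syntax, so the same comparison is written with a correlated subquery. The helper names build_sample and mark_old_duplicates are made up for this illustration, the dates are stored as ISO-formatted text (so 2018-5-5 is written as 2018-05-05), and the flag column is assumed to be named duplicate as in the question's table.

```python
import sqlite3


def build_sample():
    """Create an in-memory table holding the four sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE properties ("
        "source TEXT, address TEXT, price REAL, size INTEGER, "
        "created_at TEXT, duplicate INTEGER)"
    )
    rows = [
        ("file1", "Fleet St., 1", 230.00, 13, "2019-12-01"),
        ("file2", "Help St.43", 90.00, 4, "2018-05-05"),
        ("file1", "Fleet St., 1", 230.00, 13, "2019-10-01"),
        ("file1", "Fleet St., 1", 230.00, 13, "2017-10-01"),
    ]
    conn.executemany("INSERT INTO properties VALUES (?, ?, ?, ?, ?, 0)", rows)
    return conn


def mark_old_duplicates(conn):
    """Set duplicate to 1 for every row older than the newest row of its group."""
    conn.execute(
        """
        UPDATE properties
        SET duplicate = (
            created_at < (
                SELECT MAX(pp.created_at)
                FROM properties AS pp
                WHERE pp.source = properties.source
                  AND pp.address = properties.address
                  AND pp.price = properties.price
                  AND pp.size = properties.size
            )
        )
        """
    )
    conn.commit()


conn = build_sample()
mark_old_duplicates(conn)
flags = conn.execute(
    "SELECT created_at, duplicate FROM properties ORDER BY rowid"
).fetchall()
print(flags)
```

Because the dates are compared as text here, this only works when they are zero-padded in YYYY-MM-DD form; a real DATE column in MySQL does not have that restriction.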
Regarding mysql - marking old duplicates in MySQL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57645301/