SQL删除重复数据,只保留一行

概述

在sql的使用中,我们总是碰到需要删除重复数据的情况,但是又不能全部删除完,必须要保留至少一个重复的数据。重复的记录根据两个字段a2,a3判断(实际使用中可以拓展为多个)

实现过程

1. 创建表A,数据如下

a1a2a3
111
212
322
422
533
622

2. 选择重复的行:

 select *,count(*) 
 from A group by a2,a3 
 having count(*)>1;
a1a2a3count(*)
3223

使用in来找到我们想要的ID

SELECT *
FROM A 
WHERE (a2,a3) IN 
(SELECT A.`a2`,A.`a3` 
FROM A 
GROUP BY A.`a2`,A.`a3` 
HAVING COUNT(*)>1)
a1a2a3
322
422
622

那么后面就很好办了:

3. 选出要删除的值:

SELECT * 
FROM A 
WHERE (a2, a3) IN 
  (SELECT `a2`,`a3` 
  FROM A 
  GROUP BY A.`a2`,A.`a3` 
  HAVING COUNT(*) > 1) 
  AND a1 NOT IN 
  (SELECT MIN(a1) 
  FROM A 
  GROUP BY A.`a2`,A.`a3` 
  HAVING COUNT(*) > 1) ;

-- 结果是保留a1最小的值,其他选项全部选出,
-- 请注意此时并不是将Select 改为delete就可以了,如果你直接这样改的话,会报如下错误:
-- You can't specify target table 'A' for update in FROM clause
-- 该错误提示你,不能先select出同一表中的某些值,再update这个表(在同一语句中)。所以要稍微修改一下。

4. 删除值

sql语句如下:

//创建中间表
CREATE TABLE F(a1 INTEGER,a2 INTEGER,a3 INTEGER);
//将要删除的数据插入中间表
INSERT INTO F (
SELECT *  
FROM A 
WHERE (a2, a3) IN (SELECT `a2`,`a3` 
  FROM A GROUP BY A.`a2`,A.`a3` 
  HAVING COUNT(*) > 1) 
  AND a1 NOT IN 
  (SELECT MIN(a1) FROM A 
  GROUP BY A.`a2`,A.`a3` 
  HAVING COUNT(*) > 1)) ;
//删除中间表
DELETE FROM A WHERE a1 IN (SELECT a1 FROM F);
SELECT *FROM A;
a1a2a3
111
212
322
533

备注

如果说不用保留一行数据的话那么就简单多了,只需要一个很简单的sql语句:

    DELETE FROM A WHERE (a2,a3) IN (SELECT a2,a3 FROM A GROUP BY a2,a3 HAVING COUNT(*)>1)
    原文作者:Cesar
    原文地址: https://segmentfault.com/a/1190000004342902
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞