记一次MySql去重的经历

整个表target_table大概有14000W条记录,有个主键id,需要去重的字段是mac.

  1. 查看重复的记录数

    SELECT
     min( id ) AS mid,
     mac,
     count ( * ) 
    FROM
     target_table 
    GROUP BY
     mac 
    HAVING
     count( mac ) > 1;
    

    整个表重复的数据2w+

  2. 创建临时表

    CREATE TABLE dup_tmp (
    SELECT
     min( id ) AS mid,
     mac,
     count ( * ) 
    FROM
     target_table 
    GROUP BY
     mac 
    HAVING
     count( mac ) > 1 
     );
    

    将重复的mac筛选到临时表

  3. 删除多余的数据保留最小的id记录

    DELETE target 
    FROM
     target_table AS target
     JOIN dup_tmp AS tmp ON target.mac = tmp.mac 
    WHERE
     target.id > tmp.mid;
    

    刚开始的时候mac字段没有建索引,删除很慢。建了索引删除就很快了。

    原文作者:始终我是我
    原文地址: https://www.jianshu.com/p/bba87524c55f
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞