Computing the set difference of two tables in MySQL on large data volumes

June 8, 2024. Source: 西门吹雪的峥嵘岁月

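A production API was responding slowly. Investigation showed that the SQL computing a set difference took too long on tables with millions of rows. The notes below summarize what I found: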

Requirement

There is a user table (user) and a submission record table (record). Find the users who did not submit on a given day.

Table structure

user: id,name,status
record: id,uid,create
Relation: user.id = record.uid

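Approach

Stack the join ids of the two tables (user.id and the distinct record.uid for that day) with union all and use the result as the id column of a temporary table temp. Because user.id is unique and uid is deduplicated with DISTINCT, an id that is present in both tables appears exactly twice, while an id that is present in only one of them appears once. Grouping by id and keeping the groups with count(id) = 1 therefore gives the difference of the two tables. Note that a uid that appears in record for that day but not in the filtered user rows (for example, a user whose status is not 1) also appears exactly once, so such ids have to be excluded.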
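
The counting idea can be checked on a toy data set. The following is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the column types and sample rows are assumptions made for illustration, not data from the original case.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
    CREATE TABLE record (id INTEGER PRIMARY KEY, uid INTEGER, `create` TEXT);
    INSERT INTO user VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cat', 1);
    -- user 1 submitted twice on the target day, user 2 only on another day
    INSERT INTO record VALUES (1, 1, '2020-02-11'), (2, 1, '2020-02-11'),
                              (3, 2, '2020-02-10');
""")

# How often does each id occur after stacking both id lists with UNION ALL?
counts = conn.execute("""
    SELECT id, COUNT(id) AS occurrences FROM (
        SELECT id FROM user WHERE status = 1
        UNION ALL
        SELECT DISTINCT uid AS id FROM record WHERE `create` = '2020-02-11'
    ) AS t
    GROUP BY id
    ORDER BY id
""").fetchall()

print(counts)  # [(1, 2), (2, 1), (3, 1)]
```

User 1 occurs twice because it is in both lists (DISTINCT collapses its two submissions into one row). Users 2 and 3 occur once, so they are the ones who did not submit on that day.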

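Original SQL

-- Anti-join: keep active users that have no record row for the given day.
-- id and name are qualified with user, because record also has an id column.
select user.id, user.name from user
left join record on record.create='2020-02-11' and user.id=record.uid
where user.status=1 and record.id is null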

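Revised SQL

select id from (
-- from_record marks which table each row came from
select id, 0 as from_record from user where user.status=1
union all
select DISTINCT uid as id, 1 as from_record from record where record.create='2020-02-11'
)temp
group by id
-- COUNT(id) = 1: the id is in only one branch; MAX(from_record) = 0: that branch is user
HAVING COUNT(id) = 1 and MAX(from_record) = 0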
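
To see how the two approaches compare, the sketch below runs the left join query and two forms of the union all query on a small made-up data set. It again uses an in-memory SQLite database as a stand-in for MySQL. The sample rows, the from_record flag column, and the helper function ids are all assumptions introduced here for illustration. User 4 is inactive (status 0) but submitted on the target day, which is the case where counting alone is not enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
    CREATE TABLE record (id INTEGER PRIMARY KEY, uid INTEGER, `create` TEXT);
    INSERT INTO user VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cat', 1), (4, 'dan', 0);
    INSERT INTO record VALUES (1, 1, '2020-02-11'), (2, 1, '2020-02-11'),
                              (3, 2, '2020-02-10'), (4, 4, '2020-02-11');
""")

# Anti-join with LEFT JOIN ... IS NULL.
LEFT_JOIN = """
    SELECT user.id FROM user
    LEFT JOIN record ON record.`create` = '2020-02-11' AND user.id = record.uid
    WHERE user.status = 1 AND record.id IS NULL
    ORDER BY user.id
"""

# UNION ALL plus COUNT(id) = 1 only.
UNION_COUNT = """
    SELECT id FROM (
        SELECT id FROM user WHERE status = 1
        UNION ALL
        SELECT DISTINCT uid AS id FROM record WHERE `create` = '2020-02-11'
    ) AS t
    GROUP BY id HAVING COUNT(id) = 1
    ORDER BY id
"""

# Same idea, with a flag column (hypothetical name) recording the source table.
UNION_FLAGGED = """
    SELECT id FROM (
        SELECT id, 0 AS from_record FROM user WHERE status = 1
        UNION ALL
        SELECT DISTINCT uid AS id, 1 AS from_record
        FROM record WHERE `create` = '2020-02-11'
    ) AS t
    GROUP BY id HAVING COUNT(id) = 1 AND MAX(from_record) = 0
    ORDER BY id
"""


def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


left_join_ids = ids(LEFT_JOIN)          # [2, 3]
union_count_ids = ids(UNION_COUNT)      # [2, 3, 4]
union_flagged_ids = ids(UNION_FLAGGED)  # [2, 3]
print(left_join_ids, union_count_ids, union_flagged_ids)
```

On this data the count-only form also returns user 4, whose uid occurs once because the status filter removed it from the user branch. The flagged form matches the left join result.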

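Conclusion

This change did greatly improve query efficiency. However, the result only contains the id column from user. Other user columns such as user.name cannot be returned by the query as written, because union requires both selects to return the same number of columns with compatible types, and record has no column that corresponds to name. A follow-up in lookup against user may therefore still be needed.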
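
One possible shape for that follow-up step is to feed the id list into an IN condition on user. This is a sketch under assumptions: in-memory SQLite stands in for MySQL, the sample rows are made up, and the outer status filter is an addition that keeps out a uid belonging to an inactive user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
    CREATE TABLE record (id INTEGER PRIMARY KEY, uid INTEGER, `create` TEXT);
    INSERT INTO user VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cat', 1), (4, 'dan', 0);
    INSERT INTO record VALUES (1, 1, '2020-02-11'), (2, 4, '2020-02-11');
""")

# The inner query yields the ids that occur once. The outer query looks up
# the remaining user columns and re-applies the status filter.
rows = conn.execute("""
    SELECT user.id, user.name FROM user
    WHERE user.status = 1
      AND user.id IN (
        SELECT id FROM (
            SELECT id FROM user WHERE status = 1
            UNION ALL
            SELECT DISTINCT uid AS id FROM record WHERE `create` = '2020-02-11'
        ) AS t
        GROUP BY id HAVING COUNT(id) = 1
      )
    ORDER BY user.id
""").fetchall()

print(rows)  # [(2, 'bob'), (3, 'cat')]
```

Whether this is faster than the original left join on real data depends on the indexes and the MySQL version, so it is worth checking with EXPLAIN before relying on it.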

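Further reading

Difference between union and union all
union: combines two result sets and removes duplicate rows; the deduplication costs extra work, and the row order is not guaranteed unless order by is specified;
union all: combines two result sets and keeps duplicate rows; there is no deduplication step, so it is usually faster than union;
How MySQL join queries work
Reference link
https://bbs.csdn.net/topics/391844448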
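
The union / union all difference listed above can be seen with a minimal sketch. It uses an in-memory SQLite database as a stand-in for MySQL, and the literal values are arbitrary. The results are sorted in Python before comparing, since neither operator guarantees a row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION removes duplicate rows from the combined result.
union_rows = conn.execute(
    "SELECT 1 AS id UNION SELECT 1 UNION SELECT 2"
).fetchall()

# UNION ALL keeps every row, duplicates included.
union_all_rows = conn.execute(
    "SELECT 1 AS id UNION ALL SELECT 1 UNION ALL SELECT 2"
).fetchall()

print(sorted(union_rows))      # [(1,), (2,)]
print(sorted(union_all_rows))  # [(1,), (1,), (2,)]
```

The counting approach in this article depends on union all for exactly this reason: with union, an id present in both tables would collapse into a single row and could no longer be told apart by its count.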

    Original author: 西门吹雪的峥嵘岁月
    Original article: https://blog.csdn.net/qq_41734645/article/details/104285753