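This post walks through computing next-day, 3-day, and 7-day retention rates in SQL.
Suppose there is a table role_login_back with two columns: event_time, the login date and time, and device_id, the device ID.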
event_time | device_id |
---|---|
2020-01-01 12:40:44 | cB789hj888888 |
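We want to compute each day's next-day, 3-day, and 7-day retention rates. Before writing any SQL, let's pin down what these terms mean.
- Next-day retention: users who log in on a given day and log in again on the following day (the 2nd day) are next-day retained users.
- 3-day retention: users who log in on a given day and log in again on the 3rd day, counting the starting day as day 1 (that is, 2 days later), are 3-day retained users.
- 7-day retention: users who log in on a given day and log in again on the 7th day, counting the starting day as day 1 (that is, 6 days later), are 7-day retained users.

Each retention rate is the number of retained users divided by the number of users in that starting day's group.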
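The short Python sketch below makes these definitions concrete by computing the three rates directly from raw login records, with no SQL involved. It is written under two assumptions:

- Each device is counted on the day of its first login, which is the "new users" reading of the starting group.
- The 2nd, 3rd, and 7th day correspond to offsets of 1, 2, and 6 days.

The sample logins and the `retention` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sample data: (event_time, device_id), same shape as role_login_back
LOGINS = [
    ("2020-01-01 12:40:44", "dev_a"),
    ("2020-01-01 18:02:10", "dev_b"),
    ("2020-01-02 09:15:00", "dev_a"),
    ("2020-01-02 10:00:00", "dev_c"),
    ("2020-01-02 21:00:00", "dev_b"),
    ("2020-01-03 20:30:00", "dev_b"),
    ("2020-01-07 08:00:00", "dev_a"),
]

# Days after the starting day: 2nd day -> +1, 3rd day -> +2, 7th day -> +6
OFFSETS = {"d1": 1, "d3": 2, "d7": 6}


def retention(logins):
    """Return {start_date: {"new_users": n, "d1": rate, "d3": rate, "d7": rate}}."""
    # Collapse timestamps into the set of distinct login dates per device
    days_by_device = defaultdict(set)
    for event_time, device_id in logins:
        day = datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S").date()
        days_by_device[device_id].add(day)

    # Group devices by their first login date (assumed "new user" day)
    cohorts = defaultdict(list)
    for device_id, days in days_by_device.items():
        cohorts[min(days)].append(device_id)

    result = {}
    for start, devices in sorted(cohorts.items()):
        row = {"new_users": len(devices)}
        for name, offset in OFFSETS.items():
            target = start + timedelta(days=offset)
            retained = sum(1 for dev in devices if target in days_by_device[dev])
            row[name] = retained / len(devices)
        result[start] = row
    return result


for start, row in retention(LOGINS).items():
    print(start, row)
```

Here dev_a and dev_b both start on 2020-01-01. Both return on the 2nd day, only dev_b on the 3rd, and only dev_a on the 7th.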
The query is built in three steps:
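① Group by device_id and take each device's earliest login date as its log_day, the day it became a new user.

The min() is required. Selecting date(event_time) alongside group by device_id without an aggregate returns an arbitrary date per device, and MySQL rejects such a query outright when ONLY_FULL_GROUP_BY is enabled (the default in MySQL 5.7 and later).

Call this result a.
select
min(date(event_time)) as log_day, # keep only the date; the time of day is not needed
device_id as user_id_d0
from role_login_back
group by device_id
order by log_day;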
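Step ① can be tried out with Python's built-in sqlite3 module. This is a sketch that assumes SQLite's date() truncates these `YYYY-MM-DD HH:MM:SS` strings the same way MySQL's date() does. The second device ID is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table role_login_back (event_time text, device_id text)")
conn.executemany(
    "insert into role_login_back values (?, ?)",
    [
        ("2020-01-01 12:40:44", "cB789hj888888"),
        ("2020-01-02 08:00:00", "cB789hj888888"),  # later login, ignored by min()
        ("2020-01-02 09:30:00", "dev_x"),          # hypothetical second device
    ],
)

# Step 1: one row per device, dated by its earliest login
rows = conn.execute(
    """
    select min(date(event_time)) as log_day, device_id as user_id_d0
    from role_login_back
    group by device_id
    order by log_day
    """
).fetchall()
print(rows)  # [('2020-01-01', 'cB789hj888888'), ('2020-01-02', 'dev_x')]
```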
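② For each device in a, fetch the same device_id again if it logged in on the 2nd, 3rd, or 7th day, that is, 1, 2, or 6 days after log_day. If it did not, the left join leaves the column NULL.

A device can log in several times on the same day, so the joins may produce duplicate rows, and distinct removes them.

Call this result temp.
select
distinct a.log_day,
a.user_id_d0,
b.device_id as user_id_d1,
c.device_id as user_id_d3,
d.device_id as user_id_d7
from a
left join role_login_back b
on datediff(date(b.event_time), a.log_day) = 1
and a.user_id_d0 = b.device_id
left join role_login_back c
on datediff(date(c.event_time), a.log_day) = 2
and a.user_id_d0 = c.device_id
left join role_login_back d
on datediff(date(d.event_time), a.log_day) = 6
and a.user_id_d0 = d.device_id;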
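SQLite has no datediff(), so this sketch of step ② replaces it with the `date(x, '+N days')` modifier and writes a as a CTE. Both substitutions are assumptions made for portability, not part of the original MySQL query.

The hypothetical device logs in twice on the 2nd day, so without distinct the joins would return two identical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table role_login_back (event_time text, device_id text)")
conn.executemany(
    "insert into role_login_back values (?, ?)",
    [
        ("2020-01-01 12:40:44", "dev_a"),  # first login -> log_day
        ("2020-01-02 09:00:00", "dev_a"),  # 2nd day
        ("2020-01-02 21:00:00", "dev_a"),  # 2nd day again: duplicates the join row
        ("2020-01-07 08:00:00", "dev_a"),  # 7th day
    ],
)

# Step 2 in SQLite: date(x, '+N days') stands in for MySQL's datediff(...) = N
STEP2 = """
with a as (
    select min(date(event_time)) as log_day, device_id as user_id_d0
    from role_login_back
    group by device_id
)
select distinct
    a.log_day,
    a.user_id_d0,
    b.device_id as user_id_d1,
    c.device_id as user_id_d3,
    d.device_id as user_id_d7
from a
left join role_login_back b
    on date(b.event_time) = date(a.log_day, '+1 days') and b.device_id = a.user_id_d0
left join role_login_back c
    on date(c.event_time) = date(a.log_day, '+2 days') and c.device_id = a.user_id_d0
left join role_login_back d
    on date(d.event_time) = date(a.log_day, '+6 days') and d.device_id = a.user_id_d0
"""
rows = conn.execute(STEP2).fetchall()
print(rows)  # one row; user_id_d3 is None because dev_a skipped the 3rd day
```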
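③ Compute the next-day, 3-day, and 7-day retention rates from temp. count(column) ignores NULLs, so count(user_id_d1) counts only the devices that logged in again on the 2nd day.
select
log_day 'date',
count(user_id_d0) 'new users',
count(user_id_d1) / count(user_id_d0) 'next-day retention',
count(user_id_d3) / count(user_id_d0) '3-day retention',
count(user_id_d7) / count(user_id_d0) '7-day retention'
from temp
group by log_day
order by log_day;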
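The rate calculation depends on count(column) skipping NULLs. The sketch below runs step ③ against a hand-filled stand-in for temp, named temp_rows here (a hypothetical table).

One dialect difference matters: in SQLite, `/` on two integers truncates, so the sketch multiplies by 1.0 first. MySQL's `/` already returns a decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for temp: one row per device, NULL where it did not return
conn.execute(
    "create table temp_rows (log_day text, user_id_d0 text, "
    "user_id_d1 text, user_id_d3 text, user_id_d7 text)"
)
conn.executemany(
    "insert into temp_rows values (?, ?, ?, ?, ?)",
    [
        ("2020-01-01", "dev_a", "dev_a", None, "dev_a"),
        ("2020-01-01", "dev_b", "dev_b", "dev_b", None),
        ("2020-01-02", "dev_c", None, None, None),
    ],
)

# count(col) skips NULLs; 1.0 * forces real division in SQLite
rows = conn.execute(
    """
    select log_day,
           count(user_id_d0),
           1.0 * count(user_id_d1) / count(user_id_d0),
           1.0 * count(user_id_d3) / count(user_id_d0),
           1.0 * count(user_id_d7) / count(user_id_d0)
    from temp_rows
    group by log_day
    order by log_day
    """
).fetchall()
print(rows)
```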
Combined into a single query:
select
log_day 'date',
count(user_id_d0) 'new users',
count(user_id_d1) / count(user_id_d0) 'next-day retention',
count(user_id_d3) / count(user_id_d0) '3-day retention',
count(user_id_d7) / count(user_id_d0) '7-day retention'
from (
select
distinct a.log_day,
a.user_id_d0,
b.device_id as user_id_d1,
c.device_id as user_id_d3,
d.device_id as user_id_d7
from
(select
min(date(event_time)) as log_day, # first login date; the time of day is not needed
device_id as user_id_d0
from role_login_back
group by device_id) a
left join role_login_back b
on datediff(date(b.event_time), a.log_day) = 1
and a.user_id_d0 = b.device_id
left join role_login_back c
on datediff(date(c.event_time), a.log_day) = 2
and a.user_id_d0 = c.device_id
left join role_login_back d
on datediff(date(d.event_time), a.log_day) = 6
and a.user_id_d0 = d.device_id
) temp # MySQL requires every derived table to have an alias
group by log_day
order by log_day;
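As an end-to-end check, here is a sketch of the combined query ported to SQLite and run on a few hypothetical logins. The port makes three substitutions, all assumptions for portability:

- `date(x, '+N days')` replaces datediff().
- `1.0 *` forces decimal division.
- `t` is used as the derived-table alias.

The helper `run_retention` is also hypothetical.

```python
import sqlite3

# Combined query, ported to SQLite (date modifiers stand in for datediff)
RETENTION_SQL = """
select
    log_day,
    count(user_id_d0) as new_users,
    1.0 * count(user_id_d1) / count(user_id_d0) as d1_rate,
    1.0 * count(user_id_d3) / count(user_id_d0) as d3_rate,
    1.0 * count(user_id_d7) / count(user_id_d0) as d7_rate
from (
    select distinct
        a.log_day,
        a.user_id_d0,
        b.device_id as user_id_d1,
        c.device_id as user_id_d3,
        d.device_id as user_id_d7
    from (
        select min(date(event_time)) as log_day, device_id as user_id_d0
        from role_login_back
        group by device_id
    ) a
    left join role_login_back b
        on date(b.event_time) = date(a.log_day, '+1 days') and b.device_id = a.user_id_d0
    left join role_login_back c
        on date(c.event_time) = date(a.log_day, '+2 days') and c.device_id = a.user_id_d0
    left join role_login_back d
        on date(d.event_time) = date(a.log_day, '+6 days') and d.device_id = a.user_id_d0
) t
group by log_day
order by log_day
"""


def run_retention(logins):
    """Load (event_time, device_id) pairs into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table role_login_back (event_time text, device_id text)")
    conn.executemany("insert into role_login_back values (?, ?)", logins)
    return conn.execute(RETENTION_SQL).fetchall()


# Hypothetical sample logins
LOGINS = [
    ("2020-01-01 12:40:44", "dev_a"),
    ("2020-01-01 18:02:10", "dev_b"),
    ("2020-01-02 09:15:00", "dev_a"),
    ("2020-01-02 10:00:00", "dev_c"),
    ("2020-01-02 21:00:00", "dev_b"),
    ("2020-01-03 20:30:00", "dev_b"),
    ("2020-01-07 08:00:00", "dev_a"),
]

for row in run_retention(LOGINS):
    print(row)
```

For 2020-01-01 the rates come out as 1.0, 0.5, and 0.5. Both devices return on the 2nd day, only dev_b on the 3rd, and only dev_a on the 7th. dev_c starts on 2020-01-02 and never returns, so its rates are all 0.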