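[Python] Analyzing user registration data: repeat purchase rate, return purchase rate, unregistration rate, and more (online education case study series, part 2)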

February 12, 2021. Source: 今天我有更博学吗？


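Preface

This is the second article in the series: user registration data analysis.

An online education provider runs its own learning product, VLE. The provider has shared data on seven courses delivered over four semesters. In this series I use that data to run a set of analyses for the provider, including an RFM model of its users, user segmentation, and an analysis of student grades.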

Part of the provider's database schema is shown below.

[Figure: partial database schema]

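This article covers the user registration data.

The studentRegistration table records user registrations for every course in every semester, where:

code_presentation: semester
code_module: course
id_student: user ID
date_registration: number of days between the user's registration date and the course start date (for example, -1 means the user registered one day before that course started in that semester, and 5 means the user registered five days after it started)
date_unregistration: number of days between the user's unregistration date and the course start date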
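
To make the sign convention of the two date columns concrete, here is a minimal sketch on a handful of made-up rows. The sample values and the helper columns registered_late and unregistered are hypothetical, and the sketch assumes that a missing date_unregistration means the student never unregistered, which is how the churn calculation later in this article treats it.

```python
import pandas as pd

# Hypothetical sample rows; the real studentRegistration table is much larger.
regi_sample = pd.DataFrame({
    "code_module": ["AAA", "AAA", "BBB", "BBB"],
    "code_presentation": ["2013J", "2013J", "2013J", "2014B"],
    "id_student": [1, 2, 1, 3],
    "date_registration": [-30, 5, -1, -120],
    "date_unregistration": [None, 40, None, -10],
})

# A positive offset means the student registered after the course started.
regi_sample["registered_late"] = regi_sample["date_registration"] > 0
# Assumption: a non-missing unregistration offset means the student withdrew.
regi_sample["unregistered"] = regi_sample["date_unregistration"].notna()

print(regi_sample)
```

Student 3 in this sample unregistered 10 days before the course started, which shows that date_unregistration can be negative as well.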

We will now compute and visualize:

Registration date distribution
Late registration rate
Churn rate
Repeat purchase rate
Return purchase rate

1. Import libraries

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Font able to render Chinese labels; substitute another installed font if it is unavailable
matplotlib.rcParams['font.sans-serif'] = ['Arial Unicode MS']
# Keep the minus sign rendering correctly with a non-default font
matplotlib.rcParams['axes.unicode_minus'] = False

# Print floats without scientific notation, and with four decimals in pandas output
np.set_printoptions(suppress=True)
pd.set_option('display.float_format', lambda x: '%.4f' % x)

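2. Registration date distribution

From the distribution of registration dates we can tell how long before the start of a course users tend to register.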

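# Assumption: the studentRegistration table is stored as a CSV file with this name; adjust the path to your copy
regi = pd.read_csv("studentRegistration.csv")
module = sorted(regi["code_module"].unique().tolist())
presentation = sorted(regi["code_presentation"].unique().tolist())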

Note that not all seven courses are offered in every semester.


fig, axes = plt.subplots(len(module), len(presentation), figsize=(14, 14))
for m, mod in enumerate(module):
    for p, pres in enumerate(presentation):
        # Registration offsets for this course in this semester
        dates = regi.loc[(regi["code_module"] == mod) & (regi["code_presentation"] == pres), "date_registration"].dropna()
        if dates.empty:
            # Course not offered in this semester: hide the empty panel
            axes[m][p].axis("off")
        else:
            # histplot with a KDE overlay replaces the deprecated sns.distplot
            sns.histplot(dates, stat="density", kde=True, ax=axes[m][p])
            axes[m][p].set_title(mod + " in " + pres)

plt.tight_layout()

[Figure: distribution of date_registration for each course and semester]

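The distribution plots show that most students finish registering before the course starts, with a small peak in registrations between 50 days before the start and the start date itself.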

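3. Late registration rate

This is the share of students with date_registration > 0, that is, students who registered after the course had started.

First aggregate the data: use groupby to count how many students registered on each day, for every course in every semester.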

# Unique students registering on each day, per semester and course
line_plot_df = (
    regi.groupby(["code_presentation", "code_module", "date_registration"])
        .agg({"id_student": pd.Series.nunique})
        .reset_index()
        .sort_values(by=["code_presentation", "code_module", "date_registration"])
)

The result looks like this:
[Table: preview of line_plot_df]


fig, axes = plt.subplots(len(module), len(presentation), figsize=(16, 16))
for m, mod in enumerate(module):
    for p, pres in enumerate(presentation):
        sub = line_plot_df[(line_plot_df["code_module"] == mod) & (line_plot_df["code_presentation"] == pres)]
        if sub.empty:
            # Not every semester offers all seven courses: hide the empty panel
            axes[m][p].axis("off")
            continue
        axes[m][p].plot(sub["date_registration"], sub["id_student"], alpha=0.8)
        axes[m][p].set_title(mod + " in " + pres)
        # Dotted reference line at day 0, the course start
        axes[m][p].axvline(x=0, color="grey", linestyle=":")
        # Late registration rate: students registering after day 0 / all registered students
        yanchilv = sub.loc[sub["date_registration"] > 0, "id_student"].sum() / sub["id_student"].sum()
        # Label the panel with the share of registrations after the start
        axes[m][p].annotate(">0:\n{:.2%}".format(yanchilv),
                            xy=(1, 1),
                            xytext=(1, 0.5),  # offset from xy, in points
                            textcoords="offset points",
                            ha="center", va="bottom")

plt.tight_layout()
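
The late registration rate annotated in each panel can also be tabulated without plotting. The following is a minimal sketch: the function name late_registration_rate and the toy rows are hypothetical, and it assumes each student appears at most once per course and semester, so that counting rows is equivalent to counting students.

```python
import pandas as pd

def late_registration_rate(df):
    """Share of registrations with date_registration > 0, per semester and course."""
    # Rows without a registration date cannot be classified, so drop them.
    valid = df.dropna(subset=["date_registration"])
    late = valid["date_registration"] > 0
    # The mean of a boolean column is the fraction of True values.
    return (late.groupby([valid["code_presentation"], valid["code_module"]])
                .mean()
                .rename("late_rate")
                .reset_index())

# Hypothetical toy data: one of four AAA students and one of two BBB students are late.
toy = pd.DataFrame({
    "code_presentation": ["2013J", "2013J", "2013J", "2013J", "2014B", "2014B"],
    "code_module": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB"],
    "id_student": [1, 2, 3, 4, 5, 6],
    "date_registration": [-30, -5, -1, 10, -100, 2],
})

rates = late_registration_rate(toy)
print(rates)
```

On the toy data this yields 0.25 for AAA in 2013J and 0.5 for BBB in 2014B.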

[Figure: daily registration counts for each course and semester, annotated with the late registration rate]

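This chart resembles the registration date distribution, but it shows absolute counts, makes the day-to-day fluctuation visible, and is annotated with the late registration rate.

The registration window is longer in B semesters than in J semesters: in B semesters people sign up as early as 300 days before the start, whereas in J semesters sign-ups begin about 200 to 150 days before the start.

In B semesters people register from 300 to 200 days before the start, but the numbers are small and fluctuate little. This may raise costs such as staffing and headcount accounting, so shortening the registration window is worth considering.

By course, BBB and DDD have long right tails, meaning some students register very late. For these two courses, students should be reminded to register promptly.

By semester, 2014J has the highest late registration rate for every course except DDD and FFF. The cause needs to be identified, for example by checking repeat purchase behaviour to see whether students are reluctant to buy further courses.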

4. Churn rate

Churn here means that a user registers and later unregisters. As before, start by aggregating the data.

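# count() skips missing values, so the date_unregistration count covers only students who unregistered
churnRate = regi.groupby(["code_presentation", "code_module"]).agg({"date_registration": "count", "date_unregistration": "count"}).reset_index()
# Churn rate: unregistered students / registered students
churnRate["churnRate"] = churnRate["date_unregistration"] / churnRate["date_registration"]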
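
The calculation relies on count() ignoring missing values. The sketch below shows this on made-up rows; the data and the column names registered and unregistered are hypothetical, and it assumes a missing date_unregistration means the student stayed enrolled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five students registered, two of them later unregistered.
toy = pd.DataFrame({
    "code_presentation": ["2014B"] * 5,
    "code_module": ["CCC"] * 5,
    "id_student": [1, 2, 3, 4, 5],
    "date_registration": [-20, -10, -3, 1, 4],
    "date_unregistration": [np.nan, 15, np.nan, 60, np.nan],
})

# "count" counts non-missing values only, so NaN rows do not contribute.
counts = (toy.groupby(["code_presentation", "code_module"])
             .agg(registered=("date_registration", "count"),
                  unregistered=("date_unregistration", "count"))
             .reset_index())
counts["churnRate"] = counts["unregistered"] / counts["registered"]

print(counts)
```

Here two of the five registered students unregistered, so the churn rate for CCC in 2014B is 0.4.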

A sample of the result:
[Table: preview of churnRate]


fig, axes = plt.subplots(figsize=(10, 6))
ChurnRateBarplot = sns.barplot(x="code_presentation", y="churnRate", hue="code_module", data=churnRate.sort_values(by=["code_presentation", "code_module"]), ax=axes)

# Label each bar with its churn rate; NaN and zero-height bars (courses not offered) are skipped
for p in ChurnRateBarplot.patches:
    if p.get_height() > 0:
        ChurnRateBarplot.annotate("{:.0%}".format(p.get_height()), (p.get_x() + p.get_width() / 2., p.get_height()), ha='center', va='center', xytext=(0, 6), textcoords='offset points')

plt.legend(loc=0, bbox_to_anchor=(1, 0.8), title="code_module", frameon=False)
plt.title("Churn rate")

plt.tight_layout()
plt.show()

[Figure: churn rate by semester and course]

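It is fairly clear that CCC is a new course introduced in 2014 and that its unregistration rate is high, above 40% in both semesters. A user survey is recommended to investigate why these users unregister.

GGG has a low unregistration rate overall, but the rate is rising each semester, so fine-grained operations are needed to win users back.

The unregistration rates of the other courses are relatively stable, though with a slight upward drift. Unregistration is very costly for an education product, so attribution analysis combining teacher information, student grades, and competitor information is worthwhile.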

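5. Repeat purchase rate

Definition: the repeat purchase rate is the share of repeat buyers (users who purchased two or more times) among all buyers within a given time window.

Here the window is one semester, and we count the users who bought more than one course in it.

The same step also gives the number of students by how many courses they bought in each semester. Again we start with groupby.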

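# Number of courses each student bought in each semester (stored in the code_module column)
fugou = (regi.groupby(["id_student", "code_presentation"]).agg({"code_module": "count"}).reset_index()
             .sort_values(by=["code_presentation", "code_module"], ascending=[True, False]))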

[Table: preview of fugou]

Next we use crosstab(). Its margins option, which adds row and column totals, is very handy and works much like a pivot table in Excel.

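# Rows: semesters; columns: number of courses bought (1, 2, 3, ...); "All" holds the totals
fugou_df = pd.crosstab(fugou["code_presentation"], fugou["code_module"], margins=True)

# Repeat purchase rate: students who bought more than one course / all students
fugou_df["fugoulv"] = (fugou_df["All"] - fugou_df[1]) / fugou_df["All"]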
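
The two steps above, counting courses per student and then cross-tabulating, can be traced on a few made-up rows. This is a minimal sketch: the toy data and the column name n_modules are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one row per (student, course) purchase.
toy = pd.DataFrame({
    "id_student": [1, 1, 2, 3, 3, 4, 5],
    "code_presentation": ["2013J", "2013J", "2013J", "2014B", "2014B", "2014B", "2014B"],
    "code_module": ["AAA", "BBB", "AAA", "CCC", "DDD", "CCC", "CCC"],
})

# Step 1: number of courses each student bought in each semester.
per_student = (toy.groupby(["id_student", "code_presentation"])["code_module"]
                  .count()
                  .reset_index(name="n_modules"))

# Step 2: students per semester, broken down by number of courses; "All" is the total.
table = pd.crosstab(per_student["code_presentation"], per_student["n_modules"], margins=True)

# Step 3: everyone who is not in the "1 course" column is a repeat buyer.
table["fugoulv"] = (table["All"] - table[1]) / table["All"]

print(table)
```

In 2013J one of two students bought two courses (rate 0.5); in 2014B one of three did (rate about 0.33). The "All" row gives the rate across both semesters.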

[Table: fugou_df with the fugoulv column]

At this point the distribution of students buying one, two, or three courses in each semester is clear. Next, plot the data to visualize it.

fig, ax = plt.subplots(1, 2, figsize=(12, 5))

# Left panel: number of students by courses purchased, split by semester
cnt_module = sns.countplot(x="code_module", hue="code_presentation", data=fugou, alpha=0.8, ax=ax[0])
# Label each bar with its student count (empty bars are skipped)
for patch in cnt_module.patches:
    if patch.get_height() > 0:
        ax[0].annotate("{:.0f}".format(patch.get_height()),
                       xy=(patch.get_x() + patch.get_width() / 2, patch.get_height()),
                       xytext=(1, 0.5),
                       textcoords="offset points",
                       ha="center", va="bottom")

ax[0].set_title("Distribution of courses purchased per student")
ax[0].set_xlabel("Number of courses purchased")
ax[0].set_ylabel("Number of students")
ax[0].legend(loc="upper right", title="code_presentation", frameon=False)

# Right panel: repeat purchase rate per semester (the last row, "All", is dropped)
fugoulv_data = fugou_df.loc[fugou_df.index.tolist()[:-1], "fugoulv"].values.tolist()
fugoulv_df_data = sns.barplot(x=fugou_df.index.tolist()[:-1], y=fugoulv_data, alpha=0.8, ax=ax[1])

# Label each bar with its rate
for patch, rate in zip(fugoulv_df_data.patches, fugoulv_data):
    ax[1].annotate("{:.2%}".format(rate),
                   xy=(patch.get_x() + patch.get_width() / 2, patch.get_height()),
                   xytext=(0, 0.5),
                   textcoords="offset points",
                   ha="center", va="bottom")

ax[1].set_title("Repeat purchase rate")
ax[1].set_xlabel("Semester")

plt.tight_layout()
plt.show()

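[Figure: courses purchased per student (left) and repeat purchase rate by semester (right)]
More students register in J semesters than in B semesters, and both the B and J semesters of 2014 grew relative to their 2013 counterparts.

The repeat purchase rate rose markedly from 2013 to 2014, although it dips across the two most recent semesters.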

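6. Return purchase rate

Definition: the share of users who purchased in one time window and purchased again in the next window.

Here it means that a user bought a course through the product in one semester and bought again in the following semester, regardless of which course.

We use crosstab to see how many courses each student bought in each semester.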

huigou=pd.crosstab(regi["id_student"],regi["code_presentation"]).reset_index()

[Table: preview of huigou]

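The logic is as follows: if student A bought courses in both 2013B and 2013J, then in student A's row both the 2013B and 2013J columns are 1 or more. Checking that both columns are positive, rather than that their sum exceeds 1, avoids counting a student who bought two courses in one semester and none in the other.

huigou["2013J_huigou"] = ((huigou["2013B"] > 0) & (huigou["2013J"] > 0)).astype(int)

huigou["2014B_huigou"] = ((huigou["2013J"] > 0) & (huigou["2014B"] > 0)).astype(int)

huigou["2014J_huigou"] = ((huigou["2014B"] > 0) & (huigou["2014J"] > 0)).astype(int)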

[Table: huigou with the three *_huigou flag columns]

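Next, loop over the flag columns 2013J_huigou to 2014J_huigou and collect the results in a dictionary: for each one, divide the number of rows equal to 1 (the students who came back) by the number of rows where the previous semester's count is greater than 0 (the students who bought in the previous semester).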

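huigou_rate = {}
# Columns by position: 0 is id_student, 1-4 are the semester counts, 5-7 are the *_huigou flags,
# so column i-4 is the semester that precedes flag column i
for i in range(5, 8):
    huigou_rate[huigou.columns[i]] = [huigou[huigou[huigou.columns[i]] == 1].shape[0] / huigou[huigou[huigou.columns[i - 4]] > 0].shape[0]]

huigou_df = pd.DataFrame(huigou_rate)
huigou_df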
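
The same ratio can be written without hard-coded column positions. The sketch below is one possible generalization, not the article's original method: the function name return_purchase_rates and the toy rows are hypothetical, and it assumes the semester codes sort chronologically as strings (2013B, 2013J, 2014B, 2014J).

```python
import pandas as pd

def return_purchase_rates(df):
    """For each semester after the first: buyers of the previous semester who bought again."""
    # True where a student bought at least one course in that semester.
    bought = pd.crosstab(df["id_student"], df["code_presentation"]) > 0
    # Assumption: sorting the codes as strings gives chronological order.
    semesters = sorted(bought.columns)
    rates = {}
    for prev, cur in zip(semesters, semesters[1:]):
        returned = (bought[prev] & bought[cur]).sum()
        rates[cur] = returned / bought[prev].sum()
    return pd.Series(rates, name="huigou_rate")

# Hypothetical toy data. Student 2 buys two courses in 2013B and nothing afterwards,
# so that student must not count as a returning buyer.
toy = pd.DataFrame({
    "id_student": [1, 1, 2, 2, 3, 3, 4],
    "code_presentation": ["2013B", "2013J", "2013B", "2013B", "2013J", "2014B", "2013J"],
    "code_module": ["AAA", "BBB", "AAA", "BBB", "AAA", "CCC", "BBB"],
})

rates = return_purchase_rates(toy)
print(rates)
```

Of the two 2013B buyers only student 1 returns in 2013J (rate 0.5), and of the three 2013J buyers only student 3 returns in 2014B (rate about 0.33).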

[Table: huigou_df]

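The return purchase rate keeps rising, and together with the repeat purchase rate it suggests that user operations are trending in a healthy direction. However, the churn rate also rises semester by semester, so finding the causes of churn remains the priority.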

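    Original author: 今天我有更博学吗？
    Original URL: https://blog.csdn.net/LYY1045691954/article/details/110311583
    This article is reposted from the web for the purpose of sharing knowledge. If it infringes your rights, please contact the blogger to have it removed.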
