Cumulative Sums in HiveQL

1. Requirement
Given the following table of visitor access counts, t_access:

Visitor  Month  Visits  
A   2015-01 5  
A   2015-01 15  
B   2015-01 5  
A   2015-01 8  
B   2015-01 25  
A   2015-01 5  
A   2015-02 4  
A   2015-02 6  
B   2015-02 10  
B   2015-02 5  
……  ……  ……  
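
To try the queries in this post, the sample rows above can be loaded into a small table. This is only a minimal sketch: the column names username, month and count are taken from the queries later in the post, while the STRING/INT types and the use of INSERT ... VALUES (available in Hive 0.14 and later) are assumptions, and the elided rows are not reproduced.

```sql
-- Assumed schema: the types are a guess, the column names follow the queries in this post.
-- Backticks guard against clashes with the built-in month() and count() names.
CREATE TABLE IF NOT EXISTS t_access (
  username STRING,
  `month`  STRING,
  `count`  INT
);

-- The ten sample rows listed above, with month stored as 'YYYY-MM'.
INSERT INTO TABLE t_access VALUES
  ('A', '2015-01', 5),
  ('A', '2015-01', 15),
  ('B', '2015-01', 5),
  ('A', '2015-01', 8),
  ('B', '2015-01', 25),
  ('A', '2015-01', 5),
  ('A', '2015-02', 4),
  ('A', '2015-02', 6),
  ('B', '2015-02', 10),
  ('B', '2015-02', 5);
```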

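For each visitor and each month, output the total visits in that month and the cumulative visits up to and including that month.
Output table:

Visitor  Month  Monthly total  Cumulative total  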
A   2015-01 33  33  
A   2015-02 10  43  
……. ……. ……. …….  
B   2015-01 30  30  
B   2015-02 15  45  
……. ……. ……. …….  

2. Approach

1) Step 1: compute each user's total visits per month

select username,month,sum(count) as count from t_access group by username,month

+-----------+----------+---------+--+  
| username  |  month   | count   |  
+-----------+----------+---------+--+  
| A         | 2015-01  | 33      |  
| A         | 2015-02  | 10      |  
| B         | 2015-01  | 30      |  
| B         | 2015-02  | 15      |  
+-----------+----------+---------+--+  

2) Step 2: join the monthly totals to themselves (inner join) on username

select * from
(select username,month,sum(count) as count from t_access group by username,month) A 
join 
(select username,month,sum(count) as count from t_access group by username,month) B
on 
A.username=B.username

+-------------+----------+-----------+-------------+----------+-----------+--+  
| a.username  | a.month  | a.count   | b.username  | b.month  | b.count   |  
+-------------+----------+-----------+-------------+----------+-----------+--+  
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |  
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |  
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |  
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |  
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |  
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |  
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |  
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |  
+-------------+----------+-----------+-------------+----------+-----------+--+  

3) Step 3: group the result of the previous step by a.username and a.month, then compute the running total: within each group, sum b.count over the rows where b.month <= a.month.
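
The effect of the b.month <= a.month condition is easiest to see before any grouping is applied. The sketch below lists the joined pairs that survive the filter. It reads from t_access and uses the illustrative aliases month_total, b_month and b_month_total, which are not names from the original queries.

```sql
-- Illustrative only: shows which joined rows feed each running total.
-- month_total, b_month and b_month_total are hypothetical aliases chosen for this sketch.
SELECT a.username,
       a.month,
       a.month_total,
       b.month       AS b_month,
       b.month_total AS b_month_total
FROM (SELECT username, month, SUM(count) AS month_total
      FROM t_access
      GROUP BY username, month) a
JOIN (SELECT username, month, SUM(count) AS month_total
      FROM t_access
      GROUP BY username, month) b
  ON a.username = b.username
WHERE b.month <= a.month;
```

With the sample rows shown earlier, visitor A's 2015-01 row pairs only with the 2015-01 total (33), while A's 2015-02 row pairs with both the 2015-01 total (33) and the 2015-02 total (10), which sum to 43. Row order is not guaranteed because the sketch has no ORDER BY.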

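3. HQL

-- month_total: every row in a group carries the same A.count, so max() simply picks it.  
-- cumulative_total: sum of the B-side monthly totals up to and including A.month.  
-- The month comparison works on strings because 'YYYY-MM' values sort chronologically.  
select A.username,A.month,max(A.count) as month_total,sum(B.count) as cumulative_total  
from   
(select username,month,sum(count) as count from t_access group by username,month) A   
inner join   
(select username,month,sum(count) as count from t_access group by username,month) B  
on  
A.username=B.username  
where B.month <= A.month  
group by A.username,A.month  
order by A.username,A.month;  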
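
As an aside, Hive versions that support windowing functions (0.11 and later) can produce the same result without a self-join. The following is a sketch of that alternative technique, not the approach described in this post. The aliases month_total, cumulative_total and m are illustrative.

```sql
-- Alternative sketch using a window function instead of the self-join above.
-- Inner query: one row per (username, month) with that month's total.
-- Outer query: running sum of those totals within each username, ordered by month.
SELECT username,
       month,
       month_total,
       SUM(month_total) OVER (
         PARTITION BY username
         ORDER BY month
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_total
FROM (
  SELECT username, month, SUM(count) AS month_total
  FROM t_access
  GROUP BY username, month
) m
ORDER BY username, month;
```

The self-join compares every month of a user with every other month of the same user, so its intermediate result grows quadratically with the number of months per user. The window version avoids producing those pairs.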
    Original author: 石晓文的学习日记
    Original URL: https://www.jianshu.com/p/c2ec67144673