hive级联累计

根据访问次数统计表,得到累计访问总计

  • 建表,load数据
t_access_times.dat
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5

create table t_access_times(username string,month string,salary int)
row format delimited fields terminated by ',';

load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_times;
+--------------------------+-----------------------+------------------------+--+
| t_access_times.username  | t_access_times.month  | t_access_times.salary  |
+--------------------------+-----------------------+------------------------+--+
| A                        | 2015-01               | 5                      |
| A                        | 2015-01               | 15                     |
| B                        | 2015-01               | 5                      |
| A                        | 2015-01               | 8                      |
| B                        | 2015-01               | 25                     |
| A                        | 2015-01               | 5                      |
| A                        | 2015-02               | 4                      |
| A                        | 2015-02               | 6                      |
| B                        | 2015-02               | 10                     |
| B                        | 2015-02               | 5                      |
+--------------------------+-----------------------+------------------------+--
  • 求每个用户的月总金额
select username,month,sum(salary) from t_access_times group by username,month;
+-----------+----------+------+--+
| username  |  month   | _c2  |
+-----------+----------+------+--+
| A         | 2015-01  | 33   |
| A         | 2015-02  | 10   |
| B         | 2015-01  | 30   |
| B         | 2015-02  | 15   |
+-----------+----------+------+--+
  • 把表自己inner join
select a.*,b.* from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A 
inner join 
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+--+
  • 生成累计值
select a.username,a.month,max(a.salary) as salary,sum(b.salary) as accumulate from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A inner join (select username,month,sum(salary) as salary from t_access_times group by username,month) B on a.username=b.username
where b.month <= a.month
group by a.username,a.month
order by a.username,a.month;

+-------------+----------+---------+-------------+--+
| a.username  | a.month  | salary  | accumulate  |
+-------------+----------+---------+-------------+--+
| A           | 2015-01  | 33      | 33          |
| A           | 2015-02  | 10      | 43          |
| B           | 2015-01  | 30      | 30          |
| B           | 2015-02  | 15      | 45          |
+-------------+----------+---------+-------------+

分组查询求月累计值。
为什么要max(salary)?
salary不是分组字段,只能由聚合函数求得,不然不知道选哪个。sum avg max

最后order by 使全局有序,原始数据无序,最后有可能无序。

    原文作者:pamperxg
    原文地址: https://www.jianshu.com/p/2ad4aa7d07fc
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞