Hive窗口函数05-GROUPING SETS、GROUPING__ID、CUBE、ROLLUP

Hive窗口函数GROUPING SETS、GROUPING__ID、CUBE、ROLLUP入门

1. 数据说明

现有 hive 表 cookie5, 内容如下:

hive> select * from cookie5;

cookie5.month   cookie5.day     cookie5.cookieid
2015-03         2015-03-10      cookie1
2015-03         2015-03-10      cookie5
2015-03         2015-03-12      cookie7
2015-04         2015-04-12      cookie3
2015-04         2015-04-13      cookie2
2015-04         2015-04-13      cookie4
2015-04         2015-04-16      cookie4
2015-03         2015-03-10      cookie2
2015-03         2015-03-10      cookie3
2015-04         2015-04-12      cookie5
2015-04         2015-04-13      cookie6
2015-04         2015-04-15      cookie3
2015-04         2015-04-15      cookie2
2015-04         2015-04-16      cookie1
  • 其中字段意义:
    month(string), day(string), cookieid(string)
  • 分别代表:
    月份, 日期, cookieid

2. grouping sets()和grouping__id

在一个group by查询中,根据不同的维度组合进行聚合,等价于将不同维度的group by结果集进行union all
grouping__id,表示结果属于哪一个分组集合
注意:
grouping__id中的__是两个下划线
grouping__id需要与grouping sets()或cube或rollup一起使用,否则会报错

select month, day, count(distinct cookieid) as uv
from cookie5 
group by month,day 
grouping sets (month,day);

结果:
month   day             uv
NULL    2015-03-10      4
NULL    2015-03-12      1
NULL    2015-04-12      2
NULL    2015-04-13      3
NULL    2015-04-15      2
NULL    2015-04-16      2
2015-03 NULL            5
2015-04 NULL            6

结果分析:

《Hive窗口函数05-GROUPING SETS、GROUPING__ID、CUBE、ROLLUP》

select month, day, count(distinct cookieid) as uv, grouping__id 
from cookie5 
group by month,day 
grouping sets (month,day,(month,day)) 
order by grouping__id;

结果:
month   day         uv  grouping__id
2015-03 2015-03-10  4   0
2015-04 2015-04-16  2   0
2015-04 2015-04-13  3   0
2015-04 2015-04-12  2   0
2015-04 2015-04-15  2   0
2015-03 2015-03-12  1   0
2015-03 NULL        5   1
2015-04 NULL        6   1
NULL    2015-04-16  2   2
NULL    2015-04-15  2   2
NULL    2015-04-13  3   2
NULL    2015-04-12  2   2
NULL    2015-03-12  1   2
NULL    2015-03-10  4   2

3. cube

根据group by的维度的所有组合进行聚合

select  month, day, count(distinct cookieid) as uv, grouping__id 
from cookie5 
group by month,day 
with cube 
order by grouping__id;

结果:
month   day         uv  grouping__id
2015-03 2015-03-10  4   0
2015-04 2015-04-16  2   0
2015-04 2015-04-13  3   0
2015-04 2015-04-12  2   0
2015-04 2015-04-15  2   0
2015-03 2015-03-12  1   0
2015-03 NULL        5   1
2015-04 NULL        6   1
NULL    2015-04-16  2   2
NULL    2015-04-15  2   2
NULL    2015-04-13  3   2
NULL    2015-04-12  2   2
NULL    2015-03-12  1   2
NULL    2015-03-10  4   2
NULL    NULL        7   3

等价于:
select month, day, count(distinct cookieid) as uv, 0 as grouping__id from cookie5 group by month, day
union all
select month, null, count(distinct cookieid) as uv, 1 as grouping__id from cookie5 group by month 
union all 
select null, day, count(distinct cookieid) as uv, 2 as grouping__id from cookie5 group by day
union all 
select null, null, count(distinct cookieid) as uv, 3 as grouping__id from cookie5;

4. rollup

是cube的子集,以最左侧的维度为主,从该维度进行层级聚合

--(1) 以month维度进行层级聚合(月天的uv、月的uv、总uv)
select month, day, count(distinct cookieid) as uv, grouping__id  
from cookie5 
group by month,day 
with rollup 
order by grouping__id;

结果:
month   day         uv      grouping__id
2015-04 2015-04-16  2       0
2015-04 2015-04-15  2       0
2015-04 2015-04-13  3       0
2015-04 2015-04-12  2       0
2015-03 2015-03-12  1       0
2015-03 2015-03-10  4       0
2015-04 NULL        6       1
2015-03 NULL        5       1
NULL    NULL        7       3

--(2) 以day维度进行层级聚合(天月的uv、天的uv、总uv)
select day, month, count(distinct cookieid) as uv, grouping__id  
from cookie5 
group by day,month 
with rollup 
order by grouping__id;

结果:
day         month       uv  grouping__id
2015-04-16  2015-04     2   0
2015-04-15  2015-04     2   0
2015-04-13  2015-04     3   0
2015-04-12  2015-04     2   0
2015-03-12  2015-03     1   0
2015-03-10  2015-03     4   0
2015-04-12  NULL        2   1
2015-04-16  NULL        2   1
2015-03-10  NULL        4   1
2015-04-15  NULL        2   1
2015-03-12  NULL        1   1
2015-04-13  NULL        3   1
NULL        NULL        7   3

参考文章:Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP

    原文作者:CoderJed
    原文地址: https://www.jianshu.com/p/885eaeabbf3a
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞