Hive SQL Power Tips

1. get_json_object

Example: extract the sale_price field from a JSON string
get_json_object(detail_json,'$.sale_price')
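
As a minimal sketch of how this behaves, the query below runs get_json_object against an inline JSON literal. The sample JSON, the nested sku.name path, and the alias t are made up for illustration; only detail_json and sale_price come from the example above. get_json_object returns a string (or NULL when the path is not present), so a cast is usually needed before doing arithmetic.

```sql
-- Illustrative only: the JSON literal and the sku.name path are assumptions.
select
    get_json_object(t.detail_json, '$.sale_price')                 as sale_price_str,  -- returned as a string
    cast(get_json_object(t.detail_json, '$.sale_price') as double) as sale_price,      -- cast before arithmetic
    get_json_object(t.detail_json, '$.sku.name')                   as sku_name,        -- nested field
    get_json_object(t.detail_json, '$.missing')                    as missing_field    -- NULL: path not present
from (
    select '{"sale_price": 19.9, "sku": {"name": "pen"}}' as detail_json
) t;
```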

2. sum(case when…then…else…end)

Example: total sales amount on day 7
sum(case when by_day=7 then pay_amt else 0 end)
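
The same pattern can pivot several days into columns in a single pass over the data. The sketch below uses a few inline rows in place of a real order table; the rows and the column aliases are assumptions for illustration.

```sql
-- Illustrative only: the inline rows stand in for a real order table.
select
    sum(case when by_day = 1 then pay_amt else 0 end) as day_1_amt,  -- 10
    sum(case when by_day = 7 then pay_amt else 0 end) as day_7_amt,  -- 20 + 5 = 25
    sum(pay_amt)                                      as total_amt   -- 35
from (
    select 1 as by_day, 10.0 as pay_amt
    union all
    select 7 as by_day, 20.0 as pay_amt
    union all
    select 7 as by_day, 5.0 as pay_amt
) t;
```

The else 0 matters here: without it, a group with no matching rows would sum to NULL instead of 0.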

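3. count(case when…then…end)

Example: number of distinct users who placed an order on day 7. There is no else branch: count only skips NULLs, so rows that do not match must fall through to NULL.
count(distinct case when by_day=7 then user_id end) as day_7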
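
A small sketch of why the else branch is left out for count. The inline rows are made up for illustration; the last column shows the common mistake of writing else 0, which makes every row non-NULL and therefore counted.

```sql
-- Illustrative only: four inline rows, three of them on day 7 from two distinct users.
select
    count(distinct case when by_day = 7 then user_id end) as day_7_users,   -- 2 (users 1 and 2)
    count(case when by_day = 7 then user_id end)          as day_7_orders,  -- 3 matching rows
    count(case when by_day = 7 then user_id else 0 end)   as wrong_count    -- 4: else 0 is non-NULL, so all rows count
from (
    select 1 as user_id, 7 as by_day
    union all
    select 1 as user_id, 7 as by_day
    union all
    select 2 as user_id, 7 as by_day
    union all
    select 3 as user_id, 1 as by_day
) t;
```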

4. min(case when…then…end)

Example: date of the first order placed as a VIP user
min(case when is_vip=1 then order_dt end)
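
In practice this expression is usually combined with group by user_id. The sketch below assumes an order table with user_id, is_vip, and order_dt columns and replaces it with inline rows; the data and the alias names are hypothetical.

```sql
-- Illustrative only: inline rows stand in for a real order table.
select
    user_id,
    min(order_dt)                               as first_order_dt,
    min(case when is_vip = 1 then order_dt end) as first_vip_order_dt  -- NULL if the user never ordered as a VIP
from (
    select 1 as user_id, 0 as is_vip, '2019-04-01' as order_dt
    union all
    select 1 as user_id, 1 as is_vip, '2019-04-03' as order_dt
    union all
    select 1 as user_id, 1 as is_vip, '2019-04-05' as order_dt
    union all
    select 2 as user_id, 0 as is_vip, '2019-04-02' as order_dt
) t
group by user_id;
```

For user 1 the two columns differ (2019-04-01 versus 2019-04-03); for user 2 the VIP column is NULL.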

5. row_number() over([partition by col1] order by col2)

Example: for each order, find its sequence number among that user's orders (1st, 2nd, ...)
row_number() over (partition by user_id order by order_time asc) as order_cnt
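
A common follow-up is to filter on the generated number, for example to keep only each user's first order. Window functions cannot be referenced in the where clause of the same query level, so the sketch wraps the query in a subquery. The order_id column and the inline rows are assumptions for illustration.

```sql
-- Illustrative only: order_id and the inline rows are hypothetical.
select user_id, order_id, order_time
from (
    select
        user_id,
        order_id,
        order_time,
        row_number() over (partition by user_id order by order_time asc) as order_cnt
    from (
        select 1 as user_id, 'a1' as order_id, '2019-04-01 10:00:00' as order_time
        union all
        select 1 as user_id, 'a2' as order_id, '2019-04-02 09:00:00' as order_time
        union all
        select 2 as user_id, 'b1' as order_id, '2019-04-01 12:00:00' as order_time
    ) o
) ranked
where order_cnt = 1;  -- keeps a1 for user 1 and b1 for user 2
```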

Besides row_number, there are also rank and dense_rank.

Syntax:
rank() over([partition by col1] order by col2) 
dense_rank() over([partition by col1] order by col2) 
row_number() over([partition by col1] order by col2)


row_number(): numbers the rows sequentially; tied rows still get different numbers (1, 2, 3, 4).

[Figure: row_number() example. Image source: https://www.cnblogs.com/ianunspace/p/5057333.html]

rank(): tied rows share the same rank, and the ranks that follow are skipped (1, 2, 2, 4).

[Figure: rank() example. Image source: https://www.cnblogs.com/ianunspace/p/5057333.html]

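dense_rank(): tied rows share the same rank, and the ranks that follow continue without gaps (1, 2, 2, 3).

[Figure: dense_rank() example. Image source: https://www.cnblogs.com/ianunspace/p/5057333.html]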
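
To compare the three functions side by side, the sketch below ranks four inline rows where two share the same score. The score column and the data are hypothetical.

```sql
-- Illustrative only: scores 90, 80, 80, 70, with a tie at 80.
select
    user_id,
    score,
    row_number() over (order by score desc) as rn,   -- 1, 2, 3, 4 (order within the tie is arbitrary)
    rank()       over (order by score desc) as rk,   -- 1, 2, 2, 4
    dense_rank() over (order by score desc) as drk   -- 1, 2, 2, 3
from (
    select 1 as user_id, 90 as score
    union all
    select 2 as user_id, 80 as score
    union all
    select 3 as user_id, 80 as score
    union all
    select 4 as user_id, 70 as score
) t;
```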

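6. lag(col, n) over([partition by col1] order by col2)

Example: number of users who ordered on 5 consecutive days between April 1 and April 10
lag(order_date, 4) over(partition by user_id order by order_date): within each user's rows, returns the order date 4 rows earlier. base_data holds one row per user per day, so subtracting that date from the current order_date gives exactly 4 when the user ordered on 5 consecutive days ending on the current date, NULL when the user has fewer than 5 order dates up to the current row, and more than 4 when at least one day in between has no order.

-- users who ordered on 5 consecutive days
with
base_data as (
    -- one row per user per order date
    -- assumes order_dt is stored as a 'yyyyMMdd' string, as the filter suggests;
    -- datediff needs 'yyyy-MM-dd', so the date is converted here
    select distinct
        user_id,
        from_unixtime(unix_timestamp(order_dt, 'yyyyMMdd'), 'yyyy-MM-dd') as order_date
    from order_tb
    where order_dt between '20190401' and '20190410'
),
res1 as (
    select
        user_id,
        order_date,
        -- days between the current date and the date 4 rows earlier
        datediff(order_date, lag(order_date, 4) over(partition by user_id order by order_date)) as diff
    from base_data
)

select
    count(distinct user_id) as num
from res1
where diff = 4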
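
To see how the lag and datediff check behaves on concrete data, here is a small sketch with two hypothetical users. It uses an offset of 4 because the 5th day of a consecutive run is 4 days after the 1st. The inline rows are made up, and the dates are already in 'yyyy-MM-dd' form so datediff can read them directly.

```sql
-- Illustrative only: user 1 orders on 5 consecutive days, user 2 skips 2019-04-04.
select
    user_id,
    order_dt,
    lag(order_dt, 4) over (partition by user_id order by order_dt) as dt_4_rows_back,
    datediff(order_dt, lag(order_dt, 4) over (partition by user_id order by order_dt)) as diff
from (
    select 1 as user_id, '2019-04-01' as order_dt
    union all
    select 1 as user_id, '2019-04-02' as order_dt
    union all
    select 1 as user_id, '2019-04-03' as order_dt
    union all
    select 1 as user_id, '2019-04-04' as order_dt
    union all
    select 1 as user_id, '2019-04-05' as order_dt
    union all
    select 2 as user_id, '2019-04-01' as order_dt
    union all
    select 2 as user_id, '2019-04-02' as order_dt
    union all
    select 2 as user_id, '2019-04-03' as order_dt
    union all
    select 2 as user_id, '2019-04-05' as order_dt
    union all
    select 2 as user_id, '2019-04-06' as order_dt
) t;
```

The first four rows of each user have a NULL diff because there is no row 4 positions back. On the fifth row, user 1 gets diff = 4 (2019-04-05 minus 2019-04-01), while user 2 gets diff = 5 (2019-04-06 minus 2019-04-01) because of the skipped day, so only user 1 passes a diff = 4 filter.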

Original author: 马淑
Original URL: https://www.jianshu.com/p/ec555bfd47f0