较之于
group by
,开窗函数over
的好处在于:over
返回的是group by
之后再join
的结果。也就是说,over
返回的大小和原表格应该是一致的,且能够获取到除了group by
之外的column。
NOTICE:
- order by 执行顺序很靠后,在row_number后执行;
- row_number() over 中,partition by、order by可以只有其一,或者都有,或者都没有,都没有错
create table temp.orders_temp(name VARCHAR(20), cate int, s int, d int);
insert into orders_temp values
("a", 1, 1, 13),
("b", 2, 2, 112),
("c", 3, 3, 12),
("d", 2, 4, 1),
("e", 1, 5, 3),
("f", 3, 6, 4),
("g", 2, 7, 76),
("h", 3, 8, 5),
("i", 2, 9, 1);
select
name as nm,
cate as ct,
s,
d,
max(s) over() as a,
max(s) over(partition by cate) b,
max(d) over(partition by name) c,
sum(s) over(partition by cate) d,
sum(d) over(partition by name) e,
sum(s) over(partition by cate order by s) f,
sum(s) over(partition by cate order by s rows between unbounded preceding and unbounded following) g,
sum(s) over(partition by cate order by s rows between unbounded preceding and current row) h,
sum(s) over(partition by cate order by d) i,
sum(s) over(partition by cate order by d rows between unbounded preceding and unbounded following) j,
sum(s) over(partition by cate order by d rows between unbounded preceding and current row) k,
rank() over(partition by cate order by s) l,
row_number() over() m,
row_number() over(partition by cate order by s) n,
row_number() over(partition by cate order by d) o
from temp.orders_temp;
nm ct s d a b c d e f g h i j k l m n o
a 1 1 13 9 5 13 6 13 1 6 1 6 6 6 1 1 1 2
e 1 5 3 9 5 3 6 3 6 6 6 5 6 5 2 5 2 1
b 2 2 112 9 9 112 22 112 2 22 2 22 22 22 1 2 1 4
d 2 4 1 9 9 1 22 1 6 22 6 13 22 4 2 4 2 1
g 2 7 76 9 9 76 22 76 13 22 13 20 22 20 3 7 3 3
i 2 9 1 9 9 1 22 1 22 22 22 13 22 13 4 9 4 2
c 3 3 12 9 8 12 17 12 3 17 3 17 17 17 1 3 1 3
f 3 6 4 9 8 4 17 4 9 17 9 6 17 6 2 6 2 1
h 3 8 5 9 8 5 17 5 17 17 17 14 17 14 3 8 3 2
over
总结:
-
over()
是对全局进行操作; -
over(partition by oo)
是将数据集按oo
的值不同切分成若干组,分别对每个组整组(从头至尾)进行操作,相当于省略了rows between unbounded preceding and unbounded following
; -
over(partition by oo order by xx)
是oo
切分好的组,按照xx
排序后,对当前组当前“值”(而不是“行” (current row
),对比i
和k
):当分组里的xx
列都是不重复的值,此时,相当于省略了rows between unbounded preceding and current row
,如h
和f
;当分组里的xx
列出现重复的值,相同的值 是最小的操作单位,而rows between unbounded preceding and current row
严格按照当前行进行操作。
对比over
中 order by
和rows between unbounded preceding and current row
:
-
order by
是 按值操作,如果省略rows
,则是从该组第一行到当前“值”(如果有重复,则是到相同值的最后一行,比如i
); -
rows
是 按行操作。
数据切片
f
和i
nm ct s d f i
a 1 1 13 1 6
e 1 5 3 6 5
b 2 2 112 2 22
d 2 4 1 6 13
g 2 7 76 13 20
i 2 9 1 22 13
c 3 3 12 3 17
f 3 6 4 9 6
h 3 8 5 17 14
i
和k
nm ct s d i k
a 1 1 13 6 6
e 1 5 3 5 5
b 2 2 112 22 22
d 2 4 1 13 4
g 2 7 76 20 20
i 2 9 1 13 13
c 3 3 12 17 17
f 3 6 4 6 6
h 3 8 5 14 14
d
按升序排序后的i
ct s d i
1 5 3 5
1 1 13 6
2 4 1 13
2 9 1 13
2 7 76 20
2 2 112 22
3 6 4 6
3 8 5 14
3 3 12 17
d
按升序排序后的k
ct s d k
1 5 3 5
1 1 13 6
2 4 1 4
2 9 1 13
2 7 76 20
2 2 112 22
3 6 4 6
3 8 5 14
3 3 12 17