Oracle SQL高级编程——分析函数(窗口函数)全面讲解
注:本文来源于:《Oracle SQL高级编程——分析函数(窗口函数)全面讲解》
概述
分析函数是以一定的方法在一个与当前行相关的结果子集中进行计算,也称为窗口函数。
一般结构为
Function(arg1 , arg2 ……) over(partition by clause order by clause windowing clause ) Windowing clause : rows | range between start_expr and end_expr Start_expr is unbounded preceding | current row | n preceding | n following End_expr is unbounded following | current row | n preceding | n following
不是所有的分析函数都支持开窗子句。
创建测试表
create table sales_fact as select country_name country , country_subregion region , prod_name product , calendar_year year , calendar_week_number week , sum(amount_sold) sale , sum(amount_sold* (case when mod(rownum , 10 ) = 0 then 1.4 when mod(rownum , 5)= 0 then 0.6 when mod(rownum , 2)= 0 then 0.9 when mod(rownum , 2)=1 then 1.2 else 1 end ) ) receipts from sales , times , customers , countries , products where sales.time_id = times.time_id and sales.prod_id = products.prod_id and sales.cust_id = customers.cust_id and customers.country_id = countries.country_id group by country_name , country_subregion , prod_name , calendar_year , calendar_week_number ;
把聚合函数当作分析函数使用
分析函数列只是一列数值,每一行对应一个值,对于查询的其它方面没有任何影响。
从以下查询可以得出以下几点:
1.over分区条件中的列可以不在select列表中,但是必须在数据源中。
2.over排序条件中的列可以不在select列表中,但是必须在数据源中。
3.over排序条件是对所在分区中的数据进行排序,与select语句中的排序无关。但是会影响到分析函数的结果。
4.over中的开窗条件的范围一般仅限于分区本身。rows between unbounded preceding and current row表示从分区的最开始到当前行。
5.分析函数的数据来自结果集(施加了where条件之后的)。下面的查询中的分析列表示该年从开始到该周的销售累计。
SH@ prod> select year , week , sale , sum(sale) over( partition by region , year order by week rows between unbounded preceding and current row ) running_sum_ytd from sales_fact where country in ('Australia') and product='Xtend Memory' and week < 10 order by year , week ; YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 1998 1 58.15 58.15 1998 2 29.39 87.54 1998 3 29.49 117.03 1998 4 29.49 146.52 1998 5 29.8 176.32 1998 6 58.78 235.1 1998 9 58.78 293.88 1999 1 53.52 53.52 1999 3 94.6 148.12 1999 4 40.5 188.62 1999 5 80.01 268.63 1999 6 40.5 309.13 1999 8 103.11 412.24 1999 9 53.34 465.58 2000 1 46.7 46.7 2000 3 93.41 140.11 2000 4 46.54 186.65 2000 5 46.7 233.35 2000 7 70.8 304.15 2000 8 46.54 350.69 2001 1 92.26 92.26 2001 2 118.38 210.64 2001 3 47.24 257.88 2001 4 256.7 514.58 2001 5 93.44 608.02 2001 6 22.44 630.46 2001 7 69.96 700.42 YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 2001 8 46.06 746.48 2001 9 92.67 839.15 29 rows selected.结果与上面相同,只是排序不同方式,分析列看起来就没有规律了。
SH@ prod> select year , week , sale , sum(sale) over( partition by region , year order by week rows between unbounded preceding and current row ) running_sum_ytdfrom sales_fact where country in ('Australia') and product='Xtend Memory' and week < 10 order by year , sale ; YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 1998 2 29.39 87.54 1998 4 29.49 146.52 1998 3 29.49 117.03 1998 5 29.8 176.32 1998 1 58.15 58.15 1998 6 58.78 235.1 1998 9 58.78 293.88 1999 4 40.5 188.62 1999 6 40.5 309.13 1999 9 53.34 465.58 1999 1 53.52 53.52 1999 5 80.01 268.63 1999 3 94.6 148.12 1999 8 103.11 412.24 2000 4 46.54 186.65 2000 8 46.54 350.69 2000 1 46.7 46.7 2000 5 46.7 233.35 2000 7 70.8 304.15 2000 3 93.41 140.11 2001 6 22.44 630.46 2001 8 46.06 746.48 2001 3 47.24 257.88 2001 7 69.96 700.42 2001 1 92.26 92.26 2001 9 92.67 839.15 2001 5 93.44 608.02 YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 2001 2 118.38 210.64 2001 4 256.7 514.58 29 rows selected.分区中的排序选取不恰当,则分析列结果没有什么意义了。分区开窗排序的选取与分析列的结果密切相关。
SH@ prod> select year , week , sale , sum(sale) over( partition by region , year order by sale rows between unbounded preceding and current row ) running_sum_ytd from sales_fact where country in ('Australia') and product='Xtend Memory' and week < 10 order by year , week ; YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 1998 1 58.15 176.32 1998 2 29.39 29.39 1998 3 29.49 88.37 1998 4 29.49 58.88 1998 5 29.8 118.17 1998 6 58.78 235.1 1998 9 58.78 293.88 1999 1 53.52 187.86 1999 3 94.6 362.47 1999 4 40.5 40.5 1999 5 80.01 267.87 1999 6 40.5 81 1999 8 103.11 465.58 1999 9 53.34 134.34 2000 1 46.7 186.48 2000 3 93.41 350.69 2000 4 46.54 46.54 2000 5 46.7 139.78 2000 7 70.8 257.28 2000 8 46.54 93.08 2001 1 92.26 277.96 2001 2 118.38 582.45 2001 3 47.24 115.74 2001 4 256.7 839.15 2001 5 93.44 464.07 2001 6 22.44 22.44 2001 7 69.96 185.7 YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 2001 8 46.06 68.5 2001 9 92.67 370.63 29 rows selected.
分析函数的执行计划
虽然有分析函数还是只需要一次全表扫描,但是需要排序。
WINDOW SORT是分析函数的典型特征。
SH@ prod> explain plan for select year , week , sale , sum(sale) over( partition by region , year order by sale rows between unbounded preceding and current row ) running_sum_ytdfrom sales_fact where country in ('Australia') and product='Xtend Memory' and week < 10 order by year , week ; Explained. SH@ prod> select * from table(dbms_xplan.display()) ; PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------ Plan hash value: 173857439 ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 18 | 1890 | 311 (1)| 00:00:04 | | 1 | SORT ORDER BY | | 18 | 1890 | 311 (1)| 00:00:04 | | 2 | WINDOW SORT | | 18 | 1890 | 311 (1)| 00:00:04 | |* 3 | TABLE ACCESS FULL| SALES_FACT | 18 | 1890 | 309 (1)| 00:00:04 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND "WEEK"<10) Note ----- - dynamic sampling used for this statement (level=2) 说明该表还没有统计信息。 20 rows selected.不加分析列,只是少了一步window sort。
SH@ prod> explain plan for 2 select year , week , sale 3 from sales_fact 4 where country in ('Australia') and product='Xtend Memory' and week < 10 5 order by year , week ; Explained. SH@ prod> select * from table(dbms_xplan.display()) ; PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------ Plan hash value: 1978576542 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 18 | 1584 | 310 (1)| 00:00:04 | | 1 | SORT ORDER BY | | 18 | 1584 | 310 (1)| 00:00:04 | |* 2 | TABLE ACCESS FULL| SALES_FACT | 18 | 1584 | 309 (1)| 00:00:04 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND "WEEK"<10) Note ----- - dynamic sampling used for this statement (level=2) 19 rows selected.
如何使窗口充满整个分区
SH@ prod> select year , week , sale , max(sale) over(partition by product , country , region , year 2 order by week 3 rows between unbounded preceding and unbounded following ) 4 max_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 1998 1 58.15 58.78 1998 2 29.39 58.78 1998 3 29.49 58.78 1998 4 29.49 58.78 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 103.11 1999 3 94.6 103.11 1999 4 40.5 103.11 1999 5 80.01 103.11 1999 6 40.5 103.11 1999 8 103.11 103.11 1999 9 53.34 103.11 2000 1 46.7 93.41 2000 3 93.41 93.41 2000 4 46.54 93.41 2000 5 46.7 93.41 2000 7 70.8 93.41 2000 8 46.54 93.41 2001 1 92.26 256.7 2001 2 118.38 256.7 2001 3 47.24 256.7 2001 4 256.7 256.7 2001 5 93.44 256.7 2001 6 22.44 256.7 2001 7 69.96 256.7 YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 2001 8 46.06 256.7 2001 9 92.67 256.7 29 rows selected.
两个边界都滑动的窗口
下面语句的窗口是往前两周,加往后两周,加当前周,一共五周。(到达边界时窗口会自动缩小)
SH@ prod> select year , week , sale , max(sale) over(partition by product , country , region , year 2 order by week 3 rows between 2 preceding and 2 following ) 4 max_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 1998 1 58.15 58.15 1998 2 29.39 58.15 1998 3 29.49 58.15 1998 4 29.49 58.78 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 94.6 1999 3 94.6 94.6 1999 4 40.5 94.6 1999 5 80.01 103.11 1999 6 40.5 103.11 1999 8 103.11 103.11 1999 9 53.34 103.11 2000 1 46.7 93.41 2000 3 93.41 93.41 2000 4 46.54 93.41 2000 5 46.7 93.41 2000 7 70.8 70.8 2000 8 46.54 70.8 这里只所以是70.8因为窗口缩小了。 2001 1 92.26 118.38 2001 2 118.38 256.7 2001 3 47.24 256.7 2001 4 256.7 256.7 2001 5 93.44 256.7 2001 6 22.44 256.7 2001 7 69.96 93.44 YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 2001 8 46.06 92.67 2001 9 92.67 92.67 29 rows selected.
默认窗口是什么?
一看便知。
SH@ prod> select year , week , sale , max(sale) over(partition by product , country , region , year 2 order by week ) 3 max_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 1998 1 58.15 58.15 1998 2 29.39 58.15 1998 3 29.49 58.15 1998 4 29.49 58.15 1998 5 29.8 58.15 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 53.52 1999 3 94.6 94.6 1999 4 40.5 94.6 1999 5 80.01 94.6 1999 6 40.5 94.6 1999 8 103.11 103.11 1999 9 53.34 103.11 2000 1 46.7 46.7 2000 3 93.41 93.41 2000 4 46.54 93.41 2000 5 46.7 93.41 2000 7 70.8 93.41 2000 8 46.54 93.41 2001 1 92.26 92.26 2001 2 118.38 118.38 2001 3 47.24 118.38 2001 4 256.7 256.7 2001 5 93.44 256.7 2001 6 22.44 256.7 2001 7 69.96 256.7 YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 2001 8 46.06 256.7 2001 9 92.67 256.7 29 rows selected.
Lead和Lag(不支持开窗的函数)
有开窗语句时会报这样的错
2 3 4 rows between 2 preceding and 2 following ) * ERROR at line 3: ORA-00907: missing right parenthesisLEAD是求下一个,而不是前一个。在分区的下边界处,LEAD处回空值。
SH@ prod> select year , week , sale , lead(sale) over(partition by product , country , region , year 2 order by week ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 29.39 1998 2 29.39 29.49 1998 3 29.49 29.49 1998 4 29.49 29.8 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 1999 1 53.52 94.6 1999 3 94.6 40.5 1999 4 40.5 80.01 1999 5 80.01 40.5 1999 6 40.5 103.11 1999 8 103.11 53.34 1999 9 53.34 2000 1 46.7 93.41 2000 3 93.41 46.54 2000 4 46.54 46.7 2000 5 46.7 70.8 2000 7 70.8 46.54 2000 8 46.54 2001 1 92.26 118.38 2001 2 118.38 47.24 2001 3 47.24 256.7 2001 4 256.7 93.44 2001 5 93.44 22.44 2001 6 22.44 69.96 2001 7 69.96 46.06 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 92.67 2001 9 92.67 29 rows selected.LAG求上一个,也就是前一个。在分区的上边界处返回空值。
SH@ prod> select year , week , sale , lag(sale) over(partition by product , country , region , year 2 order by week ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 1998 2 29.39 58.15 1998 3 29.49 29.39 1998 4 29.49 29.49 1998 5 29.8 29.49 1998 6 58.78 29.8 1998 9 58.78 58.78 1999 1 53.52 1999 3 94.6 53.52 1999 4 40.5 94.6 1999 5 80.01 40.5 1999 6 40.5 80.01 1999 8 103.11 40.5 1999 9 53.34 103.11 2000 1 46.7 2000 3 93.41 46.7 2000 4 46.54 93.41 2000 5 46.7 46.54 2000 7 70.8 46.7 2000 8 46.54 70.8 2001 1 92.26 2001 2 118.38 92.26 2001 3 47.24 118.38 2001 4 256.7 47.24 2001 5 93.44 256.7 2001 6 22.44 93.44 2001 7 69.96 22.44 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 69.96 2001 9 92.67 46.06 29 rows selected.
复杂的Lead和Lag
Lead和lag函数的第一参数为返回的列,第二参数为相隔行数(非负),第三个参数为不存在时的默认值(可以指定为当前行的值)。
SH@ prod> select year , week , sale , lag(sale , 2 , 0 ) over(partition by product , country , region , year 2 order by week ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 0 1998 2 29.39 0 1998 3 29.49 58.15 1998 4 29.49 29.39 1998 5 29.8 29.49 1998 6 58.78 29.49 1998 9 58.78 29.8 1999 1 53.52 0 1999 3 94.6 0 1999 4 40.5 53.52 1999 5 80.01 94.6 1999 6 40.5 40.5 1999 8 103.11 80.01 1999 9 53.34 40.5 2000 1 46.7 0 2000 3 93.41 0 2000 4 46.54 46.7 2000 5 46.7 93.41 2000 7 70.8 46.54 2000 8 46.54 46.7 2001 1 92.26 0 2001 2 118.38 0 2001 3 47.24 92.26 2001 4 256.7 118.38 2001 5 93.44 47.24 2001 6 22.44 256.7 2001 7 69.96 93.44 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 22.44 2001 9 92.67 69.96 29 rows selected.将默认值指定为当前行的值。
SH@ prod> select year , week , sale , lag(sale , 2 , sale ) over(partition by product , country , region , year 2 order by week ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 58.15 1998 2 29.39 29.39 1998 3 29.49 58.15 1998 4 29.49 29.39 1998 5 29.8 29.49 1998 6 58.78 29.49 1998 9 58.78 29.8 1999 1 53.52 53.52 1999 3 94.6 94.6 1999 4 40.5 53.52 1999 5 80.01 94.6 1999 6 40.5 40.5 1999 8 103.11 80.01 1999 9 53.34 40.5 2000 1 46.7 46.7 2000 3 93.41 93.41 2000 4 46.54 46.7 2000 5 46.7 93.41 2000 7 70.8 46.54 2000 8 46.54 46.7 2001 1 92.26 92.26 2001 2 118.38 118.38 2001 3 47.24 92.26 2001 4 256.7 118.38 2001 5 93.44 47.24 2001 6 22.44 256.7 2001 7 69.96 93.44 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 22.44 2001 9 92.67 69.96 29 rows selected.LEAD与LAG关于数据缺口的问题
LAG(sale , 10 ) 这表示与它相隔10行的数据,可是我想访问的10周前的数据。如果中间数据有缺口会出现严重的问题。
FIRST_VALUE和LAST_VALUE
这两个函数都可以与order by条件配合得到最大值和最小值。
First_value返回窗口中的第一个值。Ignore nulls表示忽略空值,如果第一个是空值返回第二个。SH@ prod> select year , week , sale , first_value(sale ignore nulls) over(partition by product , country , region , year 2 order by week 3 rows between unbounded preceding and unbounded following ) 4 former_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 58.15 1998 2 29.39 58.15 1998 3 29.49 58.15 1998 4 29.49 58.15 1998 5 29.8 58.15 1998 6 58.78 58.15 1998 9 58.78 58.15 1999 1 53.52 53.52 1999 3 94.6 53.52 1999 4 40.5 53.52 1999 5 80.01 53.52 1999 6 40.5 53.52 1999 8 103.11 53.52 1999 9 53.34 53.52 2000 1 46.7 46.7 2000 3 93.41 46.7 2000 4 46.54 46.7 2000 5 46.7 46.7 2000 7 70.8 46.7 2000 8 46.54 46.7 2001 1 92.26 92.26 2001 2 118.38 92.26 2001 3 47.24 92.26 2001 4 256.7 92.26 2001 5 93.44 92.26 2001 6 22.44 92.26 2001 7 69.96 92.26 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 92.26 2001 9 92.67 92.26 29 rows selected.Last_value返回窗口中的最后一个值。Respect nulls表示识别空值,如果最后一个是空值也将其返回。
SH@ prod> select year , week , sale , last_value(sale respect nulls) over(partition by product , country , region , year 2 order by week 3 rows between unbounded preceding and unbounded following ) 4 former_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 58.78 1998 2 29.39 58.78 1998 3 29.49 58.78 1998 4 29.49 58.78 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 53.34 1999 3 94.6 53.34 1999 4 40.5 53.34 1999 5 80.01 53.34 1999 6 40.5 53.34 1999 8 103.11 53.34 1999 9 53.34 53.34 2000 1 46.7 46.54 2000 3 93.41 46.54 2000 4 46.54 46.54 2000 5 46.7 46.54 2000 7 70.8 46.54 2000 8 46.54 46.54 2001 1 92.26 92.67 2001 2 118.38 92.67 2001 3 47.24 92.67 2001 4 256.7 92.67 2001 5 93.44 92.67 2001 6 22.44 92.67 2001 7 69.96 92.67 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 92.67 2001 9 92.67 92.67 29 rows selected.
NTH_VALUE访问分区别的任意指定行
FIRST_VALUE相当于NTH_VALUE(sale , 1 )或者NTH_VALUE(sale , 1 )from first respect nulls。
可以与排序配合求第几大,第几小。SH@ prod> select year , week , sale , nth_value(sale , 1 ) from last ignore nulls over(partition by product , country , region , year 2 order by week 3 rows between unbounded preceding and unbounded following ) 4 former_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 58.78 1998 2 29.39 58.78 1998 3 29.49 58.78 1998 4 29.49 58.78 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 53.34 1999 3 94.6 53.34 1999 4 40.5 53.34 1999 5 80.01 53.34 1999 6 40.5 53.34 1999 8 103.11 53.34 1999 9 53.34 53.34 2000 1 46.7 46.54 2000 3 93.41 46.54 2000 4 46.54 46.54 2000 5 46.7 46.54 2000 7 70.8 46.54 2000 8 46.54 46.54 2001 1 92.26 92.67 2001 2 118.38 92.67 2001 3 47.24 92.67 2001 4 256.7 92.67 2001 5 93.44 92.67 2001 6 22.44 92.67 2001 7 69.96 92.67 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 92.67 2001 9 92.67 92.67 29 rows selected.
RANK函数(不能开窗,作用于整个分区)
必须有排序条件,rank就是根据order by条件中的列来定排名的。
RANK函数的排名中,如果出现并列,排名将不连续。
如:1 2(2) 4 5 6 7 8 9 。 如果有两个第二名,那么第三名就不存在了。
请注意空值,在排序子句中可以使用NULLS LAST来把空值放在最后面SH@ prod> select year , week , sale , rank() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 5 没有3 1998 2 29.39 1 1998 3 29.49 2 1998 4 29.49 2 1998 5 29.8 4 1998 6 58.78 6 1998 9 58.78 6 1999 1 53.52 4 1999 3 94.6 6 1999 4 40.5 1 1999 5 80.01 5 1999 6 40.5 1 1999 8 103.11 7 1999 9 53.34 3 2000 1 46.7 3 2000 3 93.41 6 2000 4 46.54 1 2000 5 46.7 3 2000 7 70.8 5 2000 8 46.54 1 2001 1 92.26 5 2001 2 118.38 8 2001 3 47.24 3 2001 4 256.7 9 2001 5 93.44 7 2001 6 22.44 1 2001 7 69.96 4 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 2 2001 9 92.67 6 29 rows selected.
DENSE_RANK(与RANK的区别在于排名一是连续的)
SH@ prod> select year , week , sale , dense_rank() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 4 第三名是存在的 1998 2 29.39 1 1998 3 29.49 2 1998 4 29.49 2 1998 5 29.8 3 1998 6 58.78 5 1998 9 58.78 5 1999 1 53.52 3 1999 3 94.6 5 1999 4 40.5 1 1999 5 80.01 4 1999 6 40.5 1 1999 8 103.11 6 1999 9 53.34 2 2000 1 46.7 2 2000 3 93.41 4 2000 4 46.54 1 2000 5 46.7 2 2000 7 70.8 3 2000 8 46.54 1 2001 1 92.26 5 2001 2 118.38 8 2001 3 47.24 3 2001 4 256.7 9 2001 5 93.44 7 2001 6 22.44 1 2001 7 69.96 4 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 2 2001 9 92.67 6 29 rows selected.
ROW_NUMBER(不支持开窗,不确定性函数)
为分区中的每一行指定一个递增的编号,如果排序的列的值相同,谁先谁后是随机的。
SH@ prod> select year , week , sale , row_number() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , sale ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 2 29.39 1 1998 4 29.49 2 1998 3 29.49 3 1998 5 29.8 4 1998 1 58.15 5 1998 6 58.78 6 1998 9 58.78 7 1999 4 40.5 1 1999 6 40.5 2 1999 9 53.34 3 1999 1 53.52 4 1999 5 80.01 5 1999 3 94.6 6 1999 8 103.11 7 2000 4 46.54 1 2000 8 46.54 2 2000 5 46.7 3 2000 1 46.7 4 2000 7 70.8 5 2000 3 93.41 6 2001 6 22.44 1 2001 8 46.06 2 2001 3 47.24 3 2001 7 69.96 4 2001 1 92.26 5 2001 9 92.67 6 2001 5 93.44 7 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 2 118.38 8 2001 4 256.7 9 29 rows selected.
Ratio_to_report(当前行的值与分区总和的比值)
这个函数不支持排序和开窗。
求各周的销量在每年中的比例以及在整个产品销量中的比例。
SH@ prod> select year , week , sale , 2 trunc(100* ratio_to_report(sale) over(partition by year ) , 2) sales_yr , 3 trunc(100* ratio_to_report(sale) over() , 2 ) sales_prod 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 10 6 order by year , week ; YEAR WEEK SALE SALES_YR SALES_PROD ---------- ---------- ---------- ---------- ---------- 1998 1 58.15 19.78 2.98 1998 2 29.39 10 1.5 1998 3 29.49 10.03 1.51 1998 4 29.49 10.03 1.51 1998 5 29.8 10.14 1.52 1998 6 58.78 20 3.01 1998 9 58.78 20 3.01 1999 1 53.52 11.49 2.74 1999 3 94.6 20.31 4.85 1999 4 40.5 8.69 2.07 1999 5 80.01 17.18 4.1 1999 6 40.5 8.69 2.07 1999 8 103.11 22.14 5.28 1999 9 53.34 11.45 2.73 2000 1 46.7 13.31 2.39 2000 3 93.41 26.63 4.79 2000 4 46.54 13.27 2.38 2000 5 46.7 13.31 2.39 2000 7 70.8 20.18 3.63 2000 8 46.54 13.27 2.38 2001 1 92.26 10.99 4.73 2001 2 118.38 14.1 6.07 2001 3 47.24 5.62 2.42 2001 4 256.7 30.59 13.16 2001 5 93.44 11.13 4.79 2001 6 22.44 2.67 1.15 2001 7 69.96 8.33 3.58 YEAR WEEK SALE SALES_YR SALES_PROD ---------- ---------- ---------- ---------- ---------- 2001 8 46.06 5.48 2.36 2001 9 92.67 11.04 4.75 29 rows selected.
Percent_rank(排在前百分之几)
用来求当前行的排名的相对百分位置。
比如你对人说自己是第10名,别人可能觉得没什么,如果是100000中的第10名,那就是前1/10000,那就非常牛了。
这个函数与RANK的推导公式为:
PERCENT_RANK = (RANK – 1) / (N – 1) , N代表总行数。
RANK – 1代表排名大于自己的人数。
N – 1代表除自己以外的总人数。
总体的意思是除自己之外的其它中人,排名比自己高的人所占的比例。
SH@ prod> select year , week , sale , rank() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , sale ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 2 29.39 1 1998 4 29.49 2 1998 3 29.49 2 1998 5 29.8 4 1998 1 58.15 5 1998 6 58.78 6 1998 9 58.78 6 1999 4 40.5 1 1999 6 40.5 1 1999 9 53.34 3 1999 1 53.52 4 1999 5 80.01 5 1999 3 94.6 6 1999 8 103.11 7 2000 4 46.54 1 2000 8 46.54 1 2000 5 46.7 3 2000 1 46.7 3 2000 7 70.8 5 2000 3 93.41 6 2001 6 22.44 1 2001 8 46.06 2 2001 3 47.24 3 2001 7 69.96 4 2001 1 92.26 5 2001 9 92.67 6 2001 5 93.44 7 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 2 118.38 8 2001 4 256.7 9 29 rows selected. SH@ prod> select year , week , sale , 100*percent_rank() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , sale ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 2 29.39 0 1998 4 29.49 16.6666667 1998 3 29.49 16.6666667 1998 5 29.8 50 1998 1 58.15 66.6666667 1998 6 58.78 83.3333333 1998 9 58.78 83.3333333 1999 4 40.5 0 1999 6 40.5 0 1999 9 53.34 33.3333333 1999 1 53.52 50 1999 5 80.01 66.6666667 1999 3 94.6 83.3333333 1999 8 103.11 100 2000 4 46.54 0 2000 8 46.54 0 2000 5 46.7 40 2000 1 46.7 40 2000 7 70.8 80 2000 3 93.41 100 2001 6 22.44 0 2001 8 46.06 12.5 2001 3 47.24 25 2001 7 69.96 37.5 2001 1 92.26 50 2001 9 92.67 62.5 2001 5 93.44 75 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 2 118.38 87.5 2001 4 256.7 100 29 rows selected.
Percentile_cont(大体意思求排在某个百分比时所需的数值)
也可以说是,现在说这样一个值,向分区里面插入这个值,其排名在百分之N(percent_rank为N%),求这个值。
如果有一个行的percent_rank正好等于N,那么就是这个么的值。如果没有匹配的,则要计算概率最大的。
SH@ prod> select year , week , sale , 2 percentile_cont(0.5) within group(order by sale desc )over(partition by year) pc , 3 percent_rank() over( partition by year order by sale desc ) pr 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 11 ; YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 1998 10 117.76 43.975 0 1998 9 58.78 43.975 .142857143 1998 6 58.78 43.975 .142857143 1998 1 58.15 43.975 .428571429 1998 5 29.8 43.975 .571428571 1998 3 29.49 43.975 .714285714 1998 4 29.49 43.975 .714285714 1998 2 29.39 43.975 1 1999 8 103.11 62.76 0 1999 3 94.6 62.76 .142857143 1999 5 80.01 62.76 .285714286 1999 10 72 62.76 .428571429 1999 1 53.52 62.76 .571428571 1999 9 53.34 62.76 .714285714 1999 6 40.5 62.76 .857142857 1999 4 40.5 62.76 .857142857 2000 3 93.41 46.7 0 2000 7 70.8 46.7 .2 2000 5 46.7 46.7 .4 2000 1 46.7 46.7 .4 2000 4 46.54 46.7 .8 2000 8 46.54 46.7 .8 2001 4 256.7 81.11 0 2001 2 118.38 81.11 .111111111 2001 5 93.44 81.11 .222222222 2001 9 92.67 81.11 .333333333 2001 1 92.26 81.11 .444444444 YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 2001 7 69.96 81.11 .555555556 2001 10 69.05 81.11 .666666667 2001 3 47.24 81.11 .777777778 2001 8 46.06 81.11 .888888889 2001 6 22.44 81.11 1 32 rows selected.
Percentile_disc(功能与Percentile_cont大体相同)
区别在于这个函数取到的值一定是在这个分区的行中的。
如果没有匹配的,Percentile_disc会按照排序取上一个。SH@ prod> select year , week , sale , 2 percentile_disc(0.5) within group(order by sale desc )over(partition by year) pc , 3 percent_rank() over( partition by year order by sale desc ) pr 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 11 ; YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 1998 10 117.76 58.15 0 1998 9 58.78 58.15 .142857143 1998 6 58.78 58.15 .142857143 1998 1 58.15 58.15 .428571429 1998 5 29.8 58.15 .571428571 1998 3 29.49 58.15 .714285714 1998 4 29.49 58.15 .714285714 1998 2 29.39 58.15 1 1999 8 103.11 72 0 1999 3 94.6 72 .142857143 1999 5 80.01 72 .285714286 1999 10 72 72 .428571429 1999 1 53.52 72 .571428571 1999 9 53.34 72 .714285714 1999 6 40.5 72 .857142857 1999 4 40.5 72 .857142857 2000 3 93.41 46.7 0 2000 7 70.8 46.7 .2 2000 5 46.7 46.7 .4 2000 1 46.7 46.7 .4 2000 4 46.54 46.7 .8 2000 8 46.54 46.7 .8 2001 4 256.7 92.26 0 2001 2 118.38 92.26 .111111111 2001 5 93.44 92.26 .222222222 2001 9 92.67 92.26 .333333333 2001 1 92.26 92.26 .444444444 YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 2001 7 69.96 92.26 .555555556 2001 10 69.05 92.26 .666666667 2001 3 47.24 92.26 .777777778 2001 8 46.06 92.26 .888888889 2001 6 22.44 92.26 1 32 rows selected. SH@ prod> select year , week , sale , 2 percentile_cont(0.5) within group(order by sale desc )over(partition by year) pc , 3 percent_rank() over( partition by year order by sale desc ) pr 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 11 ; YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 1998 10 117.76 43.975 0 1998 9 58.78 43.975 .142857143 1998 6 58.78 43.975 .142857143 1998 1 58.15 43.975 .428571429 1998 5 29.8 43.975 .571428571 1998 3 29.49 43.975 .714285714 1998 4 29.49 43.975 .714285714 1998 2 29.39 43.975 1 1999 8 103.11 62.76 0 1999 3 94.6 62.76 .142857143 1999 5 80.01 62.76 .285714286 1999 10 72 62.76 .428571429 1999 1 53.52 62.76 .571428571 1999 9 53.34 62.76 .714285714 1999 6 40.5 62.76 .857142857 1999 4 40.5 62.76 .857142857 2000 3 93.41 46.7 0 2000 7 70.8 46.7 .2 2000 5 46.7 46.7 .4 2000 1 46.7 46.7 .4 2000 4 46.54 46.7 .8 2000 8 46.54 46.7 .8 2001 4 256.7 81.11 0 2001 2 118.38 81.11 .111111111 2001 5 93.44 81.11 .222222222 2001 9 92.67 81.11 .333333333 2001 1 92.26 81.11 .444444444 YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 2001 7 69.96 81.11 .555555556 2001 10 69.05 81.11 .666666667 2001 3 47.24 81.11 .777777778 2001 8 46.06 81.11 .888888889 2001 6 22.44 81.11 1 32 rows selected.
NTILE(类型于建立直方图,不支持开窗)
将排序后的数据均匀分配到指定个数据桶中,返回桶编号,如果不能等分,各个桶中的行数最多相差一行。
在以后的处理中可以通过去除首桶或尾去除异常值。
注意:并不是按值分配的。SH@ prod> select year , week , sale , 2 ntile(10) over(order by sale ) group# 3 from sales_fact 4 where country in ('Australia') and product = 'Xtend Memory' and year = 1998 order by year , sale; YEAR WEEK SALE GROUP# ---------- ---------- ---------- ---------- 1998 50 28.76 1 1998 2 29.39 1 1998 4 29.49 1 1998 3 29.49 1 1998 5 29.8 2 1998 43 57.52 2 1998 35 57.52 2 1998 40 57.52 2 1998 46 57.52 3 1998 27 57.52 3 1998 45 57.52 3 1998 44 57.52 3 1998 47 57.72 4 1998 29 57.72 4 1998 28 57.72 4 1998 1 58.15 4 1998 41 58.32 5 1998 51 58.32 5 1998 14 58.78 5 1998 9 58.78 5 1998 15 58.78 6 1998 17 58.78 6 1998 6 58.78 6 1998 19 58.98 6 1998 21 59.6 7 1998 12 59.6 7 1998 52 86.38 7 YEAR WEEK SALE GROUP# ---------- ---------- ---------- ---------- 1998 34 115.44 8 1998 39 115.84 8 1998 42 115.84 8 1998 38 115.84 9 1998 23 117.56 9 1998 18 117.56 9 1998 26 117.56 10 1998 10 117.76 10 1998 48 172.56 10 36 rows selected.
Stddev计算标准差(方差的平方根,支持开窗)
SH@ prod> select year , week , sale , 2 stddev(sale) over( 3 partition by product , country , region , year 4 order by sale desc 5 rows between 2 preceding and 2 following ) stddv 6 from sales_fact 7 where country in ('Australia') and product = 'Xtend Memory' and week < 10 8 order by year , week ; YEAR WEEK SALE STDDV ---------- ---------- ---------- ---------- 1998 1 58.15 15.8453416 1998 2 29.39 .057735027 1998 3 29.49 .178021534 1998 4 29.49 12.7945918 1998 5 29.8 15.815738 1998 6 58.78 .36373067 1998 9 58.78 14.3880654 1999 1 53.52 22.178931 1999 3 94.6 21.7319902 1999 4 40.5 7.46550065 1999 5 80.01 22.9761992 1999 6 40.5 7.41317746 1999 8 103.11 11.6825953 1999 9 53.34 16.1305511 2000 1 46.7 21.0022332 2000 3 93.41 23.3589605 2000 4 46.54 .092376043 2000 5 46.7 10.8139207 2000 7 70.8 22.4285538 2000 8 46.54 .092376043 2001 1 92.26 20.3811452 2001 2 118.38 78.5152276 2001 3 47.24 26.5077898 2001 4 256.7 87.947194 2001 5 93.44 71.309193 2001 6 22.44 13.9900965 2001 7 69.96 22.9124643 YEAR WEEK SALE STDDV ---------- ---------- ---------- ---------- 2001 8 46.06 19.407678 2001 9 92.67 17.1409691 29 rows selected.
Listagg(把分区中的列按照顺序拼接起来,不支持开窗)
SH@ prod> col stddv for a60 SH@ prod> select year , week , sale , 2 listagg(sale , ' , ')within group(order by sale desc) over( 3 partition by product , country , region , year ) stddv 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 5 6 order by year , week ; YEAR WEEK SALE STDDV ---------- ---------- ---------- ------------------------------------------------------------ 1998 1 58.15 58.15 , 29.49 , 29.49 , 29.39 1998 2 29.39 58.15 , 29.49 , 29.49 , 29.39 1998 3 29.49 58.15 , 29.49 , 29.49 , 29.39 1998 4 29.49 58.15 , 29.49 , 29.49 , 29.39 1999 1 53.52 94.6 , 53.52 , 40.5 1999 3 94.6 94.6 , 53.52 , 40.5 1999 4 40.5 94.6 , 53.52 , 40.5 2000 1 46.7 93.41 , 46.7 , 46.54 2000 3 93.41 93.41 , 46.7 , 46.54 2000 4 46.54 93.41 , 46.7 , 46.54 2001 1 92.26 256.7 , 118.38 , 92.26 , 47.24 2001 2 118.38 256.7 , 118.38 , 92.26 , 47.24 2001 3 47.24 256.7 , 118.38 , 92.26 , 47.24 2001 4 256.7 256.7 , 118.38 , 92.26 , 47.24 14 rows selected.
分析函数对谓词前推的影响
使用了分析函数的视图,会影响视图前推,因为分析函数的结果是跨行引用得来的,如果对数据源进行的剪裁,结果可能会不一样
SH@ prod> create or replace view max_5_weeks_vw as 2 select country , product , region , year , week , sale , 3 max(sale) over( 4 partition by product , country , region , year order by year , week 5 rows between 2 preceding and 2 following ) max_weeks_5 6 from sales_fact ; View created. SH@ prod> select year , week , sale , max_weeks_5 from max_5_weeks_vw 2 where country in ('Australia' ) and product = 'Xtend Memory' 3 and region = 'Australia' and year = 2000 and week < 14 4 order by year , week ; YEAR WEEK SALE MAX_WEEKS_5 ---------- ---------- ---------- ----------- 2000 1 46.7 93.41 2000 3 93.41 93.41 2000 4 46.54 93.41 2000 5 46.7 93.41 2000 7 70.8 93.74 2000 8 46.54 93.74 2000 11 93.74 117.5 2000 12 46.54 117.67 2000 13 117.5 117.67 9 rows selected. SH@ prod> explain plan for 2 select year , week , sale , max_weeks_5 from max_5_weeks_vw 3 where country in ('Australia' ) and product = 'Xtend Memory' 4 and region = 'Australia' and year = 2000 and week < 14 5 order by year , week ; Explained. SH@ prod> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------ Plan hash value: 4167461139 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 90 | 5220 | 310 (1)| 00:00:04 | |* 1 | VIEW | MAX_5_WEEKS_VW | 90 | 5220 | 310 (1)| 00:00:04 | | 2 | WINDOW SORT | | 90 | 9450 | 310 (1)| 00:00:04 | |* 3 | TABLE ACCESS FULL| SALES_FACT | 90 | 9450 | 309 (1)| 00:00:04 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("WEEK"<14) 3 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND "REGION"='Australia' AND "YEAR"=2000) Note ----- - dynamic sampling used for this statement (level=2) 21 rows selected.对比没有分析函数的视图。直接将谓词推入到视图里面。
SH@ prod> create or replace view max_5_weeks_vw1 as 2 select country , product , region , year , week , sale 3 from sales_fact ; View created. SH@ prod> explain plan for 2 select year , week , sale from max_5_weeks_vw1 3 where country in ('Australia' ) and product = 'Xtend Memory' 4 and region = 'Australia' and year = 2000 and week < 14 5 order by year , week ; Explained. SH@ prod> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------ Plan hash value: 1978576542 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 105 | 310 (1)| 00:00:04 | | 1 | SORT ORDER BY | | 1 | 105 | 310 (1)| 00:00:04 | |* 2 | TABLE ACCESS FULL| SALES_FACT | 1 | 105 | 309 (1)| 00:00:04 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND "REGION"='Australia' AND "YEAR"=2000 AND "WEEK"<14) Note ----- - dynamic sampling used for this statement (level=2) 19 rows selected.
分析函数用在动态SQL中
SH@ prod> create or replace procedure analytic_dynamic_prc ( part_col_string varchar2 , v_country varchar2 , v_product varchar2 ) 2 is 3 type numtab is table of number(18 , 2) index by binary_integer ; 4 l_year numtab ; 5 l_week numtab ; 6 l_sale numtab ; 7 l_rank numtab ; 8 l_sql_string varchar2(512) ; 9 begin 10 l_sql_string := 'select * from ( select year , week , sale , rank() over( partition by ' || part_col_string 11 || ' order by sale desc ) sales_rank from sales_fact where country in (' 12 || chr(39) || v_country || chr(39) 13 || ' ) and product = ' || chr(39) || v_product || chr(39) 14 || 'order by product , country , year , week ) where sales_rank <= 10 order by 1,4' ; 15 execute immediate l_sql_string bulk collect into l_year , l_week , l_sale , l_rank ; 16 for i in 1..l_year.count loop 17 dbms_output.put_line( l_year(i) || ' | ' || l_week(i) || ' | ' || l_sale(i) || ' | ' || l_rank(i) ) ; 18 end loop ; 19 end ; 20 / Procedure created. SH@ prod> exec analytic_dynamic_prc('product , country , region' , 'Australia' , 'Xtend Memory' ) ; 1998 | 48 | 172.56 | 9 2000 | 46 | 246.74 | 3 2000 | 21 | 187.48 | 5 2000 | 43 | 179.12 | 7 2000 | 34 | 178.52 | 8 2001 | 16 | 278.44 | 1 2001 | 4 | 256.7 | 2 2001 | 21 | 233.7 | 4 2001 | 48 | 182.96 | 6 2001 | 30 | 162.91 | 10 2001 | 14 | 162.91 | 10 PL/SQL procedure successfully completed.
分析函数的“嵌套”
分析函数不能直接嵌套,可能通过子查询来实现。
select year , week , top_sale_year , lag( top_sale_year ) over ( order by year desc ) prev_top_sale_yer from ( select distinct first_value(year) over ( 这里的作用不能用MAX代替,这里取列与排序的列是不同的。 partition by product , country , region , year order by sale desc rows between unbounded preceding and unbounded following ) year , first_value(week) over ( partition by product , country , region , year order by sale desc rows between unbounded preceding and unbounded following ) week , first_value(sale) over ( partition by product , country , region , year order by sale desc rows between unbounded preceding and unbounded following ) top_sale_year from sales_fact where country in ('Australia') and product = 'Xtend Memory' ) order by year , week ;执行结果。
SH@ prod> select year , week , top_sale_year , 2 lag( top_sale_year ) over ( order by year desc ) prev_top_sale_yer 3 from ( 4 select distinct 5 first_value(year) over ( 6 partition by product , country , region , year 7 order by sale desc 8 rows between unbounded preceding and unbounded following ) year , 9 first_value(week) over ( 10 partition by product , country , region , year 11 order by sale desc 12 rows between unbounded preceding and unbounded following ) week , 13 first_value(sale) over ( 14 partition by product , country , region , year 15 order by sale desc 16 rows between unbounded preceding and unbounded following ) top_sale_year 17 from sales_fact 18 where country in ('Australia') and product = 'Xtend Memory' ) 19 order by year , week ; YEAR WEEK TOP_SALE_YEAR PREV_TOP_SALE_YER ---------- ---------- ------------- ----------------- 1998 48 172.56 148.12 1999 17 148.12 246.74 2000 46 246.74 278.44 2001 16 278.44
分析函数的并行
SH@ prod> explain plan for 2 select year , week , top_sale_year , 3 lag( top_sale_year ) over ( order by year desc ) prev_top_sale_yer 4 from ( 5 select distinct 6 first_value(year) over ( 7 partition by product , country , region , year 8 order by sale desc 9 rows between unbounded preceding and unbounded following ) year , 10 first_value(week) over ( 11 partition by product , country , region , year 12 order by sale desc 13 rows between unbounded preceding and unbounded following ) week , 14 first_value(sale) over ( 15 partition by product , country , region , year 16 order by sale desc 17 rows between unbounded preceding and unbounded following ) top_sale_year 18 from sales_fact 19 where country in ('Australia') and product = 'Xtend Memory' ) 20 order by year , week ; Explained. SH@ prod> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2124823565 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 197 | 7683 | 313 (2)| 00:00:04 | | 1 | SORT ORDER BY | | 197 | 7683 | 313 (2)| 00:00:04 | | 2 | WINDOW SORT | | 197 | 7683 | 313 (2)| 00:00:04 | | 3 | VIEW | | 197 | 7683 | 311 (1)| 00:00:04 | | 4 | HASH UNIQUE | | 197 | 20685 | 311 (1)| 00:00:04 | | 5 | WINDOW SORT | | 197 | 20685 | 311 (1)| 00:00:04 | |* 6 | TABLE ACCESS FULL| SALES_FACT | 197 | 20685 | 309 (1)| 00:00:04 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 6 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory') Note ----- - dynamic sampling used for this statement (level=2) 22 rows selected. (注意DISTINCT操作采用的是HASH UNIQUE而不是排序)
为上面的语句添加并行提示。
SH@ prod> explain plan for 2 select /*+ parallel(3)*/ year , week , top_sale_year , 3 lag( top_sale_year ) over ( order by year desc ) prev_top_sale_yer 4 from ( 5 select distinct 6 first_value(year) over ( 7 partition by product , country , region , year 8 order by sale desc 9 rows between unbounded preceding and unbounded following ) year , 10 first_value(week) over ( 11 partition by product , country , region , year 12 order by sale desc 13 rows between unbounded preceding and unbounded following ) week , 14 first_value(sale) over ( 15 partition by product , country , region , year 16 order by sale desc 17 rows between unbounded preceding and unbounded following ) top_sale_year 18 from sales_fact 19 where country in ('Australia') and product = 'Xtend Memory' ) 20 order by year , week ; Explained. SH@ prod> set linesize 180 SH@ prod> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Plan hash value: 2880616722 ---------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | ---------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 197 | 7683 | 119 (5)| 00:00:02 | | | | | 1 | SORT ORDER BY | | 197 | 7683 | 119 (5)| 00:00:02 | | | | | 2 | WINDOW BUFFER | | 197 | 7683 | 119 (5)| 00:00:02 | | | | | 3 | PX COORDINATOR | | | | | | | | | | 4 | PX SEND QC (ORDER) | :TQ10003 | 197 | 7683 | 119 (5)| 00:00:02 | Q1,03 | P->S | QC (ORDER) | | 5 | SORT ORDER BY | | 197 | 7683 | 119 (5)| 00:00:02 | Q1,03 | PCWP | | | 6 | PX RECEIVE | | 197 | 7683 | 117 (3)| 00:00:02 | Q1,03 | PCWP | | | 7 | PX SEND RANGE | :TQ10002 | 197 | 7683 | 117 (3)| 00:00:02 | Q1,02 | P->P | RANGE | | 8 | VIEW | | 197 | 7683 | 117 (3)| 00:00:02 | Q1,02 | PCWP | | | 9 | HASH UNIQUE | | 197 | 20685 | 117 (3)| 00:00:02 | Q1,02 | PCWP | | | 10 | PX RECEIVE | | 197 | 20685 | 117 (3)| 00:00:02 | Q1,02 | PCWP | | | 11 | PX SEND HASH | :TQ10001 | 197 | 20685 | 117 (3)| 00:00:02 | Q1,01 | P->P | HASH | | 12 | WINDOW SORT | | 197 | 20685 | 117 (3)| 00:00:02 | Q1,01 | PCWP | | | 13 | PX RECEIVE | | 197 | 20685 | 114 (0)| 00:00:02 | Q1,01 | PCWP | | | 14 | PX SEND HASH | :TQ10000 | 197 | 20685 | 114 (0)| 00:00:02 | Q1,00 | P->P | HASH | | 15 | PX BLOCK ITERATOR | | 197 | 20685 | 114 (0)| 00:00:02 | Q1,00 | PCWC | | |* 16 | TABLE ACCESS FULL| SALES_FACT | 197 | 20685 | 114 (0)| 00:00:02 | Q1,00 | PCWP | | ---------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 16 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory') Note ----- - dynamic sampling used for this statement (level=2) - Degree of Parallelism is 3 because of hint 33 rows selected.
Oracle 高级排序函数 和 高级分组函数
注:本内容来源于《Oracle 高级排序函数 和 高级分组函数》
高级排序函数:
[ ROW_NUMBER()| RANK() | DENSE_RANK ] OVER (partition by xx order by xx)
1.row_number() 连续且递增的数字 1 2 3 4
row_number() over (partition by xx order by xx )
–学生表中按照所在专业分组,同专业内按成绩倒序排序,成绩相同则按学号正序排序,并给予组内等级
select row_number() over(partition by class_id order by score desc)rn,t.* from student2016 t
2.rank() 跳跃排序 若有相同数据则排名相同 然后跳跃排序 1 2 2 2 5
rank() over (partition by xx order by xx )
select rank() over(partition by class_id order by score desc)rn,t.* from student2016 t
3.dense_rank 若有相同数据则排名相同 然后递增排序
dense_rank over (partition by xx order by xx ) 1 2 2 2 3
select dense_rank() over(partition by class_id order by score desc)rn,t.* from student2016 t
—————————————————————————————————————————-
高级分组函数
group by rollup(a,b,c)
select a,b,c,sum(d) from test group by rollup(a,b,c)
对rollup后面的列 按从右到左以少一列的方式进行分组直到所有列都去掉后的分组(也就是全表分组)
对于n个参数的 rollup,有n+1次分组即按a,b,c,分组,union all a,b分组 union all a分组 union from test
———————————————————————————-
group by cube(a,b,c)
对n个参数,有2^n次分组
即按 ab,ac,a,bc,b,c最后对 全部分组
———————————————————————————-
group by grouping sets(a,b)
即只列出 对 a分组后,和对 b分组的结果集
-- 创建销售表 create table sales_tab( year_id number not null, month_id number not null, day_id number not null, sales_value number(10,2) not null ); -- 插入数据 insert into sales_tab select trunc(dbms_random.value(low=>2010,high=>2012)) as year_id, trunc(dbms_random.value(low=>1,high=>13)) as month_id, trunc(dbms_random.value(low=>1,high=>32)) as day_id, round(dbms_random.value(low=>1,high=>100)) as sales_value from dual connect by level <=1000; -- 查询 group by 后的数据 select sum(t.sales_value) from SALES_TAB t -- 1行 select t.year_id,t.month_id,t.day_id,sum(t.sales_value) sales from SALES_TAB t group by t.year_id,t.month_id,t.day_id order by t.year_id,t.month_id,t.day_id desc; -- 540行 select t.year_id,t.month_id,sum(t.sales_value) sales from SALES_TAB t group by t.year_id,t.month_id order by t.year_id,t.month_id desc; -- 24 行 select t.year_id,sum(t.sales_value) sales from SALES_TAB t group by t.year_id order by t.year_id desc; -- 2 行 -- 使用高级分组函数 -- group by rollup(a,b,c) select t.year_id,t.month_id,t.day_id,sum(t.sales_value) sales from SALES_TAB t group by rollup(t.year_id,t.month_id,t.day_id) order by t.year_id,t.month_id,t.day_id; -- 567 行 = 同上面 1+540+24+2 -- group by cube(a,b,c) select t.year_id,t.month_id,t.day_id,sum(t.sales_value) sales from SALES_TAB t group by cube(t.year_id,t.month_id,t.day_id) order by t.year_id,t.month_id,t.day_id; --group by grouping sets(a,b,c) select t.year_id,t.month_id,sum(t.sales_value) sales from SALES_TAB t group by cube(t.year_id,t.month_id) order by 1,2; -- 39 行 select t.year_id,t.month_id,sum(t.sales_value) sales from SALES_TAB t group by grouping sets(t.year_id,t.month_id) order by 1,2; -- 14 行
Oracle高级查询之OVER (PARTITION BY ..)
注:本内容来源于《Oracle高级查询之OVER (PARTITION BY ..)》
为了方便大家学习和测试,所有的例子都是在Oracle自带用户Scott下建立的。
注:标题中的红色order by是说明在使用该方法的时候必须要带上order by。
一、rank()/dense_rank() over(partition by …order by …)
现在客户有这样一个需求,查询每个部门工资最高的雇员的信息,相信有一定oracle应用知识的同学都能写出下面的SQL语句:
[sql]
view plain
copy
- select e.ename, e.job, e.sal, e.deptno
- from scott.emp e,
- (select e.deptno, max(e.sal) sal from scott.emp e group by e.deptno) me
- where e.deptno = me.deptno
- and e.sal = me.sal;
在满足客户需求的同时,大家应该习惯性的思考一下是否还有别的方法。这个是肯定的,就是使用本小节标题中rank() over(partition by…)或dense_rank() over(partition by…)语法,SQL分别如下:
[sql]
view plain
copy
- select e.ename, e.job, e.sal, e.deptno
- from (select e.ename,
- e.job,
- e.sal,
- e.deptno,
- rank() over(partition by e.deptno order by e.sal desc) rank
- from scott.emp e) e
- where e.rank = 1;
[sql]
view plain
copy
- select e.ename, e.job, e.sal, e.deptno
- from (select e.ename,
- e.job,
- e.sal,
- e.deptno,
- dense_rank() over(partition by e.deptno order by e.sal desc) rank
- from scott.emp e) e
- where e.rank = 1;
为什么会得出跟上面的语句一样的结果呢?这里补充讲解一下rank()/dense_rank() over(partition by e.deptno order by e.sal desc)语法。
over: 在什么条件之上。
partition by e.deptno: 按部门编号划分(分区)。
order by e.sal desc: 按工资从高到低排序(使用rank()/dense_rank() 时,必须要带order by否则非法)
rank()/dense_rank(): 分级
整个语句的意思就是:在按部门划分的基础上,按工资从高到低对雇员进行分级,“级别”由从小到大的数字表示(最小值一定为1)。
那么rank()和dense_rank()有什么区别呢?
rank(): 跳跃排序,如果有两个第一级时,接下来就是第三级。
dense_rank(): 连续排序,如果有两个第一级时,接下来仍然是第二级。
小作业:查询部门最低工资的雇员信息。
二、min()/max() over(partition by …)
现在我们已经查询得到了部门最高/最低工资,客户需求又来了,查询雇员信息的同时算出雇员工资与部门最高/最低工资的差额。这个还是比较简单,在第一节的groupby语句的基础上进行修改如下:
[sql]
view plain
copy
- select e.ename,
- e.job,
- e.sal,
- e.deptno,
- e.sal – me.min_sal diff_min_sal,
- me.max_sal – e.sal diff_max_sal
- from scott.emp e,
- (select e.deptno, min(e.sal) min_sal, max(e.sal) max_sal
- from scott.emp e
- group by e.deptno) me
- where e.deptno = me.deptno
- order by e.deptno, e.sal;
上面我们用到了min()和max(),前者求最小值,后者求最大值。如果这两个方法配合over(partition by …)使用会是什么效果呢?大家看看下面的SQL语句:
[sql]
view plain
copy
- select e.ename,
- e.job,
- e.sal,
- e.deptno,
- nvl(e.sal – min(e.sal) over(partition by e.deptno), 0) diff_min_sal,
- nvl(max(e.sal) over(partition by e.deptno) – e.sal, 0) diff_max_sal
- from scott.emp e;
这两个语句的查询结果是一样的,大家可以看到min()和max()实际上求的还是最小值和最大值,只不过是在partition by分区基础上的。
小作业:如果在本例中加上order by,会得到什么结果呢?
三、lead()/lag() over(partition by … order by …)
中国人爱攀比,好面子,闻名世界。客户更是好这一口,在和最高/最低工资比较完之后还觉得不过瘾,这次就提出了一个比较变态的需求,计算个人工资与比自己高一位/低一位工资的差额。这个需求确实让我很是为难,在groupby语句中不知道应该怎么去实现。不过。。。。现在我们有了over(partition by …),一切看起来是那么的简单。如下:
[sql]
view plain
copy
- select e.ename,
- e.job,
- e.sal,
- e.deptno,
- lead(e.sal, 1, 0) over(partition by e.deptno order by e.sal) lead_sal,
- lag(e.sal, 1, 0) over(partition by e.deptno order by e.sal) lag_sal,
- nvl(lead(e.sal) over(partition by e.deptno order by e.sal) – e.sal,
- 0) diff_lead_sal,
- nvl(e.sal – lag(e.sal) over(partition by e.deptno order by e.sal), 0) diff_lag_sal
- from scott.emp e;
看了上面的语句后,大家是否也会觉得虚惊一场呢(惊出一身冷汗后突然鸡冻起来,这样容易感冒)?我们还是来讲解一下上面用到的两个新方法吧。
lead(列名,n,m): 当前记录后面第n行记录的<列名>的值,没有则默认值为m;如果不带参数n,m,则查找当前记录后面第一行的记录<列名>的值,没有则默认值为null。
lag(列名,n,m): 当前记录前面第n行记录的<列名>的值,没有则默认值为m;如果不带参数n,m,则查找当前记录前面第一行的记录<列名>的值,没有则默认值为null。
下面再列举一些常用的方法在该语法中的应用(注:带order by子句的方法说明在使用该方法的时候必须要带order by):
[sql]
view plain
copy
- select e.ename,
- e.job,
- e.sal,
- e.deptno,
- first_value(e.sal) over(partition by e.deptno) first_sal,
- last_value(e.sal) over(partition by e.deptno) last_sal,
- sum(e.sal) over(partition by e.deptno) sum_sal,
- avg(e.sal) over(partition by e.deptno) avg_sal,
- count(e.sal) over(partition by e.deptno) count_num,
- row_number() over(partition by e.deptno order by e.sal) row_num
- from scott.emp e;
重要提示:大家在读完本片文章之后可能会有点误解,就是OVER (PARTITION BY ..)比GROUP BY更好,实际并非如此,前者不可能替代后者,而且在执行效率上前者也没有后者高,只是前者提供了更多的功能而已,所以希望大家在使用中要根据需求情况进行选择。
路在脚下
==================================================================================================================================================
,
为了方便大家学习和测试,所有的例子都是在Oracle自带用户Scott下建立的。
注:标题中的红色order by是说明在使用该方法的时候必须要带上order by。
一、rank()/dense_rank() over(partition by …order by …)
现在客户有这样一个需求,查询每个部门工资最高的雇员的信息,相信有一定oracle应用知识的同学都能写出下面的SQL语句:
[sql]
view plain
copy
- select e.ename, e.job, e.sal, e.deptno
- from scott.emp e,
- (select e.deptno, max(e.sal) sal from scott.emp e group by e.deptno) me
- where e.deptno = me.deptno
- and e.sal = me.sal;
在满足客户需求的同时,大家应该习惯性的思考一下是否还有别的方法。这个是肯定的,就是使用本小节标题中rank() over(partition by…)或dense_rank() over(partition by…)语法,SQL分别如下:
[sql]
view plain
copy
- select e.ename, e.job, e.sal, e.deptno
- from (select e.ename,
- e.job,
- e.sal,
- e.deptno,
- rank() over(partition by e.deptno order by e.sal desc) rank
- from scott.emp e) e
- where e.rank = 1;
[sql]
view plain
copy
- select e.ename, e.job, e.sal, e.deptno
- from (select e.ename,
- e.job,
- e.sal,
- e.deptno,
- dense_rank() over(partition by e.deptno order by e.sal desc) rank
- from scott.emp e) e
- where e.rank = 1;
为什么会得出跟上面的语句一样的结果呢?这里补充讲解一下rank()/dense_rank() over(partition by e.deptno order by e.sal desc)语法。
over: 在什么条件之上。
partition by e.deptno: 按部门编号划分(分区)。
order by e.sal desc: 按工资从高到低排序(使用rank()/dense_rank() 时,必须要带order by否则非法)
rank()/dense_rank(): 分级
整个语句的意思就是:在按部门划分的基础上,按工资从高到低对雇员进行分级,“级别”由从小到大的数字表示(最小值一定为1)。
那么rank()和dense_rank()有什么区别呢?
rank(): 跳跃排序,如果有两个第一级时,接下来就是第三级。
dense_rank(): 连续排序,如果有两个第一级时,接下来仍然是第二级。
小作业:查询部门最低工资的雇员信息。
二、min()/max() over(partition by …)
现在我们已经查询得到了部门最高/最低工资,客户需求又来了,查询雇员信息的同时算出雇员工资与部门最高/最低工资的差额。这个还是比较简单,在第一节的groupby语句的基础上进行修改如下:
[sql]
view plain
copy
- select e.ename,
- e.job,
- e.sal,
- e.deptno,
- e.sal – me.min_sal diff_min_sal,
- me.max_sal – e.sal diff_max_sal
- from scott.emp e,
- (select e.deptno, min(e.sal) min_sal, max(e.sal) max_sal
- from scott.emp e
- group by e.deptno) me
- where e.deptno = me.deptno
- order by e.deptno, e.sal;
上面我们用到了min()和max(),前者求最小值,后者求最大值。如果这两个方法配合over(partition by …)使用会是什么效果呢?大家看看下面的SQL语句:
[sql]
view plain
copy
- select e.ename,
- e.job,
- e.sal,
- e.deptno,
- nvl(e.sal – min(e.sal) over(partition by e.deptno), 0) diff_min_sal,
- nvl(max(e.sal) over(partition by e.deptno) – e.sal, 0) diff_max_sal
- from scott.emp e;
这两个语句的查询结果是一样的,大家可以看到min()和max()实际上求的还是最小值和最大值,只不过是在partition by分区基础上的。
小作业:如果在本例中加上order by,会得到什么结果呢?
三、lead()/lag() over(partition by … order by …)
中国人爱攀比,好面子,闻名世界。客户更是好这一口,在和最高/最低工资比较完之后还觉得不过瘾,这次就提出了一个比较变态的需求,计算个人工资与比自己高一位/低一位工资的差额。这个需求确实让我很是为难,在groupby语句中不知道应该怎么去实现。不过。。。。现在我们有了over(partition by …),一切看起来是那么的简单。如下:
[sql]
view plain
copy
- select e.ename,
- e.job,
- e.sal,
- e.deptno,
- lead(e.sal, 1, 0) over(partition by e.deptno order by e.sal) lead_sal,
- lag(e.sal, 1, 0) over(partition by e.deptno order by e.sal) lag_sal,
- nvl(lead(e.sal) over(partition by e.deptno order by e.sal) – e.sal,
- 0) diff_lead_sal,
- nvl(e.sal – lag(e.sal) over(partition by e.deptno order by e.sal), 0) diff_lag_sal
- from scott.emp e;
看了上面的语句后,大家是否也会觉得虚惊一场呢(惊出一身冷汗后突然鸡冻起来,这样容易感冒)?我们还是来讲解一下上面用到的两个新方法吧。
lead(列名,n,m): 当前记录后面第n行记录的<列名>的值,没有则默认值为m;如果不带参数n,m,则查找当前记录后面第一行的记录<列名>的值,没有则默认值为null。
lag(列名,n,m): 当前记录前面第n行记录的<列名>的值,没有则默认值为m;如果不带参数n,m,则查找当前记录前面第一行的记录<列名>的值,没有则默认值为null。
下面再列举一些常用的方法在该语法中的应用(注:带order by子句的方法说明在使用该方法的时候必须要带order by):
[sql]
view plain
copy
- select e.ename,
- e.job,
- e.sal,
- e.deptno,
- first_value(e.sal) over(partition by e.deptno) first_sal,
- last_value(e.sal) over(partition by e.deptno) last_sal,
- sum(e.sal) over(partition by e.deptno) sum_sal,
- avg(e.sal) over(partition by e.deptno) avg_sal,
- count(e.sal) over(partition by e.deptno) count_num,
- row_number() over(partition by e.deptno order by e.sal) row_num
- from scott.emp e;
重要提示:大家在读完本片文章之后可能会有点误解,就是OVER (PARTITION BY ..)比GROUP BY更好,实际并非如此,前者不可能替代后者,而且在执行效率上前者也没有后者高,只是前者提供了更多的功能而已,所以希望大家在使用中要根据需求情况进行选择。