Tutorials 使用窗口功能分析信息

本教程中简要介绍了分析钻1.2,即ANSI兼容sql解析和窗口函数。钻支持以下SQL窗口函数:

  • PATRIRION BY和OVER条款

  • 各种聚合窗口函数,Sum,Max,Min,Count,Avg

  • 分析函数,如First_Value Last_Value、Lead、Lag,NTile,Row_Number和Rank

窗口功能多种多样。你可以减少连接、子查询和需要编写显式游标。窗口函数解决各种用例以最少的编码工作。

本教程建立在之前的教程,分析Yelp学术数据集 和 分析高度动态数据集,并使用相同的Yelp数据集。

开始

  1. 要开始,请下载 Yelp (业务审查)。

  2. 安装和启动钻。

  3. 在Drill中列出可用的模式列表。

SHOW schemas;
+---------------------+
|     SCHEMA_NAME     |
+---------------------+
| INFORMATION_SCHEMA  |
| cp.default          |
| dfs.default         |
| dfs.root            |
| dfs.tmp             |
| dfs.yelp            |
| sys                 |
+---------------------+

7 rows selected (1.755 seconds)
  1. 切换到使用Yelp的工作区数据加载。

    USE dfs.yelp;

    +——-+—————————————+
    | ok | summary |
    +——-+—————————————+
    | true | Default schema changed to [dfs.yelp] |
    +——-+—————————————+

    1 row selected (0.129 seconds)

  2. 开始探索的一个数据集在Yelp数据集——业务信息。

    SELECT * FROM business.json LIMIT 1;

    +————————+—————————————————–+—————————————————————————————————————————————————————————————————————————————————–+——+——————————–+———+————–+——————-+————-+——-+——-+———–+—————————————————————————————————————————————————–+———-+—————+
    | business_id | full_address | hours | open | categories | city | review_count | name | longitude | state | stars | latitude | attributes | type | neighborhoods |
    +————————+————–+——+——————————-+—————————————————————————————————————————————————————————————————————————————————–+——+——————————–+———+————–+——————-+————-+——-+——-+———–+—————————————————————————————————————————————————–+———-+—————+
    | vcNAWiLM4dR7D2nwwJ7nCA | 4840 E Indian School Rd Ste 101 Phoenix, AZ 85018 | {“Tuesday”:{“close”:”17:00″,”open”:”08:00″},”Friday”:{“close”:”17:00″,”open”:”08:00″},”Monday”:{“close”:”17:00″,”open”:”08:00″},”Wednesday”:{“close”:”17:00″,”open”:”08:00″},”Thursday”:{“close”:”17:00″,”open”:”08:00″},”Sunday”:{},”Saturday”:{}} | true | [“Doctors”,”Health & Medical”] | Phoenix | 7 | Eric Goldberg, MD | -111.983758 | AZ | 3.5 | 33.499313 | {“By Appointment Only”:true,”Good Ambience”:{},”Parking”:{},”Music”:{},”Hair Types Specialized In”:{},”Payment Types”:{},”Dietary Restrictions”:{}} | business | [] |
    +————-+————–+——-+——+————+——+————–+—————————————————————————————————————————————————————————————————————————————————–+——+——————————–+———+————–+——————-+————-+——-+——-+———–+—————————————————————————————————————————————————–+———-+—————+
    1 row selected (0.514 seconds)

使用窗口功能简单的查询

  1. 得到最高Yelp企业基于评论在每个城市数量和业务的行号。

    SELECT name, city, review_count, row_number()
    OVER (PARTITION BY city ORDER BY review_count DESC) AS rownum
    FROM business.json LIMIT 15;

    +—————————————-+————+—————+———+
    | name | city | review_count | rownum |
    +—————————————-+————+—————+———+
    | Cupz N’ Crepes | Ahwatukee | 124 | 1 |
    | My Wine Cellar | Ahwatukee | 98 | 2 |
    | Kathy’s Alterations | Ahwatukee | 12 | 3 |
    | McDonald’s | Ahwatukee | 7 | 4 |
    | U-Haul | Ahwatukee | 5 | 5 |
    | Hi-Health | Ahwatukee | 4 | 6 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 7 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 8 |
    | Roberto’s Authentic Mexican Food | Anthem | 117 | 1 |
    | Q to U BBQ | Anthem | 74 | 2 |
    | Outlets At Anthem | Anthem | 64 | 3 |
    | Dara Thai | Anthem | 56 | 4 |
    | Cafe Provence | Anthem | 53 | 5 |
    | Shanghai Club | Anthem | 50 | 6 |
    | Two Brothers Kitchen | Anthem | 43 | 7 |
    +—————————————-+————+—————+———+
    15 rows selected (0.67 seconds)

  2. 检查评论数量的每个业务相比评论的平均数量在所有业务。

    SELECT name, city,review_count,
    Avg(review_count) OVER (PARTITION BY City) AS city_reviews_avg
    FROM business.json LIMIT 15;

    +—————————————-+————+—————+———————+
    | name | city | review_count | city_reviews_avg |
    +—————————————-+————+—————+———————+
    | Hi-Health | Ahwatukee | 4 | 32.25 |
    | My Wine Cellar | Ahwatukee | 98 | 32.25 |
    | U-Haul | Ahwatukee | 5 | 32.25 |
    | Cupz N’ Crepes | Ahwatukee | 124 | 32.25 |
    | McDonald’s | Ahwatukee | 7 | 32.25 |
    | Kathy’s Alterations | Ahwatukee | 12 | 32.25 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 32.25 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 32.25 |
    | Anthem Community Center | Anthem | 4 | 14.492063492063492 |
    | Scrapbooks To Remember | Anthem | 4 | 14.492063492063492 |
    | Hungry Howie’s Pizza | Anthem | 7 | 14.492063492063492 |
    | Pinata Nueva | Anthem | 3 | 14.492063492063492 |
    | Starbucks Coffee Company | Anthem | 13 | 14.492063492063492 |
    | Pizza Hut | Anthem | 6 | 14.492063492063492 |
    | Rays Pizza | Anthem | 19 | 14.492063492063492 |
    +—————————————-+————+—————+———————+
    15 rows selected (0.395 seconds)

  3. 看看为每个业务贡献的评论数量对所有企业在城市的总数。

SELECT name, city,review_count,
Sum(review_count) OVER (PARTITION BY City) AS city_reviews_sum
FROM business.jsonlimit 15;

namecityreview_countcity_reviews_sum
+—————————————-+————+—————+——————-+
Hi-HealthAhwatukee4258
My Wine CellarAhwatukee98258
U-HaulAhwatukee5258
Cupz N’ CrepesAhwatukee124258
McDonald’sAhwatukee7258
Kathy’s AlterationsAhwatukee12258
Healthy and Clean Living EnvironmentsAhwatukee4258
Active Kids PediatricsAhwatukee4258
Anthem Community CenterAnthem4913
Scrapbooks To RememberAnthem4913
Hungry Howie’s PizzaAnthem7913
Pinata NuevaAnthem3913
Starbucks Coffee CompanyAnthem13913
Pizza HutAnthem6913
Rays PizzaAnthem19913

15 rows selected (0.543 seconds)

对复杂的查询使用窗口函数

  1. 排名前十的城市名单和他们的排名最高的企业数量的评论。使用钻窗口等功能等级,dense_rank这些查询。

    WITH X
    AS
    (SELECT name, city, review_count,
    RANK()
    OVER (PARTITION BY city
    ORDER BY review_count DESC) AS review_rank
    FROM business.json)
    SELECT X.name, X.city, X.review_count
    FROM X
    WHERE X.review_rank =1 ORDER BY review_count DESC LIMIT 10;

    +——————————————-+————-+—————+
    | name | city | review_count |
    +——————————————-+————-+—————+
    | Mon Ami Gabi | Las Vegas | 4084 |
    | Studio B | Henderson | 1336 |
    | Phoenix Sky Harbor International Airport | Phoenix | 1325 |
    | Four Peaks Brewing Co | Tempe | 1110 |
    | The Mission | Scottsdale | 783 |
    | Joe’s Farm Grill | Gilbert | 770 |
    | The Old Fashioned | Madison | 619 |
    | Cornish Pasty Company | Mesa | 578 |
    | SanTan Brewing Company | Chandler | 469 |
    | Yard House | Glendale | 321 |
    +——————————————-+————-+—————+
    10 rows selected (0.49 seconds)

  2. 比较评论的数量为每个业务在城市顶部和底部评论数。

    SELECT name, city, review_count,
    FIRST_VALUE(review_count)
    OVER(PARTITION BY city ORDER BY review_count DESC) AS top_review_count,
    LAST_VALUE(review_count)
    OVER(PARTITION BY city ORDER BY review_count DESC) AS bottom_review_count
    FROM business.json limit 15;

    +—————————————-+————+—————+——————-+———————-+
    | name | city | review_count | top_review_count | bottom_review_count |
    +—————————————-+————+—————+——————-+———————-+
    | My Wine Cellar | Ahwatukee | 98 | 124 | 12 |
    | McDonald’s | Ahwatukee | 7 | 124 | 12 |
    | U-Haul | Ahwatukee | 5 | 124 | 12 |
    | Hi-Health | Ahwatukee | 4 | 124 | 12 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 124 | 12 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 124 | 12 |
    | Cupz N’ Crepes | Ahwatukee | 124 | 124 | 12 |
    | Kathy’s Alterations | Ahwatukee | 12 | 124 | 12 |
    | Q to U BBQ | Anthem | 74 | 117 | 117 |
    | Dara Thai | Anthem | 56 | 117 | 117 |
    | Cafe Provence | Anthem | 53 | 117 | 117 |
    | Shanghai Club | Anthem | 50 | 117 | 117 |
    | Two Brothers Kitchen | Anthem | 43 | 117 | 117 |
    | The Tennessee Grill | Anthem | 32 | 117 | 117 |
    | Dollyrockers Boutique and Salon | Anthem | 30 | 117 | 117 |
    +—————————————-+————+—————+——————-+———————-+
    15 rows selected (0.516 seconds)

  3. 比较的评论评论的数量之前和之后的业务。

    SELECT city, review_count, name,
    LAG(review_count, 1) OVER(PARTITION BY city ORDER BY review_count DESC)
    AS preceding_count,
    LEAD(review_count, 1) OVER(PARTITION BY city ORDER BY review_count DESC)
    AS following_count
    FROM business.json limit 15;

    +————+—————+—————————————-+——————+——————+
    | city | review_count | name | preceding_count | following_count |
    +————+—————+—————————————-+——————+——————+
    | Ahwatukee | 124 | Cupz N’ Crepes | null | 98 |
    | Ahwatukee | 98 | My Wine Cellar | 124 | 12 |
    | Ahwatukee | 12 | Kathy’s Alterations | 98 | 7 |
    | Ahwatukee | 7 | McDonald’s | 12 | 5 |
    | Ahwatukee | 5 | U-Haul | 7 | 4 |
    | Ahwatukee | 4 | Hi-Health | 5 | 4 |
    | Ahwatukee | 4 | Healthy and Clean Living Environments | 4 | 4 |
    | Ahwatukee | 4 | Active Kids Pediatrics | 4 | null |
    | Anthem | 117 | Roberto’s Authentic Mexican Food | null | 74 |
    | Anthem | 74 | Q to U BBQ | 117 | 64 |
    | Anthem | 64 | Outlets At Anthem | 74 | 56 |
    | Anthem | 56 | Dara Thai | 64 | 53 |
    | Anthem | 53 | Cafe Provence | 56 | 50 |
    | Anthem | 50 | Shanghai Club | 53 | 43 |
    | Anthem | 43 | Two Brothers Kitchen | 50 | 32 |
    +————+—————+—————————————-+——————+——————+
    15 rows selected (0.518 seconds)

←分析社交媒体 安装钻→

    原文作者:眼泪成诗hocc
    原文地址: https://segmentfault.com/a/1190000004322295
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞