目的
数据技术团队,每个成员经常要编写SQL来查询和统计数据,数据源有Hive和DB等,每个人编写习惯或多或少有一些差异,这些差异并不一定会影响结果,但是团队中的其他人阅读代码时会产生困扰,尤其是一些复杂的多表关联查询SQL,更是让很多代码阅读者困惑,因此有必要在团队内部制定SQL编写规范,在SQL代码书写上达成一致,提高可视化和良好的阅读体验。编写规范
A、字段别名尽量不要使用拼音,要使用比较贴切的英文单词组合,做到文词达意;
B、字段别名尽量使用下划线分割多个单词;
例如:login_cnt(登录次数)、login_users(登录人数)等;
C、字段别名和子查询别名时不要省略as关键字;
例如:
count(distinct user_id) as login_users
count(user_id) as login_cnt
D、SQL语句编写使用全小写字符,除非有个别大小写敏感的字段名称;
E、SQL语句中如遇from、where、on、left join、inner join、right join、group by、order by等关键字,单独占一行;
例如:
select user_id,user_name,gender
from user
where reg_time>='2017-05-01' and reg_time<'2017-06-01'
F、避免在单独的SQL查询语句中写表别名;
反例:
select u.user_id,u.user_name,u.gender
from user as u
where u.reg_time>='2017-05-01' and u.reg_time<'2017-06-01'
在这里给user 起别名为u,没有任何意义,徒增代码量和阅读障碍
G、时间范围作为查询条件时,请使用>=和<作为时间范围,不要使用>=和<=这样的写法;
例如:
查询5月份登录用户数
select count(distinct user_id) as login_users
from user_login
where login_time>='2017-05-01' and login_time<'2017-06-01'
反例:
select count(distinct user_id) as login_users
from user_login
where login_time>='2017-05-01' and login_time<='2017-05-31 23:59:59'
这样写法会因为login_time时间精度的不同导致漏掉一些数据;且书写时比较复杂;
H、子查询别名尽量使用it (inner table)、t1 (temp table 1) 和rs1(record result 1)命名;
例如:
select min_date,count(mobile) as mobiles
from
(
select mobile,min(vdate) as min_date
from
(
select vdate,mobile
from
(
select vdate,split(mobile,',') as mobiles
from s_jd_picked
) as it1
lateral view outer explode(mobiles) explode_table1 as mobile
) as t1
where length(mobile)=11
group by mobile
) as rs
where min_date>='2017-04-01' and min_date<'2017-05-01'
group by min_date
order by min_date
I、SQL查询语句要有层次结构,使用4个空格缩进,在Hive中制表符不能作为缩进;
J、单行字符长度不要超过80个字符,如果超过请换行,提高代码阅读性;
- 下面提供一个较完整的例子:
例如:
select date(schedule_date) as schedule_date,
sum(cast(onBoardCount as float)) as first_billing_money,
sum(cast(secondValue as float)) as second_billing_money
from
(
select t1.project_id,t1.schedule_id,
case when t2.schedule_date is null then created_time else schedule_date end as schedule_date,
t1.mobile,interview_status
from
(
select mobile as mobile,projectId as project_id,
cast(projectScheduleId as int) as schedule_id,created_time,
cast(interviewerStatus as int) as interview_status
from mf_project_track
where created_time>='2017-01-01'
) as t1
left join
(
select projectId as project_id,id as schedule_id,
to_timestamp((executionDate) as schedule_date
from mf_project_schedule
) as t2
on t1.project_id=t2.project_id and t1.schedule_id=t2.schedule_id
) as rs1
inner join
mf_project as rs2
on cast(rs1.project_id as int)=rs2.id
where schedule_date>='2017-04-01'
group by date(schedule_date)