Scenario
There is an existing partitioned table, dwh_reg_user_logins_latest, and three new columns need to be added to it.
Approach
The plan is to rebuild the table in four steps:

1. Create a new table dwh_reg_user_logins_latest_new that carries the three extra columns, and backfill it with the old table's data plus the new columns.
2. Drop the old table dwh_reg_user_logins_latest and delete its corresponding directory in HDFS.
3. Recreate dwh_reg_user_logins_latest as an empty table with the same structure as the new table.
4. Run a script that loads the data from dwh_reg_user_logins_latest_new back into dwh_reg_user_logins_latest partition by partition.
Note: the rename cannot simply be ALTER TABLE table_a RENAME TO table_b. In Hive the data lives in HDFS, and renaming an EXTERNAL table only updates the metastore entry; the data files at the table's LOCATION are untouched and must be handled explicitly.
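For reference, a minimal illustration of the statement in question, using the placeholder names table_a and table_b from the note above:

```sql
-- Only the metastore entry changes; an EXTERNAL table keeps reading from
-- its original LOCATION, so no files move in HDFS.
ALTER TABLE table_a RENAME TO table_b;
```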
Scripts & statements
Create the new temporary table
```sql
CREATE EXTERNAL TABLE `dwh_reg_user_logins_latest_new`(
  `uid`       string,
  `client`    string,
  `channel`   string,
  `vers`      string,
  `last_time` string,
  `devi`      string,
  `sys`       string,
  `dname`     string,
  `mac`       string,
  `imei`      string,
  `ifa`       string,
  `reso`      string,
  `euid`      string,
  `dpi`       string,
  `host`      string,
  `ip`        string,
  `user_id`   string,
  `country`   string,
  `province`  string,
  `city`      string
)
PARTITIONED BY (p_day BIGINT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS RCFILE
LOCATION '/user/hive/warehouse/dg/bigtables/dwh_reg_user_logins_latest_new';
```
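The notes do not show the backfill itself, so here is a minimal sketch, assuming the three new columns are country, province, and city (the last three in the DDL above) and that they start out as NULL:

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Copy every partition of the old table into the temporary table,
-- padding the three assumed-new columns with NULL; the partition
-- column p_day must come last in a dynamic-partition insert.
INSERT OVERWRITE TABLE dwh_reg_user_logins_latest_new PARTITION (p_day)
SELECT uid, client, channel, vers, last_time, devi, sys, dname, mac,
       imei, ifa, reso, euid, dpi, host, ip, user_id,
       CAST(NULL AS STRING) AS country,
       CAST(NULL AS STRING) AS province,
       CAST(NULL AS STRING) AS city,
       p_day
FROM dwh_reg_user_logins_latest;
```

After the backfill, drop the old table, remove its HDFS directory, and recreate dwh_reg_user_logins_latest empty using the same DDL as above (changing only the table name and LOCATION); the script below then refills it.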
Reload the rebuilt table partition by partition
```sh
#!/bin/sh
. /etc/profile

# Resolve the working date: default to yesterday, or take a single
# 'YYYYMMDDHH'-style argument such as '2014092307'.
if [ $# -eq 0 ]
then
    STR_DAY=`date -d "-1 day" +%Y%m%d`
    CUR_DATE=`date -d "-1 day" +%Y-%m-%d`
    CUR_PARTITION=`date -d "-1 day" +%Y%m%d`
    BEFORE_PARTITION=`date -d "-2 day" +%Y%m%d`
    END_DATE=`date -d "-1 day" +%Y-%m-%d`
    END_PARTITION=`date -d "-1 day" +%Y%m%d`
elif [ $# -eq 1 ]
then
    format_day=`echo $1 | grep -o '[0-9]\{8\}'`
    format_hour=`echo $1 | grep -o '[0-9]\{2\}$'`
    STR_DAY=`date -d "$format_day" +%Y%m%d`
    CUR_DATE=`date -d "$format_day" +%Y-%m-%d`
    CUR_PARTITION=`date -d "$format_day" +%Y%m%d`
    BEFORE_PARTITION=`date -d "-1 day $format_day" +%Y%m%d`
    END_DATE=`date -d "${format_day}" +%Y-%m-%d`
    END_PARTITION=`date -d "${format_day}" +%Y%m%d`
else
    echo "wrong arguments: pass a single date like '2014092307'"
    exit 1
fi

# NOTE: the loop below ignores the variables above and reloads a
# hardcoded range of partitions (2017-08-01 .. 2017-08-27), running
# one hive job per day so a failed partition can be rerun on its own.
for day_month in 201708
do
    for day_day in 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
    do
        day=${day_month}${day_day}
        echo $day
        query="set hive.exec.dynamic.partition.mode=nonstrict;\
            insert overwrite table dwh_reg_user_logins_latest partition (p_day) \
            select * from dwh_reg_user_logins_latest_new where p_day=$day"
        hive -e "${query}"
    done
done
```
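Before dropping the temporary table, it is worth confirming the reload is complete; one simple check is to compare per-partition row counts between source and target:

```sql
-- The two result sets should match partition for partition.
SELECT p_day, COUNT(*) FROM dwh_reg_user_logins_latest GROUP BY p_day;
SELECT p_day, COUNT(*) FROM dwh_reg_user_logins_latest_new GROUP BY p_day;
```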
Drop the temporary table
```sh
hive -e "DROP TABLE dwh_reg_user_logins_latest_new;"

# DROP TABLE on an EXTERNAL table leaves the files behind, so remove the
# directory explicitly ('hadoop dfs' is deprecated in favor of 'hdfs dfs').
hdfs dfs -rm -r hdfs://datanode001:9000/user/hive/warehouse/dg/bigtables/dwh_reg_user_logins_latest_new
```