I. Table operations
1. Creating a table with a partition and a custom delimiter
create table table_name(
name string,
age string)
partitioned by (
dt string)
row format delimited fields terminated by '\t';
dt is the partition column, and the field delimiter is \t; Hive's default delimiter is \001.
2. Deleting a table's data and dropping a table work the same as in SQL:
truncate table table_name; drop table table_name;
3. Dropping a partition from a table
alter table table_name drop if exists partition (dt = '20180808');
4. Adding columns to a table
alter table table_name add columns(c1 int, c2 int);
5. Copying a partitioned table's schema, with or without its data (see the sketch below)
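A minimal sketch of both, reusing table_name/dt from above; table_name_copy is a hypothetical target name. create table … like copies only the schema (including the partition definition), while a dynamic-partition insert copies the data as well:

-- schema-only copy: columns and the dt partition definition carry over
create table table_name_copy like table_name;

-- copy the data too, filling the partitions dynamically
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table table_name_copy partition (dt)
select name, age, dt from table_name;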
II. Importing and exporting data
1. Loading from a local file
load data local inpath '/home/coco/demo/*' into table test_table;
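If the target table is partitioned, name the partition in the load statement; a sketch reusing table_name/dt from section I (the path and the partition value are placeholders):

load data local inpath '/home/coco/demo/part.txt' into table table_name partition (dt = '20180808');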
2. Writing query results to a local file with a specified delimiter
insert overwrite local directory '/home/coco/output'
row format delimited fields terminated by '\t'
select age, count(*) from test_name group by age order by age;
3. Sqoop import
sqoop import \
  --connect jdbc:mysql://10.1.1.1:3306/user_center \
  --username root --password 123456 \
  --query "select * from user where age=18 and \$CONDITIONS" \
  --hive-import \
  --target-dir /user/coco/user_all/city=beijing \
  --hive-database user_center --hive-table test_table \
  --split-by age \
  --hive-partition-key city --hive-partition-value beijing \
  --fields-terminated-by '\t'
\$CONDITIONS is required in the query; Sqoop replaces it with the split conditions (escape the $ when the query is wrapped in double quotes).
Parameter notes:
--connect jdbc:mysql://host:port/database
--username / --password: MySQL credentials
--query: the SQL to import
--target-dir: target HDFS path (you can look up the table's location and partition layout)
--hive-database: Hive database name
--hive-table: Hive table name
--split-by: split column (prefer one with an even distribution)
--hive-partition-key: the partition key, if the target table is partitioned
--hive-partition-value: the partition value
--fields-terminated-by: field delimiter
This style of Sqoop import only supports single-partition tables. Importing into a table with multiple partition columns takes more work; two approaches (sketched below):
Import the data directly into the table's HDFS path, then add the partition.
Or create a temporary non-partitioned table and load it into the partitioned table with a dynamic-partition insert.
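A minimal HiveQL sketch of both approaches, assuming a table partitioned by (city, dt); the table names (test_table, test_table_tmp), columns, path, and values are placeholders:

-- approach 1: sqoop writes the files under the partition path,
-- then register the partition with the metastore
alter table test_table add if not exists partition (city = 'beijing', dt = '20180808')
location '/user/hive/warehouse/user_center.db/test_table/city=beijing/dt=20180808';

-- approach 2: sqoop into a non-partitioned temp table, then a dynamic-partition insert
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table test_table partition (city, dt)
select name, age, city, dt from test_table_tmp;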
If the Sqoop import fails with java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@5f058f00 is still active. No statements may be…, specify the JDBC driver explicitly:
--driver com.mysql.jdbc.Driver
If that is followed by The last packet sent successfully to the server was 0 milliseconds a…, add autoReconnect to the JDBC URL:
jdbc:mysql://10.1.1.1:3306/user_center?autoReconnect=true
4. Sqoop export
sqoop export \
  --connect "jdbc:mysql://10.1.1.1:3306/user_center?useUnicode=true&characterEncoding=utf-8" \
  --username root --password 123456 \
  --table user \
  --export-dir /user/hive/warehouse/user_center.db/user_center \
  --input-fields-terminated-by '\001' \
  --input-null-string '\\N' --input-null-non-string '\\N'
III. Data processing
1. Deduplication in Hive
select * from (
  select *, row_number() over (partition by source_url order by phone asc) num
  from spider
) t where t.num = 1;
The columns after partition by are the deduplication keys; several can be listed, separated by commas. order by controls the sort within each group and so decides which single row is kept (see the sketch below).
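For example, to keep the newest record per source_url instead of the one with the smallest phone, sort descending by a timestamp (update_time is a hypothetical column):

select * from (
  select *, row_number() over (partition by source_url order by update_time desc) num
  from spider
) t where t.num = 1;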
2. Hive dynamic partitioning
hive -v -e "
use test_db;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.cli.print.header=true;
insert overwrite table user_table partition(city,dt) select a,b,c,city,dt from user_temp where age>18;
"
The key settings are:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
The last columns of the select list map, in order, to the partition columns.
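Static and dynamic partitions can also be mixed; a sketch reusing user_table/user_temp from the command above, pinning city and leaving dt dynamic (in strict mode the leading partition must be static):

insert overwrite table user_table partition (city = 'beijing', dt)
select a, b, c, dt from user_temp where age > 18;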
3. Invoking a Python script from Hive to process data
hive -v -e "
use test_db;
set hive.cli.print.header=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set mapreduce.map.memory.mb=1025;
set mapreduce.reduce.memory.mb=1025;
add file /data/coco/house_.py;
drop table if exists test_table;
create table test_table as
SELECT TRANSFORM(a,b,c) USING 'house_.py'
AS a,b,c,d,e,f FROM user_table;"
The Python script (house_.py) reads tab-separated lines from stdin and writes tab-separated lines to stdout:
import sys

for line in sys.stdin:
    # input columns arrive tab-separated on stdin
    (a, b, c) = line.strip('\n').split('\t')
    d = a + b
    e = '1'  # output fields must be strings for join
    f = '2'
    print '\t'.join([a, b, c, d, e, f])