Hive导入数据

  • 向管理表中导入数据
load data local inpath '/Users/wesley/apps/data/hive_learning_data/employees/employees.txt' overwrite into table employees partition (country='US',state='CA');
  • 从其他表导入数据
insert overwrite table employees
partition (country='US',state='CA')
select *
from staged_employees t2
where t2.country='US' and t2.state='CA';

如果staged_employees表有多个state需要插入到employees表的对应state分区中,我们可以这样写:

from staged_employees t2
insert overwrite table employees
          partition (country='US',state='CA')
select * where t2.country='US' and t2.state='CA'
insert overwrite table employees
          partition (country='US',state='OR')
select * where t2.country='US' and t2.state='OR'
insert overwrite table employees
          partition (country='US',state='IL')
select * where t2.country='US' and t2.state='IL'

通过上面的sql我们可以一次插入多个分区,但是如果需要插入的分区太多,写起来的sql太长太繁琐,这时候我们可以利用动态分区来插入:

  • 动态分区插入
insert overwrite table employees
          partition (country,state)
select name,salary,subordinates,deductions,address,country,state
from staged_employees

注意分区字段country、state必须写在select最后

  • 单个查询语句中创建表并导入数据
create table employees_us
as
select name,salary,subordinates,deductions,address,country,state
from staged_employees
where country='US'
    原文作者:pptb
    原文地址: https://www.jianshu.com/p/c0ad73c6550e
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞