hive中orc表sqoop导出到mysql

环境:

sqoop版本:1.4.6
hive版本:2.1.1
hadoop版本:2.7.3

起因:

orc表导出到mysql有时碰到空字符串的问题一直没有时间整理,这次测试一下

结论:

hive中为string类型的空字符串‘’数据导出到mysql中bigint或者int类型字段时会报错。
空值null没问题。

过程:

1、创建临时库分区orc表

CREATE TABLE temp.js_pianyuan_orc(
  dates string, 
  city_id string, 
  starttime string, 
  endtime string, 
  programid string, 
  device_id string, 
  ip string)
PARTITIONED BY ( 
  curdate string)
stored as orc;

2、创建mysql测试表

注意hive中的表字段和mysql中的表字段名字必须一致

CREATE TABLE js_pianyuan(
  dates varchar(64) DEFAULT NULL, 
  city_id varchar(64) DEFAULT NULL, 
  starttime varchar(64) DEFAULT NULL, 
  endtime varchar(64) DEFAULT NULL, 
  programid varchar(64) DEFAULT NULL, 
  device_id varchar(64) DEFAULT NULL, 
  ip varchar(64) DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;

3、数据检查

select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (dates is null or trim(dates) = '');
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (city_id is null or trim(city_id) = '');
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (starttime is null or trim(starttime) = '');
5040
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (endtime is null or trim(endtime) = '');
5040
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (programid is null or trim(programid) = '');
5144
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (device_id is null or trim(device_id) = '');
select count(1) from dm_bas.js_pianyuan where curdate = '20180122' and (ip is null or trim(ip) = '');

最后确认starttime、endtime和programid 有空字符串数据

4、22号日期原数据测试导出

insert overwrite table temp.js_pianyuan_orc partition(curdate)
select * from dm_bas.js_pianyuan where curdate = '20180122';
sqoop export \
  --hcatalog-database temp \
  --hcatalog-table js_pianyuan_orc \
  --hcatalog-partition-keys curdate \
  --hcatalog-partition-values 20180122 \
  --connect jdbc:mysql://ip:3306/test \
  --username username \
  --password passwd --m 10\
  --table js_pianyuan
结果没有异常
18/01/24 10:34:58 INFO mapreduce.ExportJobBase: Transferred 178.8203 MB in 1,341.2754 seconds (136.5208 KB/sec)
18/01/24 10:34:58 INFO mapreduce.ExportJobBase: Exported 15558677 records.

5、空字符串改成字符串null测试导出

22号原始数据插入测试表23号

insert overwrite table temp.js_pianyuan_orc partition(curdate='20180123')
select 
case when trim(dates) = '' then 'null' else  dates end,
case when trim(city_id) = '' then 'null' else  city_id end,
case when trim(starttime) = '' then 'null' else  starttime end,
case when trim(endtime) = '' then 'null' else  endtime end,
case when trim(programid) = '' then 'null' else  programid end,
case when trim(device_id) = '' then 'null' else  device_id end,
case when trim(ip) = '' then 'null' else ip end
from dm_bas.js_pianyuan where curdate = '20180122';
结果没有异常
18/01/24 10:48:23 INFO mapreduce.ExportJobBase: Transferred 178.8589 MB in 714.5818 seconds (256.3058 KB/sec)
18/01/24 10:48:23 INFO mapreduce.ExportJobBase: Exported 15558677 records.

6、空字符串改成null测试

22号原始数据插入测试表24号

insert overwrite table temp.js_pianyuan_orc partition(curdate='20180124')
select 
case when trim(dates) = '' then null else  dates end,
case when trim(city_id) = '' then null else  city_id end,
case when trim(starttime) = '' then null else  starttime end,
case when trim(endtime) = '' then null else  endtime end,
case when trim(programid) = '' then null else  programid end,
case when trim(device_id) = '' then null else  device_id end,
case when trim(ip) = '' then null else ip end
from dm_bas.js_pianyuan where curdate = '20180122';

结果没有异常

18/01/24 11:25:53 INFO mapreduce.ExportJobBase: Transferred 178.8346 MB in 1,275.8381 seconds (143.5344 KB/sec)
18/01/24 11:25:53 INFO mapreduce.ExportJobBase: Exported 15558677 records.

7、更改mysql字段类型为bigint测试导出22号含有空字符串原数据

alter table js_pianyuan change programid `programid` bigint(64);

《hive中orc表sqoop导出到mysql》 空字符串异常

结果:

hive中的string类型字段有空字符串数据的导出到mysql端varchar字段类型没问题,但是导出mysql端int或者bigint类型会报异常。这种数据要单独处理。

8、更改mysql字段类型为bigint测试导出24号含有空值null的原数据

18/01/24 12:35:00 INFO mapreduce.ExportJobBase: Transferred 178.8346 MB in 1,100.6016 seconds (166.3878 KB/sec)
18/01/24 12:35:00 INFO mapreduce.ExportJobBase: Exported 15558677 records.

结果没有异常。

    原文作者:AI_leef
    原文地址: https://www.jianshu.com/p/e6c6bfef33bc
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞