Indexes have been available since Hive 0.7. Before creating one, evaluate whether it is actually worthwhile: an index takes up additional disk space, and maintaining it (rebuilding after the underlying data changes) has its own cost. (Note that Hive indexing was removed entirely in Hive 3.0.)
Create the index
hive> create index index_studentid on table student_3(studentid)
    > as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > with deferred rebuild
    > IN TABLE index_table_student_3;
OK
Time taken: 12.219 seconds
hive>
org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler: the handler class that implements the index
index_studentid: the name of the index
student_3: the table being indexed
index_table_student_3: the table that stores the index data
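As a side note, the full handler class name does not have to be spelled out. A minimal sketch of the equivalent DDL, assuming the standard built-in handlers ('COMPACT' is shorthand for CompactIndexHandler; a 'BITMAP' handler also exists):

```sql
-- Equivalent to the statement above: 'COMPACT' stands in for the
-- CompactIndexHandler class; 'BITMAP' would select the bitmap handler instead.
CREATE INDEX index_studentid ON TABLE student_3 (studentid)
AS 'COMPACT'
WITH DEFERRED REBUILD
IN TABLE index_table_student_3;
```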
At this point the index table (index_table_student_3) is still empty:
hive> select * from index_table_student_3;
OK
Time taken: 0.295 seconds
Build the index data
hive> alter index index_studentid on student_3 rebuild;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = root_20161226235345_5b3fcc2b-7f90-4b10-861f-31cbaed8eb73
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1482824475750_0001, Tracking URL = http://hadoop-node4.com:8088/proxy/application_1482824475750_0001/
Kill Command = /usr/local/development/hadoop-2.6.4/bin/hadoop job -kill job_1482824475750_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-12-26 23:55:40,317 Stage-1 map = 0%, reduce = 0%
2016-12-26 23:56:40,757 Stage-1 map = 0%, reduce = 0%
2016-12-26 23:56:48,768 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.08 sec
2016-12-26 23:57:34,981 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 3.66 sec
2016-12-26 23:57:40,716 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.68 sec
MapReduce Total cumulative CPU time: 4 seconds 680 msec
Ended Job = job_1482824475750_0001
Loading data to table default.index_table_student_3
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.68 sec   HDFS Read: 10282 HDFS Write: 537 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 680 msec
OK
Time taken: 280.693 seconds
Query the index table
hive> select * from index_table_student_3;
OK
1	hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt	[0]
2	hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt	[28]
3	hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt	[56]
4	hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt	[85]
5	hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt	[113]
6	hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt	[143]
Time taken: 2.055 seconds, Fetched: 6 row(s)
hive>
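Each row above maps an indexed key to the HDFS file and byte offsets where matching records live. The index is only consulted, however, if index-based filtering is switched on. A minimal sketch, assuming the standard compact-index settings (hive.optimize.index.filter is off by default, and the minsize threshold defaults to roughly 5 GB, so it is lowered here for this tiny table):

```sql
SET hive.optimize.index.filter=true;              -- allow the optimizer to use indexes
SET hive.optimize.index.filter.compact.minsize=0; -- apply even to very small inputs
-- A predicate on the indexed column can now be answered by first reading
-- index_table_student_3 to find the matching file offsets:
SELECT * FROM student_3 WHERE studentid = 3;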
Inspect the data file referenced by the index, hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt (the bracketed numbers above are byte offsets into this file):
[root@node4 node4]# hdfs dfs -text /opt/hive/warehouse/student_3/sutdent.txt
001	0	BeiJing	xinlang@.com
002	1	ShangHai	xinlang@.com
003	0	ShegZhen	xinlang@.com
004	1	NanJing	xinlang@.com
005	0	GuangDong	xinlang@.com
006	1	HaiNan	xinlang@.com
[root@node4 node4]#
Drop the index (this also drops its index table, index_table_student_3):
DROP INDEX index_studentid on student_3;
Show the indexes on a table
hive> SHOW INDEX on student_3;
OK
index_studentid	student_3	studentid	index_table_student_3	compact
Time taken: 0.487 seconds, Fetched: 1 row(s)
hive>