Setting Up a Hive Environment

This article walks through setting up Hive 1.2.2 on a CentOS 7 Hadoop cluster and running a simple query.

Covered in this article:
1. Installation
2. Basic operations

1. Installation

1.1. Download

Download apache-hive-1.2.2-bin.tar.gz from the Apache Hive downloads page (the original article's link is omitted here).

1.2. Extract

tar -zxvf /opt/soft-install/apache-hive-1.2.2-bin.tar.gz -C /opt/soft

1.3. Edit the configuration files

1.3.1. Edit conf/hive-env.sh

cp hive-env.sh.template hive-env.sh

Then add:

HADOOP_HOME=/opt/soft/hadoop-2.7.3
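The two steps above can be scripted. A minimal sketch, demonstrated against a scratch conf directory (in a real install, HIVE_CONF would be /opt/soft/apache-hive-1.2.2-bin/conf, and the HADOOP_HOME path is the one this article assumes):

```shell
# Sketch: generate hive-env.sh from the template shipped with Hive and
# append HADOOP_HOME. Paths are assumptions; a scratch dir stands in for
# the real conf/ so the snippet is runnable anywhere.
HIVE_CONF=$(mktemp -d)                          # stand-in for hive's conf/
touch "$HIVE_CONF/hive-env.sh.template"         # template ships with Hive
cp "$HIVE_CONF/hive-env.sh.template" "$HIVE_CONF/hive-env.sh"
echo 'HADOOP_HOME=/opt/soft/hadoop-2.7.3' >> "$HIVE_CONF/hive-env.sh"
grep HADOOP_HOME "$HIVE_CONF/hive-env.sh"       # prints the appended line
```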

1.3.2. Create conf/hive-site.xml

vi hive-site.xml

Add:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>

This configures Hive to store its metadata in MySQL. The remaining properties:
hive.cli.print.current.db=true shows the current database in the CLI prompt
hive.cli.print.header=true prints column headers in query results
hive.metastore.warehouse.dir=/user/hive/warehouse sets the HDFS directory under which Hive stores table data (create it in HDFS and make it group-writable before first use)
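As a quick sanity check, the configured values can be pulled back out of hive-site.xml with grep/sed (crude, but adequate for flat Hadoop-style property files). A sketch against a scratch copy; point SITE at the real conf/hive-site.xml in practice:

```shell
# Sketch: look up a property value in a Hadoop-style XML config file.
# Works only for the flat <name>/<value> layout shown above, not general XML.
SITE=$(mktemp)
cat > "$SITE" <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
EOF
get_prop() {  # get_prop <property-name> <file>
  grep -A1 "<name>$1</name>" "$2" | sed -n 's|.*<value>\(.*\)</value>.*|\1|p'
}
get_prop hive.metastore.warehouse.dir "$SITE"   # prints /user/hive/warehouse
```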

1.4. Copy the MySQL JDBC driver jar into hive/lib

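The step is a single copy. A sketch, with scratch directories standing in for the real paths so it runs anywhere (the connector jar name/version here is an assumption; use whatever Connector/J jar you downloaded):

```shell
# Sketch: copy the MySQL JDBC driver into Hive's lib/ so Hive can reach the
# MySQL metastore. mktemp dirs stand in for the download dir and hive/lib.
SRC=$(mktemp -d); HIVE_LIB=$(mktemp -d)
touch "$SRC/mysql-connector-java-5.1.46.jar"   # the downloaded driver jar
# the actual step: cp <driver jar> <hive>/lib/
cp "$SRC"/mysql-connector-java-*.jar "$HIVE_LIB"/
ls "$HIVE_LIB"
```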

2. Basic Operations

2.1. Start the Hive shell

The Hadoop cluster must be running first.

bin/hive

2.2. Queries

show tables;
show databases;            -- "show schemas;" is a synonym
show partitions table_name;
show functions;
desc extended table_name;
desc formatted table_name;
describe database database_name;
describe table_name;

2.3. DDL

2.3.1. Create a database

create database if not exists test;

List databases:

show databases;

Switch to it:

use test;

2.3.2. Create a table

1. Create a managed (internal) table:

create table student(sno int,sname string,sex string,sage int,sdept string) row format delimited fields terminated by ',';

2. Drop a table:

drop table student;

2.3.3. Load data

load data local inpath '/opt/soft-install/data/student.txt' overwrite into table student;

where student.txt contains:

1001,张三,男,22,高一
1002,李四,女,25,高二
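Creating the file and sanity-checking its format can be sketched as below. The article loads it from /opt/soft-install/data/student.txt; /tmp is used here only so the sketch runs anywhere. Each row must have exactly the five comma-separated fields the student table declares:

```shell
# Sketch: write the sample data and verify every row has 5 fields,
# matching "fields terminated by ','" in the create table statement.
mkdir -p /tmp/data && cat > /tmp/data/student.txt <<'EOF'
1001,张三,男,22,高一
1002,李四,女,25,高二
EOF
awk -F',' 'NF != 5 { bad++ } END { print NR " rows, " bad+0 " malformed" }' \
    /tmp/data/student.txt    # prints: 2 rows, 0 malformed
```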

2.4. DML

2.4.1. Query data

hive> select * from student;
OK
1001    张三    男      22      高一
1002    李四    女      25      高二
Time taken: 1.464 seconds, Fetched: 2 row(s)
hive> select count(*) from student;
Query ID = hadoop_20180514002841_6dd61d4a-6c8c-4c22-aada-5e3bc89b9cbb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1526265117233_0001, Tracking URL = http://hadoop1:8088/proxy/application_1526265117233_0001/
Kill Command = /opt/soft/hadoop-2.7.3/bin/hadoop job  -kill job_1526265117233_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-05-14 00:28:59,129 Stage-1 map = 0%,  reduce = 0%
2018-05-14 00:29:09,174 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.62 sec
2018-05-14 00:29:19,709 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.02 sec
MapReduce Total cumulative CPU time: 4 seconds 20 msec
Ended Job = job_1526265117233_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.02 sec   HDFS Read: 6856 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 20 msec
OK
2
Time taken: 39.698 seconds, Fetched: 1 row(s)

2.4.2. Delete data

1. Empty a table, either by overwriting it with an empty result set:

insert overwrite table student select * from student where 1=0;

or with truncate, which applies to managed tables only:

truncate table student;
    Original author: 阿坤的博客
    Original article: https://www.jianshu.com/p/9e5cb579f731