Installing and Using Hive: Embedded Mode

January 23, 2023. Source: 鹅鹅鹅_

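1. Introduction to Hive

What is Hive
- Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides SQL-like querying.
- In essence, it translates SQL into MapReduce programs.
- Hive does not store data itself; it relies entirely on HDFS for storage and MapReduce for computation. A Hive table is purely logical: it is just the metadata that describes how files map to a table. HBase, by contrast, manages physical tables and is positioned as a NoSQL store.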
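Why use Hive
- Its interface uses SQL-like syntax, which allows rapid development.
- It avoids writing MapReduce by hand, which lowers the learning cost for developers.
- It is easy to extend with new functionality.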
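Features of Hive
- Scalability
  The cluster behind Hive can be scaled freely, and in most cases no service restart is required.
- Extensibility
  Hive supports user-defined functions, so users can implement functions that fit their own needs.
- Fault tolerance
  Hive tolerates failures well: a query can still complete when a node has problems.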
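Hive run modes
- Embedded mode
  Metadata is stored in a local embedded Derby database. This is the simplest way to use Hive, but it has an obvious drawback: an embedded Derby database can be opened by only one process at a time, so multiple concurrent sessions are not supported.
- Local mode
  Metadata is stored in a separate database on the local machine (usually MySQL), which makes multiple sessions and multiple users possible.
- Remote mode
  Used when there are many Hive clients. The MySQL database is separated out and metadata is stored in a standalone remote MySQL service, which avoids the redundancy of installing MySQL on every client.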

2. Installation and Configuration

Install Hadoop first
Omitted.

Download Hive
URL: http://hive.apache.org/downloads.html

[hadoop@master ~]$ wget http://www-eu.apache.org/dist/hive/stable-2/apache-hive-2.1.1-bin.tar.gz
[hadoop@master ~]$ tar -xvf apache-hive-2.1.1-bin.tar.gz 
[hadoop@master ~]$ cd apache-hive-2.1.1-bin
[hadoop@master apache-hive-2.1.1-bin]$ ls
bin  conf  examples  hcatalog  jdbc  lib  LICENSE  NOTICE  README.txt  RELEASE_NOTES.txt  scripts
[hadoop@master apache-hive-2.1.1-bin]$ pwd
/home/hadoop/apache-hive-2.1.1-bin

Set environment variables

[hadoop@master apache-hive-2.1.1-bin]$ vim ~/.bash_profile 
export HIVE_HOME=/home/hadoop/apache-hive-2.1.1-bin
export PATH=$HIVE_HOME/bin:$PATH
[hadoop@master apache-hive-2.1.1-bin]$ . ~/.bash_profile
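
Sourcing the profile more than once can prepend the same directory to PATH repeatedly. The following is a minimal sketch of an idempotent variant, assuming the install path used above; add_hive_to_path is a hypothetical helper name, not something Hive provides.

```shell
# Hypothetical helper: put $HIVE_HOME/bin on PATH only if it is not already there.
add_hive_to_path() {
  HIVE_HOME="${1:-/home/hadoop/apache-hive-2.1.1-bin}"
  case ":$PATH:" in
    *":$HIVE_HOME/bin:"*) ;;               # already present, nothing to do
    *) PATH="$HIVE_HOME/bin:$PATH" ;;      # prepend so this Hive install is found first
  esac
  export HIVE_HOME PATH
}

add_hive_to_path /home/hadoop/apache-hive-2.1.1-bin
# Show the PATH entries that point at the Hive install
printf '%s\n' "$PATH" | tr ':' '\n' | grep 'apache-hive'
```

The function could be placed in ~/.bash_profile instead of the two export lines; calling it twice leaves PATH unchanged.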

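Embedded Mode

Edit the Hive configuration file
$HIVE_HOME/conf is Hive's configuration directory, and hive-site.xml in that directory is Hive's site configuration file. The file does not exist by default, so create it by copying the template: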

[hadoop@master conf]$ cp hive-default.xml.template hive-site.xml

The main settings in hive-site.xml are:

<!-- Where Hive stores table data; the default is /user/hive/warehouse on HDFS. -->
 <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <!-- Where Hive keeps temporary files; the default is /tmp/hive on HDFS. -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
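
Copying the whole template also copies every default, including the ${system:...} placeholders that have to be cleaned up later. An alternative that this article does not use is a small hive-site.xml holding only the overrides, since Hive falls back to its built-in defaults for anything not listed. The sketch below is written under that assumption: CONF_DIR is a hypothetical variable that defaults to a temporary directory so nothing real is overwritten, and the Derby connection URL shown is Hive's documented default for the embedded metastore.

```shell
# Hypothetical target directory; set CONF_DIR="$HIVE_HOME/conf" to write the real file.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Quoted heredoc delimiter: nothing inside is expanded by the shell.
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Table data location on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- Scratch space for Hive jobs on HDFS -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
  </property>
  <!-- Embedded Derby metastore, created in the directory Hive is started from -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hive-site.xml"
grep '<name>' "$CONF_DIR/hive-site.xml"
```

With a file like this there are no ${system:java.io.tmpdir} placeholders to replace, at the cost of not having every setting visible in one place.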

Edit conf/hive-env.sh under the Hive directory

[hadoop@master conf]$ cp hive-env.sh.template hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/hadoop-2.7.3

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/hadoop/apache-hive-2.1.1-bin/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/hadoop/apache-hive-2.1.1-bin/lib

Create the required directories

[hadoop@master ~]$ hdfs dfs -ls /
Found 3 items
drwx------   - hadoop supergroup          0 2017-04-06 18:01 /tmp
drwxr-xr-x   - hadoop supergroup          0 2017-04-06 17:58 /user
drwxr-xr-x   - hadoop supergroup          0 2017-04-06 17:58 /usr
[hadoop@master ~]$ hdfs dfs -ls /user
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2017-04-08 11:00 /user/hadoop
# Create the directories
[hadoop@master ~]$ hdfs dfs -mkdir -p /user/hive/warehouse
[hadoop@master ~]$ hdfs dfs -mkdir -p /tmp/hive
# Grant write permission
[hadoop@master ~]$ hdfs dfs -chmod a+w /tmp/hive
[hadoop@master ~]$ hdfs dfs -chmod a+w /user/hive/warehouse

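Change the io.tmpdir path
In addition, every value in hive-site.xml that contains ${system:java.io.tmpdir} must be changed. You can create a directory of your own to replace it, for example /home/hadoop/apache-hive-2.1.1-bin/iotmp, and then use vim's global substitute command: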

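# This is a local path, not an HDFS path
[hadoop@master conf]$ mkdir /home/hadoop/apache-hive-2.1.1-bin/iotmp
# Global substitute in vim (run it with hive-site.xml open)
:%s#${system:java.io.tmpdir}#/home/hadoop/apache-hive-2.1.1-bin/iotmp#g
# The "system:" prefix must also be deleted from the remaining placeholders, for example
#   ${system:user.name}  becomes  ${user.name}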
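
The same two substitutions can be done without opening an editor. This is a sketch using sed on a throwaway sample file so it is safe to run as-is; WORK_DIR, SITE_XML and IOTMP are names chosen here for illustration, and hive.exec.local.scratchdir is used only as one example of a property whose default contains both placeholders.

```shell
# Work on a sample file; point SITE_XML at $HIVE_HOME/conf/hive-site.xml to apply it for real.
WORK_DIR="$(mktemp -d)"
SITE_XML="$WORK_DIR/hive-site.xml"
IOTMP="/home/hadoop/apache-hive-2.1.1-bin/iotmp"   # local directory created above

cat > "$SITE_XML" <<'EOF'
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>${system:java.io.tmpdir}/${system:user.name}</value>
  </property>
</configuration>
EOF

# 1) replace the tmpdir placeholder with the local directory
# 2) drop the "system:" prefix from the user.name placeholder
sed -e "s#\${system:java.io.tmpdir}#${IOTMP}#g" \
    -e 's#\${system:user.name}#${user.name}#g' \
    "$SITE_XML" > "$SITE_XML.new" && mv "$SITE_XML.new" "$SITE_XML"

grep '<value>' "$SITE_XML"
```

Writing to a new file and renaming it avoids relying on sed -i, whose syntax differs between GNU and BSD sed.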

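3. Running Hive

Initialization

# Initialize the Derby metastore schema first
[hadoop@master apache-hive-2.1.1-bin]$ schematool -initSchema -dbType derby
# In embedded mode the metastore runs inside the Hive process, so a separate
# "hive --service metastore" does not need to be started before the CLI
# Start hive
[hadoop@master apache-hive-2.1.1-bin]$ hive
hive>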

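To re-initialize Derby, first delete the metastore_db directory (it is created in the directory where the commands above were run):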

[hadoop@master apache-hive-2.1.1-bin]$ rm -rf metastore_db/
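
Deleting metastore_db discards every database and table definition. If that metadata might still be needed, one option is to move the directory aside instead of removing it. This is only a sketch; backup_metastore is a hypothetical helper, and it assumes the metastore sits in the directory passed as its argument.

```shell
# Hypothetical helper: move an existing Derby metastore aside instead of deleting it.
backup_metastore() {
  run_dir="${1:-$PWD}"                     # directory where hive/schematool was started
  if [ -d "$run_dir/metastore_db" ]; then
    backup="$run_dir/metastore_db.bak.$(date +%Y%m%d%H%M%S)"
    mv "$run_dir/metastore_db" "$backup"
    echo "moved metastore_db to $backup"
  else
    echo "no metastore_db in $run_dir, nothing to back up"
  fi
}

backup_metastore "$PWD"
# Afterwards, re-create the schema from the same directory:
#   schematool -initSchema -dbType derby
```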

Create a database

hive> create database db_hive_test;
OK
Time taken: 0.282 seconds

Switch to the new database and list databases

hive> use db_hive_test;
OK
Time taken: 0.016 seconds
hive> show databases;
OK
db_hive_test
default
Time taken: 0.013 seconds, Fetched: 2 row(s)

Create a test table

hive> create table student(id int,name string) row format delimited fields terminated by '\t';
OK
Time taken: 0.4 seconds
hive> desc student;
OK
id                      int                                         
name                    string                                      
Time taken: 0.052 seconds, Fetched: 2 row(s)

Load local data into the Hive test table

# First create the test file student.txt locally (fields are separated by a tab)
[hadoop@master hive]$ cat student.txt 
1   zhangsan
2   baiqio
333 aaadf
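# Load the test file into the Hive table. As the errors below show, Hive does not expand "~",
# and relative paths are resolved against the directory where hive was started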
hive> load data local inpath '~/hive/student.txt' into table db_hive_test.student;
FAILED: SemanticException Line 1:23 Invalid path ''~/hive/student.txt'': No files matching path file:/home/hadoop/apache-hive-2.1.1-bin/~/hive/student.txt
hive> load data local inpath 'hive/student.txt' into table db_hive_test.student;
FAILED: SemanticException Line 1:23 Invalid path ''hive/student.txt'': No files matching path file:/home/hadoop/apache-hive-2.1.1-bin/hive/student.txt
hive> load data local inpath '../hive/student.txt' into table db_hive_test.student;
Loading data to table db_hive_test.student
OK
Time taken: 0.675 seconds
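
Two things are easy to get wrong in the step above: the fields must be separated by real tab characters, and the path must be one that Hive can resolve. The sketch below creates the file with printf and builds the load statement with an absolute path. DATA_DIR is a hypothetical variable that defaults to a temporary directory; the article itself keeps the file under ~/hive.

```shell
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"       # hypothetical; the article uses ~/hive
DATA_FILE="$DATA_DIR/student.txt"

# printf turns \t into a real tab, matching "fields terminated by '\t'".
printf '1\tzhangsan\n2\tbaiqio\n333\taaadf\n' > "$DATA_FILE"
wc -c < "$DATA_FILE"                       # 30 bytes

# Build the statement with an absolute path, since Hive does not expand "~".
HQL="load data local inpath '${DATA_FILE}' into table db_hive_test.student;"
echo "$HQL"
# To run it non-interactively:  hive -e "$HQL"
```

The 30-byte size is the same as the size of student.txt that HDFS reports after the load.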

Query the student table

hive> select * from student;
OK
1   zhangsan
2   baiqio
333 aaadf
Time taken: 1.006 seconds, Fetched: 3 row(s)
hive> select * from student where id=1;
OK
1   zhangsan
Time taken: 0.445 seconds, Fetched: 1 row(s)

Where the local file student.txt ended up on HDFS

[hadoop@master hive]$ hdfs dfs -ls /user/hive/warehouse
Found 1 items
drwxrwxrwx   - hadoop supergroup          0 2017-04-08 14:37 /user/hive/warehouse/db_hive_test.db
[hadoop@master hive]$ hdfs dfs -ls /user/hive/warehouse/db_hive_test.db
Found 1 items
drwxrwxrwx   - hadoop supergroup          0 2017-04-08 14:47 /user/hive/warehouse/db_hive_test.db/student
[hadoop@master hive]$ hdfs dfs -ls /user/hive/warehouse/db_hive_test.db/student
Found 1 items
-rwxrwxrwx   2 hadoop supergroup         30 2017-04-08 14:47 /user/hive/warehouse/db_hive_test.db/student/student.txt
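
The listing follows a fixed layout: a database named X becomes a directory X.db under the warehouse directory, and each table is a subdirectory of it. A small sketch of that rule follows; table_dir is a hypothetical helper, and it assumes a managed table created without a custom location.

```shell
# Hypothetical helper: where a managed table's files live under the warehouse directory.
table_dir() {
  db="$1"
  table="$2"
  warehouse="${3:-/user/hive/warehouse}"   # value of hive.metastore.warehouse.dir
  if [ "$db" = "default" ]; then
    echo "$warehouse/$table"               # the default database has no .db subdirectory
  else
    echo "$warehouse/$db.db/$table"
  fi
}

table_dir db_hive_test student
# To list the table's files:  hdfs dfs -ls "$(table_dir db_hive_test student)"
```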

Load data from an HDFS file into Hive

# Upload the file first
[hadoop@master hive]$ cat student.txt 
4   zhangsan
5   baiqio
6   aaadf
[hadoop@master hive]$ hdfs dfs -mkdir hive
[hadoop@master hive]$ hdfs dfs -put student.txt hive/student.txt2
# Then load the data (load data inpath moves the HDFS file into the table's directory)
hive> load data inpath 'hive/student.txt2' into table student;
Loading data to table db_hive_test.student
OK
Time taken: 0.324 seconds
hive> select * from student;
OK
1   zhangsan
2   baiqio
333 aaadf
4   zhangsan
5   baiqio
6   aaadf
Time taken: 0.21 seconds, Fetched: 6 row(s)

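Export Hive query results to files

# Note: the first line has no semicolon; the statement continues on the next line
# (test is an example table name; substitute your own, e.g. student)
hive> insert overwrite local directory '/home/wyp/Documents/result'
    > select * from test;
# Results can also be written to a directory on HDFS (omit the "local" keyword)
hive> insert overwrite directory '/home/wyp/Documents/result'
    > select * from test;
# It is best to specify the column delimiter explicitly
hive> insert overwrite local directory '/home/wyp/Documents/result'
    > row format delimited
    > fields terminated by '\t'
    > select * from test;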
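
The reason for specifying a delimiter is that, without a row format clause, Hive separates columns in the exported files with the Ctrl-A character (\001), which most tools do not split on. If an export was already written that way, it can be converted afterwards. The sketch below simulates one exported file rather than calling Hive; the file name 000000_0 mirrors how Hive typically names its output files, and the sample rows are made up for the demonstration.

```shell
OUT_DIR="$(mktemp -d)"
# Simulate one exported file; \001 is the Ctrl-A separator Hive uses by default.
printf '1\001zhangsan\n2\001baiqio\n' > "$OUT_DIR/000000_0"

# Convert Ctrl-A separators to tabs so that cut, awk and spreadsheets can read the file.
tr '\001' '\t' < "$OUT_DIR/000000_0" > "$OUT_DIR/result.tsv"

# Print the second column of the converted file
cut -f 2 "$OUT_DIR/result.tsv"
```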

Once data is mapped to database tables, querying and aggregating it becomes very convenient.

    Original author: 鹅鹅鹅_
    Original URL: https://www.jianshu.com/p/a9512aa7ae68