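1. Concepts
Hive Metastore can be configured in three ways, which differ in where the metadata is stored (metadata is usually kept in a relational database):
Embedded Metastore Database (Derby): embedded mode
Metadata is stored in the embedded Derby database; multiple concurrent sessions are not supported
Local Metastore Server: local metastore (the mode used in this post)
Metadata is stored in a MySQL instance on the local machine
Remote Metastore Server: remote metastore
The metastore is split out and its data is stored in a remote MySQL instance, so MySQL does not have to be installed on every client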
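For the remote mode, a client mostly needs to know where the metastore service listens. The sketch below prints the client-side property for hive-site.xml. The function name and the host `metastore-host` are hypothetical placeholders, and 9083 is assumed as the port because it is the metastore service's usual default.

```shell
#!/bin/sh
# Print the client-side hive-site.xml property for a remote metastore.
# $1 = metastore host (placeholder), $2 = port (assumed default: 9083)
remote_metastore_property() {
    host="$1"
    port="${2:-9083}"
    cat <<EOF
<property>
<name>hive.metastore.uris</name>
<value>thrift://${host}:${port}</value>
</property>
EOF
}
```

Calling `remote_metastore_property metastore-host` prints a fragment that can be pasted into the client's hive-site.xml; on the server side the service itself is started with `hive --service metastore`.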
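Embedded mode installation
The main steps are:
- Install MySQL
- Create a MySQL account
- Create the Hive metastore database
- Download the MySQL driver package
- Configure Hive's configuration files
- Start the hive shell
- Inspect the metastore database
1) Install MySQL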
sudo apt-get install mysql-server mysql-client
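2) Create the hive account
create user 'hive'@'localhost' identified by 'hive';
3) Create the hive database, which holds the Hive metadata
create database hive;
4) Grant the hive account privileges on the hive database
grant all privileges on hive.* to 'hive'@'localhost' identified by 'hive';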
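The three SQL steps can also be generated in one go and piped into the MySQL client. This is only a sketch: the function name is hypothetical, it assumes MySQL 5.7.6 or later (for `CREATE USER IF NOT EXISTS`), and it grants privileges only on the metastore database.

```shell
#!/bin/sh
# Print the SQL that creates the metastore database, the account, and its privileges.
# $1 = database name, $2 = user name, $3 = password
metastore_setup_sql() {
    db="$1"; user="$2"; pass="$3"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
CREATE USER IF NOT EXISTS '${user}'@'localhost' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'localhost';
FLUSH PRIVILEGES;
EOF
}
```

A possible invocation is `metastore_setup_sql hive hive hive | mysql -u root -p`. Keeping `IDENTIFIED BY` in `CREATE USER` rather than in `GRANT` means the same statements should also be accepted by newer MySQL releases.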
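5) Download the MySQL JDBC driver (Connector/J) and copy the JAR into Hive's lib directory
6) Go to /home/aa/jike/apache-hive-1.2.2-bin/conf and create Hive's configuration files from the template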
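Hive can only load the MySQL driver if the JAR is on its classpath, which in practice means Hive's lib directory. A minimal sketch of that copy step follows; the function name is hypothetical, and the JAR file name depends on the Connector/J version that was downloaded.

```shell
#!/bin/sh
# Copy a downloaded MySQL Connector/J JAR into Hive's lib directory.
# $1 = path to the JAR, $2 = Hive installation directory
install_jdbc_driver() {
    jar="$1"; hive_home="$2"
    if [ ! -f "$jar" ]; then
        echo "driver JAR not found: $jar" >&2
        return 1
    fi
    if [ ! -d "$hive_home/lib" ]; then
        echo "no lib directory under: $hive_home" >&2
        return 1
    fi
    cp "$jar" "$hive_home/lib/"
}
```

With the layout used in this post, the second argument would be /home/aa/jike/apache-hive-1.2.2-bin.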
aa@ubuntu:~/jike/apache-hive-1.2.2-bin/conf$ cp hive-default.xml.template hive-site.xml
aa@ubuntu:~/jike/apache-hive-1.2.2-bin/conf$ cp hive-default.xml.template hive-default.xml
Run hive
It fails with an error: Hive needs access permissions on its HDFS path
aa@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/home/aa/jike/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Filesystem closed
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1057)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 7 more
Grant the permissions
aa@ubuntu:~$ hadoop fs -chmod -R 777 /tmp/hive
Check again; output like the following shows that the permissions are in place
aa@ubuntu:~$ hadoop fs -ls /tmp/
Found 1 items
drwxrwxrwx - aa supergroup 0 2017-05-06 23:27 /tmp/hive
Run hive again. It fails again, this time because of a misconfigured URI
aa@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/home/aa/jike/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:563)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 7 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:145)
... 10 more
Create a hive directory under the tmp path, then replace every occurrence of ${system:java.io.tmpdir} and ${system:java.io.tmpdir}/${system:user.name} in hive-site.xml with /home/aa/jike/tmp/hive
aa@ubuntu:~$ mkdir -p /home/aa/jike/tmp/hive
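Replacing the placeholders by hand is error-prone because they occur in several properties. The sketch below does the substitution with sed; the function name is hypothetical, and it assumes the target directory contains no `|` or `&` characters, since those are special in the sed expression.

```shell
#!/bin/sh
# Replace Hive's unresolved temp-dir placeholders in a config file.
# $1 = path to hive-site.xml, $2 = absolute local directory to use instead
replace_tmpdir_placeholders() {
    file="$1"; dir="$2"
    tmp="$file.tmp.$$"
    # Substitute the longer pattern first, so the user-name part is not left behind.
    sed -e 's|\${system:java\.io\.tmpdir}/\${system:user\.name}|'"$dir"'|g' \
        -e 's|\${system:java\.io\.tmpdir}|'"$dir"'|g' \
        "$file" > "$tmp" && mv "$tmp" "$file"
}
```

From the conf directory it could be called as `replace_tmpdir_placeholders hive-site.xml /home/aa/jike/tmp/hive`; keeping a backup copy of hive-site.xml first is a sensible precaution.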
Run hive once more
The prompt below shows that Hive now starts, so the embedded mode works
aa@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/home/aa/jike/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive>
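Local metastore installation
Starting from the working embedded setup, switch the metastore connection to MySQL
Edit /home/aa/jike/apache-hive-1.2.2-bin/conf/hive-site.xml and change ConnectionURL, ConnectionDriverName, ConnectionUserName, and ConnectionPassword
aa@ubuntu:~/jike/apache-hive-1.2.2-bin/conf$ vim hive-site.xml
## Change ConnectionURL
## Original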
...
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
...
## Changed to
...
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
...
## Change ConnectionDriverName
## Original
...
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
...
## Changed to
...
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
...
## Change ConnectionUserName
## Original
...
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>APP</value>
<description>Username to use against metastore database</description>
</property>
...
## Changed to
...
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value> <!-- the database user created earlier -->
<description>Username to use against metastore database</description>
</property>
...
## Change ConnectionPassword
## Original
...
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mine</value>
<description>password to use against metastore database</description>
</property>
...
## Changed to
...
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value> <!-- the password of that database user -->
<description>password to use against metastore database</description>
</property>
...
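Since hive-site.xml is large, it is easy to edit the wrong property or leave a stale value behind. The helper below is a sketch for reading back a single property value; the function name is hypothetical, and it assumes the layout shown above, with `<name>` and `<value>` on separate lines.

```shell
#!/bin/sh
# Print the value of one property from a hive-site.xml style file.
# $1 = path to the file, $2 = property name
get_hive_property() {
    awk -v name="$2" '
        # Remember that the wanted <name> line was seen.
        index($0, "<name>" name "</name>") { found = 1; next }
        # The first <value> line after it holds the setting.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}
```

For example, `get_hive_property hive-site.xml javax.jdo.option.ConnectionDriverName` should print the MySQL driver class once the edit has been made; if it still prints the Derby driver, the change did not take effect.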
Run hive once more
The prompt below shows that Hive starts, so the local metastore mode works
aa@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/home/aa/jike/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive>
Log in to MySQL with the hive account and switch to the hive database; the tables that Hive created are now visible
mysql> show tables;
+---------------------------+
| Tables_in_hive |
+---------------------------+
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| DATABASE_PARAMS |
| DBS |
| FUNCS |
| FUNC_RU |
| GLOBAL_PRIVS |
| PARTITIONS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_STATS |
| ROLES |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SKEWED_COL_NAMES |
| SKEWED_COL_VALUE_LOC_MAP |
| SKEWED_STRING_LIST |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES |
| SORT_COLS |
| TABLE_PARAMS |
| TAB_COL_STATS |
| TBLS |
| VERSION |
+---------------------------+
29 rows in set (0.00 sec)