Hive元数据基础

一、存储Hive版本的元数据表

1.version:存储Hive版本
《Hive元数据基础》

version表存hive的版本信息,该表中数据只有一条,如果存在多条,会造成hive启动不起来。

mysql> SELECT * FROM VERSION;
+--------+----------------+-----------------------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT                         |
+--------+----------------+-----------------------------------------+
|      1 | 1.1.0          | Set by MetaStore hadoop@192.168.137.141 |
+--------+----------------+-----------------------------------------+
1 row in set (0.00 sec)

二、Hive数据库相关的元数据表

1.dbs:该表存储Hive中所有数据库的基本信息
《Hive元数据基础》

mysql> select * from dbs;
+-------+------------------------------------+--------------------------------------------------------------------+-----------------+------------+------------+
| DB_ID | DESC                               | DB_LOCATION_URI                                                    | NAME            | OWNER_NAME | OWNER_TYPE |
+-------+------------------------------------+--------------------------------------------------------------------+-----------------+------------+------------+
|     1 | Default Hive database              | hdfs://192.168.137.141:9000/user/hive/warehouse                    | default         | public     | ROLE       |
|     9 | this is ruozedata 03 test database | hdfs://192.168.137.141:9000/user/hive/warehouse/hive2_ruozedata.db | hive2_ruozedata | hadoop     | USER       |
|    11 | NULL                               | hdfs://192.168.137.141:9000/user/hive/warehouse/ruozedata.db       | ruozedata       | hadoop     | USER       |
+-------+------------------------------------+--------------------------------------------------------------------+-----------------+------------+------------+
//与hive中的database是一一对应的
hive> show databases;
OK
default
hive2_ruozedata
ruozedata
Time taken: 0.095 seconds, Fetched: 3 row(s)

2.database_params:该表存储数据库的相关参数,在CREATE DATABASE时候用
《Hive元数据基础》

DBS和DATABASE_PARAMS这两张表通过DB_ID字段关联

WITH DBPROPERTIES (property_name=property_value, …)指定的参数
mysql> select * from database_params;
+-------+-----------+-------------+
| DB_ID | PARAM_KEY | PARAM_VALUE |
+-------+-----------+-------------+
|     9 | creator   | ruoze       |
|     9 | date      | 2018-06-17  |
+-------+-----------+-------------+
//只有DB_ID=9的数据库有,因为当时创建数据库的语句为:
//create database hive2_ruozedata comment 'this is ruozedata 03 test database' with dbproperties('creator'='ruoze','date'='2018-06-17');
//hive中查看database的详细信息
hive> desc database extended hive2_ruozedata;
OK
hive2_ruozedata this is ruozedata 03 test database      hdfs://192.168.137.141:9000/user/hive/warehouse/hive2_ruozedata.db      hadoop  USER    {date=2018-06-17, creator=ruoze}
Time taken: 0.102 seconds, Fetched: 1 row(s)

三、Hive表和视图相关的元数据表

1.tbls:该表中存储Hive表、视图、索引表的基本信息
《Hive元数据基础》

//查看DB_ID=1的数据库中的表,即default数据库
mysql> select * from tbls where DB_ID=1;
+--------+-------------+-------+------------------+--------+-----------+-------+-----------------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER  | RETENTION | SD_ID | TBL_NAME        | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+--------+-----------+-------+-----------------+---------------+--------------------+--------------------+
|      1 |  1529135635 |     1 |                0 | hadoop |         0 |     1 | zh              | MANAGED_TABLE | NULL               | NULL               |
|     31 |  1529287737 |     1 |                0 | hadoop |         0 |    41 | hive_wc         | MANAGED_TABLE | NULL               | NULL               |
|     32 |  1529288829 |     1 |                0 | hadoop |         0 |    42 | rating_json     | MANAGED_TABLE | NULL               | NULL               |
|     33 |  1529289493 |     1 |                0 | hadoop |         0 |    43 | hive_rownumber  | MANAGED_TABLE | NULL               | NULL               |
|     76 |  1531056372 |     1 |                0 | hadoop |         0 |    91 | user_click_tmp1 | MANAGED_TABLE | NULL               | NULL               |
|     78 |  1531056697 |     1 |                0 | hadoop |         0 |    95 | user_click1     | MANAGED_TABLE | NULL               | NULL               |
|     81 |  1532851994 |     1 |                0 | hadoop |         0 |   101 | hive_array      | MANAGED_TABLE | NULL               | NULL               |
|     83 |  1532855555 |     1 |                0 | hadoop |         0 |   103 | hive_map        | MANAGED_TABLE | NULL               | NULL               |
|     85 |  1532856628 |     1 |                0 | hadoop |         0 |   105 | hive_struct     | MANAGED_TABLE | NULL               | NULL               |
|     91 |  1535883337 |     1 |                0 | hadoop |         0 |   111 | ruoze_test      | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+--------+-----------+-------+-----------------+---------------+--------------------+--------------------+
//hive中与之一一对应
hive> use default;
OK
Time taken: 0.08 seconds
hive> show tables;
OK
hive_array
hive_map
hive_rownumber
hive_struct
hive_wc
rating_json
ruoze_test
user_click1
user_click_tmp1
zh
Time taken: 0.064 seconds, Fetched: 10 row(s)

2.table_params:该表存储表/视图的属性信息

mysql> select * from table_params where TBL_ID=85;
+--------+-----------------------+-------------+
| TBL_ID | PARAM_KEY             | PARAM_VALUE |
+--------+-----------------------+-------------+
|     85 | COLUMN_STATS_ACCURATE | true        |
|     85 | numFiles              | 1           |
|     85 | numRows               | 0           |
|     85 | rawDataSize           | 0           |
|     85 | totalSize             | 91          |
|     85 | transient_lastDdlTime | 1532856636  |
+--------+-----------------------+-------------+

hive> desc formatted hive_struct;
OK
# col_name              data_type               comment             
                 
ip                      string                                      
userinfo                struct<name:string,age:int>                         
                 
# Detailed Table Information             
Database:               default                  
Owner:                  hadoop                   
CreateTime:             Sun Jul 29 17:30:28 CST 2018     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://192.168.137.141:9000/user/hive/warehouse/hive_struct      
Table Type:             MANAGED_TABLE            
Table Parameters:                
        COLUMN_STATS_ACCURATE   true                
        numFiles                1                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               91                  
        transient_lastDdlTime   1532856636 

Hive文件存储信息相关的元数据表主要涉及SDS、SD_PARAMS、SERDES、SERDE_PARAMS
由于HDFS支持的文件格式很多,而建Hive表时候也可以指定各种文件格式,Hive在将HQL解析成MapReduce时候,需要知道去哪里,使用哪种格式去读写HDFS文件,而这些信息就保存在这几张表中。

四、Hive文件存储信息相关的元数据表

1.sds:该表保存文件存储的基本信息,如INPUT_FORMAT、OUTPUT_FORMAT、是否压缩等。

TBLS表中的SD_ID与该表关联,可以获取Hive表的存储信息。
《Hive元数据基础》

mysql> select * from sds where SD_ID=43;
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
| SD_ID | CD_ID | INPUT_FORMAT                             | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION                                                       | NUM_BUCKETS | OUTPUT_FORMAT                                              | SERDE_ID |
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
|    43 |    33 | org.apache.hadoop.mapred.TextInputFormat |               |                           | hdfs://192.168.137.141:9000/user/hive/warehouse/hive_rownumber |          -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |       43 |
+-------+-------+------------------------------------------+---------------+---------------------------+----------------------------------------------------------------+-------------+------------------------------------------------------------+----------+
1 row in set (0.10 sec)

2.SD_PARAMS:该表存储Hive存储的属性信息,在创建表时候使用STORED BY ‘storage.handler.class.name’ [WITH SERDEPROPERTIES (…)指定
《Hive元数据基础》

3.SERDES:该表存储序列化使用的类信息
《Hive元数据基础》

mysql> SELECT * FROM SERDES;
+----------+------+----------------------------------------------------+
| SERDE_ID | NAME | SLIB                                               |
+----------+------+----------------------------------------------------+
|        1 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|       11 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|       12 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|       13 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|       14 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|       17 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|       22 | NULL | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
.......
+----------+------+----------------------------------------------------+
45 rows in set (0.00 sec)

4.SERDE_PARAMS:该表存储序列化的一些属性、格式信息,比如:行、列分隔符
《Hive元数据基础》

mysql> SELECT * FROM SERDE_PARAMS;
+----------+----------------------+-------------+
| SERDE_ID | PARAM_KEY            | PARAM_VALUE |
+----------+----------------------+-------------+
|        1 | serialization.format | 1           |
|       11 | field.delim          |                |
|       11 | serialization.format |                |
|       12 | field.delim          |                |
|       12 | serialization.format |                |
|       13 | serialization.format | 1           |
|       14 | field.delim          |                |
|       14 | serialization.format |                |
|       17 | field.delim          |                |
|       17 | serialization.format |                |
|       22 | field.delim          |                |
......
|      101 | colelction.delim     | ,           |
|      101 | field.delim          |                |
|      101 | serialization.format |                |
|      103 | colelction.delim     | #           |
|      103 | field.delim          | ,           |
|      103 | mapkey.delim         | :           |
|      103 | serialization.format | ,           |
|      105 | colelction.delim     | :           |
|      105 | field.delim          | #           |
|      105 | serialization.format | #           |
|      106 | field.delim          |                |
|      106 | serialization.format |                |
|      111 | serialization.format | 1           |
|      116 | field.delim          | ,           |
|      116 | serialization.format | ,           |
+----------+----------------------+-------------+
89 rows in set (0.00 sec)

五、Hive表字段相关的元数据表

1.columns_v2:该表存储表对应的字段信息
《Hive元数据基础》

mysql> select * from columns_v2 where CD_ID=33;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|    33 | NULL    | age         | int       |           1 |
|    33 | NULL    | id          | int       |           0 |
|    33 | NULL    | name        | string    |           2 |
|    33 | NULL    | sex         | string    |           3 |
+-------+---------+-------------+-----------+-------------+
4 rows in set (0.00 sec)

六、Hive表分区相关的元数据表

1.PARTITIONS:该表存储表分区的基本信息
《Hive元数据基础》

mysql> select * from partitions where TBL_ID=29;
+---------+-------------+------------------+-----------------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME                         | SD_ID | TBL_ID |
+---------+-------------+------------------+-----------------------------------+-------+--------+
|       7 |  1529246035 |                0 | deptno=30                         |    36 |     29 |
|       8 |  1529246036 |                0 | deptno=10                         |    37 |     29 |
|       9 |  1529246036 |                0 | deptno=__HIVE_DEFAULT_PARTITION__ |    38 |     29 |
|      10 |  1529246037 |                0 | deptno=20                         |    39 |     29 |
+---------+-------------+------------------+-----------------------------------+-------+--------+
4 rows in set (0.02 sec)

2.partition_keys:该表存储分区的字段信息
《Hive元数据基础》

mysql> select * from partition_keys where TBL_ID=25;
+--------+--------------+-------------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME   | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-------------+-----------+-------------+
|     25 | NULL         | event_month | string    |           0 |
+--------+--------------+-------------+-----------+-------------+
1 row in set (0.00 sec)

3.partition_key_vals:该表存储分区字段值
《Hive元数据基础》

mysql> select * from partition_key_vals where part_id=7;
+---------+--------------+-------------+
| PART_ID | PART_KEY_VAL | INTEGER_IDX |
+---------+--------------+-------------+
|       7 | 30           |           0 |
+---------+--------------+-------------+
1 row in set (0.00 sec)

4.PARTITION_PARAMS:该表存储分区的属性信息
《Hive元数据基础》

mysql> select * from partition_params where part_id=7;
+---------+-----------------------+-------------+
| PART_ID | PARAM_KEY             | PARAM_VALUE |
+---------+-----------------------+-------------+
|       7 | COLUMN_STATS_ACCURATE | true        |
|       7 | numFiles              | 1           |
|       7 | numRows               | 6           |
|       7 | rawDataSize           | 269         |
|       7 | totalSize             | 275         |
|       7 | transient_lastDdlTime | 1529246039  |
+---------+-----------------------+-------------+
6 rows in set (0.00 sec)
    原文作者:白面葫芦娃92
    原文地址: https://www.jianshu.com/p/1a7e8a083409
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞