Finding a Hive Table's Storage Location and Checking Its File Sizes and Partition File Names

(Author: 陈玓玏)

Sometimes we need to check the size of the files backing a Hive table. This takes two steps:

  1. Find where the Hive table is stored in HDFS;
  2. Check the size of the table's files.

1. Find where the Hive table is stored in HDFS
Use show create table tableName to look it up:

0: jdbc:hive2://nfjd-hadoop02-node46.jpushoa.> show create table tmp.cdl_push_r;
INFO  : Compiling command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b): show create table tmp.cdl_push_r
INFO  : UserName: chendl
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b); Time taken: 0.398 seconds
INFO  : Executing command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b): show create table tmp.cdl_push_r
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b); Time taken: 0.042 seconds
INFO  : OK
CREATE TABLE `tmp.cdl_push_r`(
  `imei` string, 
  `recall_date` bigint, 
  `feature` string, 
  `value` bigint)
PARTITIONED BY ( 
  `customer_name` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'
TBLPROPERTIES (
  'spark.sql.create.version'='2.4.3', 
  'spark.sql.sources.schema.numPartCols'='1', 
  'spark.sql.sources.schema.numParts'='1', 
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"im\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"recall_date\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"feature\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"value\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"customer_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}', 
  'spark.sql.sources.schema.partCol.0'='customer_name', 
  'transient_lastDdlTime'='1570433559')
22 rows selected (0.514 seconds)

In the output, LOCATION is 'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r', which is where the Hive table is stored.
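If you only want the location, you can also grep it out of the DDL instead of reading the whole transcript. A minimal sketch, assuming the DDL text has been captured (in practice it could come from something like beeline -u <jdbc-url> --silent=true -e 'show create table tmp.cdl_push_r', where the JDBC URL is a placeholder for your cluster); here we parse a saved copy of the output above:

```shell
# Extract the LOCATION value from "show create table" output.
# The DDL snippet below is copied from the transcript above; on a real
# cluster you would capture it from beeline instead.
ddl="LOCATION
  'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'
TBLPROPERTIES ("

# Take the line after "LOCATION", then strip spaces and quotes.
location=$(printf '%s\n' "$ddl" | grep -A1 '^LOCATION' | tail -n1 | tr -d " '")
echo "$location"
# prints hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
```

This is handy when scripting the two steps of this article end to end.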
2. Check the size of the table's files
Using the storage location found above, check the table's size with an HDFS command; the last argument is simply the LOCATION copied from the output above. In the result, the first column is the total file size and the second is the disk space consumed including replication:

[chendl@cdl]$ hadoop fs -du -s -h hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
222.5 M  445.0 M  hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
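Dropping the -s flag breaks the total down per partition directory: hadoop fs -du <location> prints one line per child path with the raw byte count, the byte count including replication, and the path (older Hadoop releases may omit the middle column). As a sketch, the awk pipeline below sums the first column; the sample line uses an illustrative byte count rather than output from a live cluster:

```shell
# Sum per-partition sizes from "hadoop fs -du <location>" output.
# On a real cluster the sample would be:
#   sample=$(hadoop fs -du hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r)
# The byte counts below are made up for illustration.
sample="233308160  466616320  hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r/customer_name=test190924"

total=$(printf '%s\n' "$sample" | awk '{sum += $1} END {print sum}')
echo "$total bytes"
# prints 233308160 bytes
```

With many partitions this quickly shows which one dominates the table's footprint.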

You can also check how many partitions the table has:

[chendl@cdl]$ hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
Found 1 items
drwxrwx---+  - chendl hive          0 2019-10-10 03:04 hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r/customer_name=test190924
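Since partition directories are named key=value, you can count them by filtering the listing; SHOW PARTITIONS tmp.cdl_push_r in Hive would list them by name as well. A small sketch, run here against a saved copy of the listing above rather than a live cluster:

```shell
# Count partition directories in a "hadoop fs -ls <location>" listing.
# The listing is copied from the transcript above; on a real cluster:
#   listing=$(hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r)
listing="Found 1 items
drwxrwx---+  - chendl hive          0 2019-10-10 03:04 hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r/customer_name=test190924"

# Partition directories contain the partition column name followed by "=".
count=$(printf '%s\n' "$listing" | grep -c 'customer_name=')
echo "$count partition(s)"
# prints 1 partition(s)
```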

References: https://blog.csdn.net/lilychen1983/article/details/80912876

    Original author: 小白白白又白cdllp
    Original link: https://blog.csdn.net/weixin_39750084/article/details/102960836
    This article is reposted from the web solely to share knowledge; if there is any infringement, please contact the blogger for removal.