(Author: 陈玓玏)
Sometimes we need to check the size of the files backing a Hive table. This takes two steps:
- find where the Hive table is stored in HDFS;
- check the size of the table's files.
1. Finding the Hive table's storage location in HDFS
Use show create table tableName to view the table's DDL:
0: jdbc:hive2://nfjd-hadoop02-node46.jpushoa.> show create table tmp.cdl_push_r;
INFO : Compiling command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b): show create table tmp.cdl_push_r
INFO : UserName: chendl
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b); Time taken: 0.398 seconds
INFO : Executing command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b): show create table tmp.cdl_push_r
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20191108203838_0355393e-9a92-44d3-a0c4-bf52cabcfa4b); Time taken: 0.042 seconds
INFO : OK
CREATE TABLE `tmp.cdl_push_r`(
`imei` string,
`recall_date` bigint,
`feature` string,
`value` bigint)
PARTITIONED BY (
`customer_name` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'
TBLPROPERTIES (
'spark.sql.create.version'='2.4.3',
'spark.sql.sources.schema.numPartCols'='1',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"im\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"recall_date\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"feature\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"value\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"customer_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}',
'spark.sql.sources.schema.partCol.0'='customer_name',
'transient_lastDdlTime'='1570433559')
22 rows selected (0.514 seconds)
The LOCATION in the output, 'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r', is where the Hive table is stored.
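If you only want the path, you can also run describe formatted tmp.cdl_push_r, which prints a Location field directly. Alternatively, the DDL output above can be filtered with standard text tools. The sketch below (an assumption, not from the original post) demonstrates the filtering pipeline on a captured sample of the output; in practice you would pipe the real output, e.g. from hive -e "show create table tmp.cdl_push_r" or the equivalent beeline -e invocation:

```shell
# Sample captured from the `show create table` output above; in practice,
# pipe the live DDL output into the same grep pipeline.
ddl="LOCATION
  'hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r'"

# The path sits on the line after LOCATION, wrapped in single quotes:
# grab that line (grep -A1 is a GNU/BSD extension), then strip spaces and quotes.
location=$(printf '%s\n' "$ddl" | grep -A1 '^LOCATION' | tail -1 | tr -d " '")
echo "$location"
```

This saves scrolling through the full DDL when you only need the HDFS path for a follow-up hadoop fs command.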
2. Checking the size of the Hive table's files
Given the storage location found above, use an HDFS command to check the table's size; the last argument is simply the LOCATION copied from the output above:
[chendl@cdl]$ hadoop fs -du -s -h hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
222.5 M 445.0 M hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
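In this output the first column (222.5 M) is the actual data size and the second (445.0 M) is the disk space consumed including replication (here a factor of 2). Dropping -h makes hadoop fs -du print raw byte counts, which are easier to post-process. The sketch below converts such a line to MiB with awk; the byte values are a made-up sample chosen to be consistent with the 222.5 M shown above, not real cluster output:

```shell
# Hypothetical sample of `hadoop fs -du -s` output without -h (raw bytes):
# column 1 = data size, column 2 = size including replication.
du_line="233309798 466619596 hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r"

# Convert the first column to MiB, matching the -h style of output.
printf '%s\n' "$du_line" | awk '{printf "%.1f M\n", $1 / 1024 / 1024}'
```

Running hadoop fs -du -h on the location without -s lists the size of each partition directory individually, which is handy for spotting skewed partitions.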
You can also check how many partitions the table has:
[chendl@cdl]$ hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r
Found 1 items
drwxrwx---+ - chendl hive 0 2019-10-10 03:04 hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r/customer_name=test190924
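Each partition is a subdirectory named key=value, so counting partitions amounts to counting directories. (From inside Hive, show partitions tmp.cdl_push_r answers the same question via the metastore.) A small sketch, demonstrated here on the captured listing above rather than a live cluster:

```shell
# Sample captured from the `hadoop fs -ls` output above; in practice,
# pipe the live listing into the same grep.
ls_out="Found 1 items
drwxrwx---+ - chendl hive 0 2019-10-10 03:04 hdfs://nameservice1/user/hive/warehouse/tmp.db/cdl_push_r/customer_name=test190924"

# Skip the "Found N items" header by counting only lines that start with 'd'
# (directory entries), one per partition.
printf '%s\n' "$ls_out" | grep -c '^d'
```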
Reference: https://blog.csdn.net/lilychen1983/article/details/80912876