Video notes
Video URL:
HBase tutorial
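1. Differences from a traditional relational database (HBase vs. traditional RDBMS)
HBase: distributed; traditional: single machine
HBase: columns can be added or removed dynamically; traditional: columns are fixed when the table is created
HBase: a single data type (values are stored as uninterpreted bytes, usually handled as strings); traditional: numeric, character, and other types
HBase: null values are not stored; traditional: null values are stored
HBase: no SQL support
HBase: limited query paths: by rowkey, by rowkey range, or by full table scan
HBase: column-oriented storage; traditional: row-oriented storage
HBase: unstructured data, e.g. JSON; traditional: structured data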
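2. HBase features:
Distributed
Fast random writes and simple key-based reads. Are single-row updates supported? (See section 10: there is no update method; a put behaves like a map write.)
Hundreds of millions of rows and millions of columns; relational databases limit the number of columns
Column-oriented storage
No native SQL; access is through the Java API (an SQL layer can be wrapped around it to allow SQL access)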
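3. Can HBase replace a relational database?
No transaction support; transactional (trade) data should stay in MySQL
No rich queries such as joins
It can only serve as a complement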
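4. Role of the HMaster
1. Manages the RegionServers
2. Handles DDL, i.e. metadata (schema) definitions
5. Role of the RegionServer
1. Handles DML (data reads and writes)
2. Maintains the WAL (write-ahead log)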
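6. Basic concepts:
DML (Data Manipulation Language) statements let users query a database and manipulate the data already in it.
Examples are insert, delete, update, and select. DDL (Data Definition Language) statements are used to define and manage database objects, e.g. Create, Alter, and Drop.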
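7. HBase logical view: similar to a sorted map whose key is the three-dimensional coordinate (rowkey, column, version). A query must always supply the rowkey; column and version are optional, depending on the granularity of the query.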
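The sorted-map view can be sketched in a few lines of Python. This is only a toy model to illustrate the (rowkey, column, version) coordinate; the class name LogicalTable and its methods are made up for this note and are not part of any HBase API.

```python
from bisect import bisect_left, insort


class LogicalTable:
    """Toy sorted map keyed by (rowkey, column, version). Hypothetical, not an HBase API."""

    def __init__(self):
        self._keys = []    # sorted list of (rowkey, column, -version)
        self._values = {}  # key tuple -> value

    def put(self, rowkey, column, version, value):
        key = (rowkey, column, -version)  # negated so newer versions sort first
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def get(self, rowkey, column=None, version=None):
        """rowkey is mandatory; column and version only narrow the result."""
        result = []
        for key in self._keys[bisect_left(self._keys, (rowkey,)):]:
            row, col, neg_version = key
            if row != rowkey:
                break
            if column is not None and col != column:
                continue
            if version is not None and -neg_version != version:
                continue
            result.append((col, -neg_version, self._values[key]))
        return result

    def scan(self, start_row, stop_row):
        """Range scan over rowkeys in [start_row, stop_row)."""
        result = []
        for key in self._keys[bisect_left(self._keys, (start_row,)):]:
            row, col, neg_version = key
            if row >= stop_row:
                break
            result.append((row, col, -neg_version, self._values[key]))
        return result


table = LogicalTable()
table.put("123", "info:col1", 1, "v1")
table.put("123", "info:col2", 1, "v3")
table.put("456", "info:col1", 12, "v2")
print(table.get("123"))               # whole row
print(table.get("123", "info:col1"))  # one column of the row
print(table.scan("100", "200"))       # rowkey range
```

Because the keys are kept sorted, a lookup by rowkey or by rowkey range only has to locate the starting position and read forward, which is why those are the natural query paths.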
8. HBase physical storage:
1. table = n regions, split horizontally by rowkey
2. region = n stores, one store per column family
3. store = 1 memstore (in memory) + n HFiles (files on HDFS); each flush of the memstore produces one HFile
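The table / region / store hierarchy above can be pictured with a small Python sketch. It is a simplified model under stated assumptions: the class names Table, Region, and Store and the plain-dict "HFiles" are hypothetical, and real HBase adds the WAL, compaction, and on-disk formats that are left out here.

```python
class Store:
    """One store per column family: one memstore plus n immutable HFiles (toy model)."""

    def __init__(self):
        self.memstore = {}  # (rowkey, qualifier) -> value, held in memory
        self.hfiles = []    # each flush appends one sorted snapshot

    def put(self, rowkey, qualifier, value):
        self.memstore[(rowkey, qualifier)] = value

    def flush(self):
        # A flush of a non-empty memstore produces exactly one new HFile.
        if self.memstore:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}


class Region:
    """A contiguous rowkey range [start_key, end_key) with one store per column family."""

    def __init__(self, start_key, end_key, families):
        self.start_key = start_key
        self.end_key = end_key  # None means "no upper bound"
        self.stores = {family: Store() for family in families}


class Table:
    """A table is n regions, split horizontally by rowkey."""

    def __init__(self, split_keys, families):
        bounds = [""] + sorted(split_keys) + [None]
        self.regions = [
            Region(bounds[i], bounds[i + 1], families)
            for i in range(len(bounds) - 1)
        ]

    def region_for(self, rowkey):
        for region in self.regions:
            if rowkey >= region.start_key and (
                region.end_key is None or rowkey < region.end_key
            ):
                return region

    def put(self, rowkey, family, qualifier, value):
        self.region_for(rowkey).stores[family].put(rowkey, qualifier, value)


table = Table(split_keys=["4", "8"], families=["info"])
table.put("123", "info", "col1", "v1")
table.put("456", "info", "col1", "v2")
store = table.region_for("123").stores["info"]
store.flush()
print(len(table.regions), len(store.hfiles), store.memstore)
```

In this sketch the two rows land in different regions because their rowkeys fall on different sides of the split key "4".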
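9. HBase design recommendations
1. Define your own namespace (the equivalent of a database)
2. Define a sensible schema
3. Set sensible pre-splitting when creating the table (pre-split, auto-split, force-split)
4. Choose a suitable field as the rowkey, such as a phone number or IMSI
5. Keep column family and column names short to save storage space
6. Set a suitable number of versions; keeping 3 is recommended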
10. HBase operations
1. put: single or batch; there is no update method, so a put works like writing to a map
2. delete: single or batch
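To make the "no update, put works like a map" point concrete, here is a minimal Python sketch of a versioned map. The class VersionedMap and its parameters are hypothetical names for illustration. One simplification to note: the sketch hides surplus versions at read time, whereas HBase, as far as I understand it, physically discards them later during compaction.

```python
import time
from collections import defaultdict


class VersionedMap:
    """Toy model: each (rowkey, column) cell keeps values keyed by timestamp."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self._cells = defaultdict(dict)  # (rowkey, column) -> {timestamp: value}

    def put(self, rowkey, column, value, timestamp=None):
        # There is no separate update: writing an existing cell adds a new version.
        if timestamp is None:
            timestamp = int(time.time() * 1000)  # default version is the current time
        self._cells[(rowkey, column)][timestamp] = value

    def put_many(self, puts):
        """Batch form: an iterable of (rowkey, column, value, timestamp) tuples."""
        for rowkey, column, value, timestamp in puts:
            self.put(rowkey, column, value, timestamp)

    def get(self, rowkey, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cell = self._cells.get((rowkey, column), {})
        newest_first = sorted(cell.items(), reverse=True)
        return newest_first[: min(versions, self.max_versions)]


table = VersionedMap(max_versions=3)
table.put_many([
    ("123", "info:col1", "a", 1),
    ("123", "info:col1", "b", 2),
    ("123", "info:col1", "c", 3),
    ("123", "info:col1", "d", 4),
])
print(table.get("123", "info:col1"))               # newest version only
print(table.get("123", "info:col1", versions=10))  # capped at max_versions
```

A put with a timestamp that already exists in the cell simply replaces that version's value, which is the closest thing to an in-place update in this model.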
11. Hands-on walkthrough:
./hbase shell
1) Basic status queries
hbase(main):006:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
Took 0.0175 seconds
hbase(main):007:0> whoami
hadoop (auth:SIMPLE)
    groups: hadoop
Took 0.0006 seconds
2) View the usage of a specific command
hbase(main):012:0> help "status"
Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
default is 'summary'. Examples:

  hbase> status
  hbase> status 'simple'
  hbase> status 'summary'
  hbase> status 'detailed'
  hbase> status 'replication'
  hbase> status 'replication', 'source'
  hbase> status 'replication', 'sink'
hbase(main):013:0>
3) List namespaces (tab completion is available)
hbase(main):013:0> list_namespace
NAMESPACE
default
hbase
2 row(s)
Took 0.1524 seconds
hbase(main):014:0>
4) Create a namespace
hbase(main):019:0> create_namespace 'gp'
Took 0.2463 seconds
hbase(main):020:0> list_namespace
NAMESPACE
default
gp
hbase
3 row(s)
Took 0.0270 seconds
5) Create a pre-split table:
Syntax: create 'namespace:table_name','column_family',...
hbase(main):024:0> create 'gp:test','info',{NUMREGIONS => 4, SPLITALGO => 'HexStringSplit'}
Created table gp:test
Took 2.6835 seconds
=> Hbase::Table - gp:test
hbase(main):025:0> desc 'gp:test'
Table gp:test is ENABLED
gp:test
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE',
CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW',
CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false',
CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false',
COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.3126 seconds
hbase(main):026:0>
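To get a feel for what a 4-region hex pre-split means, the following Python sketch computes evenly spaced split points. It is an approximation written for these notes, not the HexStringSplit source: the function names are hypothetical, and the key space 00000000 to FFFFFFFF is an assumption.

```python
from bisect import bisect_right


def hex_split_points(num_regions, first=0x00000000, last=0xFFFFFFFF):
    """Evenly spaced split keys over an assumed 8-digit hex key space."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    size = (last - first + 1) // num_regions
    # n regions need n - 1 split points.
    return [format(first + i * size, "08x") for i in range(1, num_regions)]


def region_index(rowkey, split_points):
    """Index of the region whose [start, end) range contains rowkey (string order)."""
    return bisect_right(split_points, rowkey)


splits = hex_split_points(4)
print(splits)
for key in ("123", "456", "c0ffee"):
    print(key, "-> region", region_index(key, splits))
```

Rowkeys are compared as strings, so a pre-split only spreads the load if the rowkeys themselves are spread across the hex range; keys that all begin with the same few characters would still end up in a single region.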
6) Alter a table attribute: change the number of stored versions from 1 to 3
hbase(main):028:0> alter 'gp:test',{NAME=>'info',VERSIONS=>'3'}
Updating all regions with the new schema...
4/4 regions updated.
Done.
Took 2.3734 seconds
hbase(main):029:0> desc 'gp:test'
Table gp:test is ENABLED
gp:test
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', VERSIONS => '3', EVICT_BLOCKS_ON_CLOSE => 'false',
NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE',
CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW',
CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false',
CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false',
COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.0597 seconds
hbase(main):030:0>
7) Insert data:
Syntax: put 'namespace:tablename','rowkey','columnfamily:column','value',version (the version is optional and defaults to the current timestamp)
hbase(main):030:0> put 'gp:test','123','info:col1','v1'
Took 0.2623 seconds
hbase(main):033:0> scan 'gp:test'
ROW COLUMN+CELL
 123 column=info:col1, timestamp=1534082352792, value=v1
1 row(s)
Took 0.1840 seconds
8) Query data with get (after inserting a second row with an explicit version of 12):
hbase(main):035:0> put 'gp:test','456','info:col1','v2',12
Took 0.0188 seconds
hbase(main):036:0> scan 'gp:test'
ROW COLUMN+CELL
 123 column=info:col1, timestamp=1534082352792, value=v1
 456 column=info:col1, timestamp=12, value=v2
2 row(s)
Took 0.0526 seconds
hbase(main):037:0> get 'gp:test','123'
COLUMN CELL
 info:col1 timestamp=1534082352792, value=v1
1 row(s)
Took 0.0783 seconds
hbase(main):038:0>
9) Get a specific column of rowkey='123'
hbase(main):038:0> put 'gp:test','123','info:col2','v3'
Took 0.0487 seconds
hbase(main):039:0> get 'gp:test','123','info:col1'
COLUMN CELL
 info:col1 timestamp=1534082352792, value=v1
1 row(s)
Took 0.0104 seconds
hbase(main):040:0>
10) Delete a specific column of a row:
hbase(main):022:0> delete 'gp:test','123','info:col1'
hbase(main):043:0> scan 'gp:test'
ROW COLUMN+CELL
 123 column=info:col2, timestamp=1534082891558, value=v3
 456 column=info:col1, timestamp=12, value=v2
2 row(s)
Took 0.0606 seconds
hbase(main):044:0>
11) Delete an entire row:
hbase(main):044:0> deleteall 'gp:test','456'
Took 0.0225 seconds
hbase(main):045:0> scan 'gp:test'
ROW COLUMN+CELL
 123 column=info:col2, timestamp=1534082891558, value=v3
1 row(s)
Took 0.0687 seconds
hbase(main):046:0>
A delete does not remove the data immediately; it only writes a delete marker. This can be seen with a raw scan:
hbase(main):050:0> scan 'gp:test', {RAW => true, VERSIONS => 10}
ROW COLUMN+CELL
 123 column=info:col1, timestamp=1534082352792, type=Delete
 123 column=info:col1, timestamp=1534082352792, value=v1
 123 column=info:col2, timestamp=1534082891558, value=v3
 456 column=info:, timestamp=1534083246672, type=DeleteFamily
 456 column=info:col1, timestamp=12, value=v2
2 row(s)
Took 0.1143 seconds
hbase(main):051:0>
A delete is in effect a put that inserts a cell with type=Delete or type=DeleteFamily. At this point the data is still in the memstore and has not been flushed to an HFile.
12) The data is physically removed after flush and major_compact. In the transcript below, the deleted values are gone after the flush but the delete markers remain; the markers disappear after major_compact.
hbase(main):051:0> flush 'gp:test'
Took 0.8562 seconds
hbase(main):055:0> scan 'gp:test', {RAW => true, VERSIONS => 10}
ROW COLUMN+CELL
 123 column=info:col1, timestamp=1534082352792, type=Delete
 123 column=info:col2, timestamp=1534082891558, value=v3
 456 column=info:, timestamp=1534083246672, type=DeleteFamily
2 row(s)
Took 0.0718 seconds
hbase(main):002:0> major_compact 'gp:test'
Took 0.3532 seconds
hbase(main):001:0> scan 'gp:test', {RAW => true, VERSIONS => 10}
ROW COLUMN+CELL
 123 column=info:col2, timestamp=1534082891558, value=v3
1 row(s)
Took 0.8065 seconds
hbase(main):002:0>
Manual compaction is rarely run in production because it is I/O-heavy and can stall reads and writes.
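The delete-marker behaviour in steps 10) to 12) can be mimicked with a small Python model. This is a sketch under simplifying assumptions: ToyStore is a hypothetical class, a Delete marker masks only the cell with exactly the same timestamp, a DeleteFamily marker masks every cell of the row at or below its timestamp, and the intermediate flush step is not modelled.

```python
PUT, DELETE, DELETE_FAMILY = "Put", "Delete", "DeleteFamily"


class ToyStore:
    """Append-only cell log: deletes are written as marker cells (toy model)."""

    def __init__(self):
        self.cells = []  # (rowkey, column, timestamp, kind, value)

    def put(self, rowkey, column, timestamp, value):
        self.cells.append((rowkey, column, timestamp, PUT, value))

    def delete(self, rowkey, column, timestamp):
        # Nothing is removed here; a marker cell is appended instead.
        self.cells.append((rowkey, column, timestamp, DELETE, None))

    def delete_row(self, rowkey, timestamp):
        self.cells.append((rowkey, "", timestamp, DELETE_FAMILY, None))

    def _masked(self, cell):
        rowkey, column, timestamp, kind, _ = cell
        if kind != PUT:
            return False
        for r, c, t, k, _ in self.cells:
            if r != rowkey:
                continue
            if k == DELETE and c == column and t == timestamp:
                return True
            if k == DELETE_FAMILY and timestamp <= t:
                return True
        return False

    def scan(self):
        """Normal scan: markers and the cells they mask are hidden."""
        visible = [c for c in self.cells if c[3] == PUT and not self._masked(c)]
        return sorted((r, c, t, v) for r, c, t, _, v in visible)

    def raw_scan(self):
        """Raw scan: everything that is physically stored, markers included."""
        return sorted(self.cells, key=lambda c: (c[0], c[1], -c[2], c[3]))

    def major_compact(self):
        """Rewrite the data, dropping markers and the cells they mask."""
        self.cells = [c for c in self.cells if c[3] == PUT and not self._masked(c)]


store = ToyStore()
store.put("123", "info:col1", 1534082352792, "v1")
store.put("456", "info:col1", 12, "v2")
store.put("123", "info:col2", 1534082891558, "v3")
store.delete("123", "info:col1", 1534082352792)
store.delete_row("456", 1534083246672)
print(store.scan())           # only the surviving cell is visible
print(len(store.raw_scan()))  # all stored cells, markers included
store.major_compact()
print(store.raw_scan())       # markers and masked cells are gone
```

The timestamps are copied from the transcript above so the model's raw scan can be compared with the shell output line by line.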
13) Truncate the table, then disable and drop it, and drop the namespace (a table must be disabled before it can be dropped)
hbase(main):003:0> truncate 'gp:test'
Truncating 'gp:test' table (it may take a while):
Disabling table...
Truncating table...
Took 2.1177 seconds
hbase(main):004:0> scan 'gp:test'
ROW COLUMN+CELL
0 row(s)
Took 1.1058 seconds
hbase(main):005:0> disable 'gp:test'
Took 0.5193 seconds
hbase(main):006:0> scan 'gp:test'
ROW COLUMN+CELL
org.apache.hadoop.hbase.TableNotEnabledException: gp:test is disabled.
  at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:714)
  at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:328)
  at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
  at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:399)
  at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
  at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
ERROR: Table gp:test is disabled!
For usage try 'help "scan"'
Took 0.1323 seconds
hbase(main):007:0> drop 'gp:test'
Took 0.3581 seconds
hbase(main):008:0> list_namespace 'gp'
NAMESPACE
gp
1 row(s)
Took 0.1517 seconds
hbase(main):009:0> drop_namespace 'gp'
Took 0.2719 seconds
hbase(main):010:0> list_namespace
NAMESPACE
default
hbase
2 row(s)
Took 0.0322 seconds
hbase(main):011:0>