Following on from the previous article, which set up a pseudo-distributed Hadoop 2.6 environment, this post records the process of building a pseudo-distributed HBase environment on top of it, using HBase 1.1.1.
The process breaks down into the following steps:
- Set up the Hadoop 2.6 pseudo-distributed environment (see the previous article)
- Download and install HBase 1.1.1
- Modify the HBase configuration files
- Configure environment variables for HBase
- Start HBase
- Test HBase
- Download and install HBase
The HBase download page is here; we choose the hbase-1.1.1-bin.tar.gz package:
su hadoop # switch to the hadoop account created in the previous article
cd ~
wget http://mirrors.cnnic.cn/apache/hbase/1.1.1/hbase-1.1.1-bin.tar.gz
Then extract it into the current directory and rename the resulting folder:
tar xzvf hbase-1.1.1-bin.tar.gz
mv hbase-1.1.1 hbase
- Modify the HBase configuration files
First, enter the HBase configuration directory and set the value of JAVA_HOME for HBase:
cd hbase/conf
vim hbase-env.sh
Following the setup in the previous article, JAVA_HOME here should be /usr/lib/jvm/java-8-oracle:
# The java implementation to use. Java 1.7+ required.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
Then edit the hbase-site.xml file:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/zookeeper</value>
  </property>
</configuration>
Since we are running in pseudo-distributed mode, HBase's data must be stored on the HDFS instance set up earlier. The value of hbase.rootdir is the location on HDFS where HBase keeps its data; the hostname and port in this value must match the value of fs.default.name in Hadoop's core-site.xml, which in the previous article was hdfs://localhost:9000.
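This consistency requirement can be checked mechanically. A small sketch (the XML strings below are stand-ins for the real core-site.xml and hbase-site.xml; in practice you would read the files from the two conf directories):

```python
import xml.etree.ElementTree as ET

def get_property(xml_text, name):
    """Return the <value> of the named <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Stand-ins for $HADOOP_HOME/etc/hadoop/core-site.xml and hbase/conf/hbase-site.xml
core_site = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>
</configuration>"""

hbase_site = """<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://localhost:9000/hbase</value></property>
</configuration>"""

fs_uri = get_property(core_site, "fs.default.name")
rootdir = get_property(hbase_site, "hbase.rootdir")

# hbase.rootdir must live under the HDFS URI that Hadoop exposes
assert rootdir.startswith(fs_uri), "hbase.rootdir does not match fs.default.name"
print("config consistent:", rootdir)
```

If the assertion fires, HBase will fail to reach HDFS at startup, which is one of the most common mistakes in this step.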
- Configure environment variables for HBase
As with the Hadoop setup, to make starting HBase more convenient later, we add HBase's startup-script directory and related paths to the environment:
vim ~/.bashrc
Then append the following lines to the end of the file:
export HBASE_HOME=/home/hadoop/hbase
export HBASE_CONF_DIR=$HBASE_HOME/conf
export HBASE_CLASS_PATH=$HBASE_CONF_DIR
export PATH=$PATH:$HBASE_HOME/bin
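These lines only take effect in new shells; run `source ~/.bashrc` (or log in again) to apply them to the current session. The expansion can be sanity-checked like so (a sketch; the paths assume HBase was unpacked to /home/hadoop/hbase as above):

```shell
# Same variables as appended to ~/.bashrc
export HBASE_HOME=/home/hadoop/hbase
export HBASE_CONF_DIR=$HBASE_HOME/conf
export PATH=$PATH:$HBASE_HOME/bin

# The conf dir should resolve relative to HBASE_HOME
echo "$HBASE_CONF_DIR"
```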
- Start HBase
Since pseudo-distributed HBase depends on HDFS, we need to start HDFS first:
start-dfs.sh
Then start HBase:
start-hbase.sh
- Test HBase
With HBase started, list the Java processes on the system; you should see the following:
hadoop@iZ259rt0i0iZ:~$ jps
2832 HMaster
2945 HRegionServer
2245 DataNode
2150 NameNode
3065 Jps
2745 HQuorumPeer
2431 SecondaryNameNode
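If any of these daemons is missing, check the logs under $HBASE_HOME/logs. Checking the jps output programmatically amounts to set membership; a sketch over the sample output above (in a real check you would capture it with `subprocess.check_output(["jps"])`):

```python
# Daemons expected in this pseudo-distributed Hadoop + HBase setup
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "HMaster", "HRegionServer", "HQuorumPeer"}

# Sample jps output from the session above
jps_output = """2832 HMaster
2945 HRegionServer
2245 DataNode
2150 NameNode
3065 Jps
2745 HQuorumPeer
2431 SecondaryNameNode"""

# Each line is "<pid> <class name>"; collect the class names
running = {line.split()[1] for line in jps_output.splitlines() if line.strip()}

missing = EXPECTED - running
assert not missing, "missing daemons: %s" % sorted(missing)
print("all expected daemons are running")
```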
Next, start the HBase shell and use the commands it provides for some simple interaction:
hadoop@iZ259rt0i0iZ:~$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-08-06 15:09:30,272 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.0.1.1, re1dbf4df30d214fca14908df71d038081577ea46, Sun May 17 12:34:26 PDT 2015
hbase(main):001:0> help
HBase Shell, version 1.0.1.1, re1dbf4df30d214fca14908df71d038081577ea46, Sun May 17 12:34:26 PDT 2015
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, split, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_tableCFs, disable_peer, enable_peer, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: security
Commands: grant, revoke, user_permission
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
hbase(main):002:0> list
TABLE
0 row(s) in 0.4050 seconds
=> []
hbase(main):003:0>
Exit the HBase shell:
hbase(main):003:0> exit
Now let's try HBase's Thrift API and interact with HBase from Python. First, start HBase's Thrift service:
hadoop@iZ259rt0i0iZ:~$ hbase-daemon.sh start thrift
starting thrift, logging to /home/hadoop/hbase/logs/hbase-hadoop-thrift-iZ259rt0i0iZ.out
Then install the Python happybase module, which is a simple wrapper around HBase's Thrift interface:
pip install happybase
Then start ipython (if it is not installed, install it via pip as well):
hadoop@iZ259rt0i0iZ:~$ ipython
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
Type "copyright", "credits" or "license" for more information.
IPython 3.2.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import happybase
In [2]: connection = happybase.Connection('localhost')
In [3]: connection.tables()
Out[3]: []
In [4]: families = {'basic':dict(max_versions=3),'detail':dict(max_versions=1000),'comment':dict(max_versions=1000),'answer':dict(max_versions=1000),'follower':dict(max_versions=1000)}
In [5]: connection.create_table('question',families)
In [6]: connection.tables()
Out[6]: ['question']
In [7]:
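With the table created, reads and writes go through `connection.table(...)` and the resulting `Table` object's `put` and `row` methods. The calls look like the sketch below; note that against the live Thrift server you would write `table = connection.table('question')`, while here a minimal in-memory stub stands in for the table so the snippet runs without a server:

```python
class StubTable:
    """In-memory stand-in mimicking happybase.Table's put/row interface."""
    def __init__(self):
        self._rows = {}

    def put(self, row, data):
        # data maps 'family:qualifier' column names to values
        self._rows.setdefault(row, {}).update(data)

    def row(self, row):
        return self._rows.get(row, {})

# Against the real server: table = connection.table('question')
table = StubTable()

# Row key 'q-0001' is a hypothetical example; columns use the
# 'basic' family created above
table.put('q-0001', {'basic:title': 'What is HBase?',
                     'basic:author': 'hadoop'})
print(table.row('q-0001'))
```

The same two calls against the real `question` table persist the row to HDFS via the Thrift service started earlier.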
With that, our pseudo-distributed HBase environment is complete.