Connecting Hive to HBase
The versions I am using:
Hadoop 2.4.1
HBase 0.98.6.1
Hive 0.13.1
About HBase 0.98.6.1
It seems I still have not installed HBase entirely correctly: 0.98.6.1 is built against Hadoop 2.2, while I am running Hadoop 2.4.1. This mismatch causes various problems in use; for example, importing data into HBase with importtsv fails with errors. My temporary workaround is to replace the hadoop-* 2.2 jars under HBase's lib directory with the corresponding jars from Hadoop 2.4.1. After that, it runs without errors.
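The jar swap can be scripted. This is only a sketch of the workaround described above; the two directory arguments are placeholders for your actual install paths, and in a real 2.4.1 distribution the jars are spread across several subdirectories (common, hdfs, mapreduce, yarn), so you may need to repeat it per directory and restart HBase afterwards:

```shell
# Sketch of the workaround: back up HBase's bundled hadoop-2.2 jars and
# copy in the matching jars from the Hadoop 2.4.1 distribution.
# Both paths are placeholders, e.g. $HBASE_HOME/lib and
# $HADOOP_HOME/share/hadoop/common.
replace_hbase_hadoop_jars() {
  local hbase_lib="$1"     # HBase lib directory holding hadoop-*2.2*.jar
  local hadoop_jars="$2"   # directory holding the hadoop-*2.4.1*.jar files
  mkdir -p "$hbase_lib/hadoop-2.2-backup"
  # move the old jars aside rather than deleting them, so this is reversible
  mv "$hbase_lib"/hadoop-*2.2*.jar "$hbase_lib/hadoop-2.2-backup/" 2>/dev/null
  cp "$hadoop_jars"/hadoop-*2.4.1*.jar "$hbase_lib/"
}
```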
The problem
First, create a table in the HBase shell:
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.0, r1231986, Mon Jan 16 13:16:35 UTC 2012
hbase(main):001:0>
hbase(main):001:0> create 'bar', 'cf'
0 row(s) in 0.1200 seconds
hbase(main):002:0>
Then connect Hive to this HBase table using Hive's HBaseStorageHandler. The hbase.columns.mapping property maps the Hive columns, in order, onto HBase: rowkey to :key, a to cf:c1, and b to cf:c2. The DDL is as follows:
hive>CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2') TBLPROPERTIES ('hbase.table.name' = 'bar');
This fails with the following error:
14/10/24 19:31:43 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.io.IOException: Attempt to start meta tracker failed.
at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:201)
at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:230)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:277)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:293)
at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:162)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:554)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy9.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:197)
... 33 more
After searching for a long time, I finally found the fix.
Solution
HBaseIntegration is implemented by the hive-hbase-handler-x.y.z.jar module.
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
The handler requires Hadoop 0.20 or higher, and has only been tested with dependency versions hadoop-0.20.x, hbase-0.92.0 and zookeeper-3.3.4. If you are not using hbase-0.92.0, you will need to rebuild the handler with the HBase jar matching your version, and change the --auxpath above accordingly. Failure to use matching versions will lead to misleading connection failures such as MasterNotRunningException since the HBase RPC protocol changes often.
Using HBaseStorageHandler requires a few extra jars, which the wiki passes to Hive with --auxpath. But that approach is clumsy and easy to get wrong in practice.
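For reference, the --auxpath approach would look roughly like this; this is only a sketch, the /usr/local/hive path is a placeholder, and the jars listed are the ones from my setup rather than an authoritative list:

```
hive --auxpath /usr/local/hive/lib/hive-hbase-handler-0.13.1.jar,/usr/local/hive/lib/hbase-common-0.98.6.1-hadoop2.jar,/usr/local/hive/lib/zookeeper-3.4.6.jar -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3
```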
However, the HBaseBulkLoad page also needs extra jars, and the approach it uses is much simpler.
https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad
Add necessary JARs
You will need to add a couple jar files to your path. First, put them in DFS:
hadoop dfs -put /usr/lib/hive/lib/hbase-VERSION.jar /user/hive/hbase-VERSION.jar
hadoop dfs -put /usr/lib/hive/lib/hive-hbase-handler-VERSION.jar /user/hive/hive-hbase-handler-VERSION.jar
Then add them to your hive-site.xml:
<property>
<name>hive.aux.jars.path</name>
<value>/user/hive/hbase-VERSION.jar,/user/hive/hive-hbase-handler-VERSION.jar</value>
</property>
Setting the jar paths directly in hive-site.xml is much more convenient. After uploading the jars to HDFS, I added the following configuration:
<property>
<name>hive.aux.jars.path</name>
<value>/user/hive/lib/hbase-common-0.98.6.1-hadoop2.jar,/user/hive/lib/hive-hbase-handler-0.13.1.jar,/user/hive/lib/zookeeper-3.4.6.jar</value>
<description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
</property>
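The jars themselves were uploaded beforehand with something like the following; this is a sketch, and it assumes the handler jar sits under $HIVE_HOME/lib while the HBase and ZooKeeper jars sit under $HBASE_HOME/lib (adjust to your layout):

```
hadoop fs -mkdir -p /user/hive/lib
hadoop fs -put $HIVE_HOME/lib/hive-hbase-handler-0.13.1.jar /user/hive/lib/
hadoop fs -put $HBASE_HOME/lib/hbase-common-0.98.6.1-hadoop2.jar /user/hive/lib/
hadoop fs -put $HBASE_HOME/lib/zookeeper-3.4.6.jar /user/hive/lib/
```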
After making this change, restart Hive:
#nohup hive --service metastore > $HIVE_HOME/log/hive_metastore.log &
#nohup hive --service hiveserver > $HIVE_HOME/log/hiveserver.log &
#./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3
The last step, ./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3, must not be omitted; it is the key to making the connection work.
As for what that option does, in the words of the answer I found:
You need to tell Hive where to find the zookeepers quorum which would elect the HBase master
Now rerun the statement in the Hive shell:
hive>CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2') TBLPROPERTIES ('hbase.table.name' = 'bar');
No error this time; the external table is added successfully!
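As a quick sanity check, rows written through the HBase shell should now be visible from Hive, since rowkey maps to :key, a to cf:c1, and b to cf:c2. A sketch of such a session (the exact output formatting may differ):

```
hbase(main):003:0> put 'bar', 'row1', 'cf:c1', 'hello'
hbase(main):004:0> put 'bar', 'row1', 'cf:c2', 'world'

hive> SELECT * FROM foo;
row1    hello    world
```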
Table definitions in Hive
Relevant Hive concepts:
A managed table is one for which the definition is primarily managed in Hive's metastore, and for whose data storage Hive is responsible.
An external table is one whose definition is managed in some external catalog, and whose data Hive does not own (i.e. it will not be deleted when the table is dropped).
A native table is one that Hive stores in its own format; a non-native table is one whose storage goes through a storage handler (such as HBaseStorageHandler).
These two distinctions (managed vs. external, and native vs. non-native) are orthogonal. Hence, there are four possibilities for base tables:
managed native: what you get by default with CREATE TABLE
external native: what you get with CREATE EXTERNAL TABLE when no STORED BY clause is specified
managed non-native: what you get with CREATE TABLE when a STORED BY clause is specified; Hive stores the definition in its metastore, but does not create any files itself; instead, it calls the storage handler with a request to create a corresponding object structure
external non-native: what you get with CREATE EXTERNAL TABLE when a STORED BY clause is specified; Hive registers the definition in its metastore and calls the storage handler to check that it matches the primary definition in the other system
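In DDL terms, the four combinations look like this. A sketch only: the table names and the /data/t2 location are made up for illustration, and the HBase-backed variants reuse the handler and column mapping from earlier:

```sql
-- managed native: the default CREATE TABLE
CREATE TABLE t1 (k STRING, v STRING);

-- external native: CREATE EXTERNAL TABLE with no STORED BY clause
CREATE EXTERNAL TABLE t2 (k STRING, v STRING)
  LOCATION '/data/t2';

-- managed non-native: CREATE TABLE with STORED BY;
-- the storage handler creates the underlying HBase table
CREATE TABLE t3 (rowkey STRING, c1 STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1');

-- external non-native: CREATE EXTERNAL TABLE with STORED BY;
-- the HBase table must already exist, and dropping t4 leaves it intact
CREATE EXTERNAL TABLE t4 (rowkey STRING, c1 STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1')
  TBLPROPERTIES ('hbase.table.name' = 'bar');
```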
One more thing: shutting down Hive
Hive does not seem to ship a shutdown script. My workaround for now is to find Hive's PIDs (there are two processes: the metastore and the hiveserver) and kill them directly... crude but effective.
# netstat -lnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 21415/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 12601/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 884/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 960/master
tcp 0 0 0.0.0.0:9083 0.0.0.0:* LISTEN 21100/java
tcp 0 0 192.168.129.63:9000 0.0.0.0:* LISTEN 12601/java
tcp 0 0 192.168.129.63:9001 0.0.0.0:* LISTEN 12783/java
tcp 0 0 :::22 :::* LISTEN 884/sshd
tcp 0 0 ::ffff:192.168.129.63:8088 :::* LISTEN 12939/java
tcp 0 0 ::1:25 :::* LISTEN 960/master
tcp 0 0 ::ffff:192.168.129.63:8030 :::* LISTEN 12939/java
tcp 0 0 ::ffff:192.168.129.63:8031 :::* LISTEN 12939/java
tcp 0 0 ::ffff:192.168.129.63:60000 :::* LISTEN 20610/java
tcp 0 0 ::ffff:192.168.129.63:8032 :::* LISTEN 12939/java
tcp 0 0 ::ffff:192.168.129.63:8033 :::* LISTEN 12939/java
tcp 0 0 :::60010 :::* LISTEN 20610/java
Active UNIX domain sockets (only servers)
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ACC ] STREAM LISTENING 8318 1/init @/com/ubuntu/upstart
unix 2 [ ACC ] STREAM LISTENING 10389 850/dbus-daemon /var/run/dbus/system_bus_socket
unix 2 [ ACC ] STREAM LISTENING 10698 960/master public/cleanup
unix 2 [ ACC ] STREAM LISTENING 10705 960/master private/tlsmgr
unix 2 [ ACC ] STREAM LISTENING 10709 960/master private/rewrite
unix 2 [ ACC ] STREAM LISTENING 10713 960/master private/bounce
unix 2 [ ACC ] STREAM LISTENING 10717 960/master private/defer
unix 2 [ ACC ] STREAM LISTENING 10721 960/master private/trace
unix 2 [ ACC ] STREAM LISTENING 10725 960/master private/verify
unix 2 [ ACC ] STREAM LISTENING 10729 960/master public/flush
unix 2 [ ACC ] STREAM LISTENING 10733 960/master private/proxymap
unix 2 [ ACC ] STREAM LISTENING 10737 960/master private/proxywrite
unix 2 [ ACC ] STREAM LISTENING 10741 960/master private/smtp
unix 2 [ ACC ] STREAM LISTENING 10745 960/master private/relay
unix 2 [ ACC ] STREAM LISTENING 10749 960/master public/showq
unix 2 [ ACC ] STREAM LISTENING 10753 960/master private/error
unix 2 [ ACC ] STREAM LISTENING 10757 960/master private/retry
unix 2 [ ACC ] STREAM LISTENING 10761 960/master private/discard
unix 2 [ ACC ] STREAM LISTENING 10765 960/master private/local
unix 2 [ ACC ] STREAM LISTENING 10769 960/master private/virtual
unix 2 [ ACC ] STREAM LISTENING 10773 960/master private/lmtp
unix 2 [ ACC ] STREAM LISTENING 10777 960/master private/anvil
unix 2 [ ACC ] STREAM LISTENING 10781 960/master private/scache
#kill -9 21100
#kill -9 21415
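The PID hunt can be scripted. A sketch, assuming Hive's default ports (9083 for the metastore, 10000 for hiveserver) and the netstat -lnp output format shown above:

```shell
# Extract the PIDs listening on Hive's default metastore (9083) and
# hiveserver (10000) ports from `netstat -lnp` output read on stdin.
# Field 4 is the local address (ends in ":port"); the last field is "PID/name".
hive_pids() {
  awk '$4 ~ /:(9083|10000)$/ { split($NF, a, "/"); print a[1] }'
}
# usage: netstat -lnp | hive_pids | xargs -r kill -9
```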
References
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
http://stackoverflow.com/questions/23658600/error-while-creating-an-hive-table-on-top-of-an-hbase-table