Spring for Hadoop: Working with HBase (Part 1)

February 29, 2024. Source: 爱听歌的嘻嘻

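I have recently been using Hadoop and HBase in a web project, so here is a summary of how I used Spring for Hadoop (SHDP), mainly the ready-made HBase modules it provides. This post comes in two parts: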

1. A basic overview of Spring for Hadoop: Working with HBase (based on the official documentation and my own experience)
2. A real usage example from the project

**1. A basic overview of Spring for Hadoop: Working with HBase**

SHDP integrates HBase with Spring, so developers can work with HBase easily through the framework (much like using Spring + Hibernate to work with a database).

SHDP lets you set up the HBase configuration through the hbase-configuration element, for example:

<!-- default bean id is 'hbaseConfiguration' that uses the existing 'hadoopConfiguration' object -->
<hdp:hbase-configuration configuration-ref="hadoopConfiguration" />

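This declaration makes it easier to create an HBase Configuration object. It also supports managing HBase connections: when the application context shuts down, what happens to any open HBase connections can be adjusted through the stop-proxy and delete-connection attributes.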

<!-- delete associated connections but do not stop the proxies -->
<hdp:hbase-configuration stop-proxy="false" delete-connection="true">
  foo=bar
  property=value
</hdp:hbase-configuration>

You can also specify the ZooKeeper quorum and port that the client uses to connect to HBase:

<!-- specify ZooKeeper host/port -->
<hdp:hbase-configuration zk-quorum="${hbase.host}" zk-port="${hbase.port}"/>
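
As far as I know, zk-quorum and zk-port correspond to the standard HBase client settings hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort. The sketch below uses a plain java.util.Properties as a stand-in for the real Configuration object, just to show which keys end up being set. The ZkSettings helper, the host names, and the port value are made up for illustration and are not part of SHDP.

```java
import java.util.Properties;

// Runs the sketch. ZkSettings is a hypothetical helper, not an SHDP class.
class ZkSettingsDemo {
    public static void main(String[] args) {
        Properties conf = ZkSettings.apply(
                new Properties(), "zk1.example.com,zk2.example.com", 2181);
        conf.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}

class ZkSettings {
    // Standard HBase client property names.
    static final String QUORUM_KEY = "hbase.zookeeper.quorum";
    static final String CLIENT_PORT_KEY = "hbase.zookeeper.property.clientPort";

    // Copies the quorum and port into the configuration, skipping unset values.
    static Properties apply(Properties conf, String quorum, Integer port) {
        if (quorum != null && !quorum.trim().isEmpty()) {
            conf.setProperty(QUORUM_KEY, quorum.trim());
        }
        if (port != null) {
            conf.setProperty(CLIENT_PORT_KEY, port.toString());
        }
        return conf;
    }
}
```

In the XML above, the ${hbase.host} and ${hbase.port} placeholders would supply these two values.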

Of course, you can also add settings to this configuration by pulling in properties from other sources, for example:

<hdp:hbase-configuration properties-ref="some-props-bean" properties-location="classpath:/conf/testing/hbase.properties"/>

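DAO support
SHDP provides DAO support for HBase through the org.springframework.data.hadoop.hbase package. With HbaseTemplate and several callback interfaces such as TableCallback, RowMapper, and ResultsExtractor, it takes care of table lookup, running queries, preparing scanners, and processing results, which greatly improves development efficiency.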

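The core of this DAO support is HbaseTemplate, a high-level abstraction for interacting with HBase. The template requires an HBase configuration; once configured, HbaseTemplate is thread-safe and can be reused across multiple instances at the same time. (This indirectly gives an effect similar to an HBase connection pool, which matters a lot in web projects that access HBase frequently.)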

<!-- default HBase configuration -->
<hdp:hbase-configuration/>

<!-- wire hbase configuration (using default name 'hbaseConfiguration') into the template -->
<bean id="htemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate" p:configuration-ref="hbaseConfiguration"/>

HbaseTemplate also provides generic callbacks for executing logic against a table and for extracting results and rows:

// writing to 'MyTable'
template.execute("MyTable", new TableCallback<Object>() {
  @Override
  public Object doInTable(HTable table) throws Throwable {
    Put p = new Put(Bytes.toBytes("SomeRow"));
    p.add(Bytes.toBytes("SomeColumn"), Bytes.toBytes("SomeQualifier"), Bytes.toBytes("AValue"));
    table.put(p);
    return null;
  }
});

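This snippet shows TableCallback in use: it takes care of looking up the table and cleaning up resources, so the caller does not have to do either explicitly. Note that any exception thrown by the HBase API is automatically caught and converted to a Spring DAO exception, and resource cleanup is applied transparently.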
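
To make that control flow concrete, here is a minimal plain-Java sketch of the template/callback pattern: look up the table, run the callback, translate any exception, and always clean up. All of the Fake* types are hypothetical stand-ins written for this post. They only mimic the shape of HbaseTemplate.execute and TableCallback and are not how SHDP is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Runs the sketch: one successful call and one failing call.
class TemplateSketch {
    public static void main(String[] args) {
        FakeTemplate template = new FakeTemplate();
        String result = template.execute("MyTable", table -> {
            table.put("SomeRow");
            return "wrote " + table.puts.size() + " row(s)";
        });
        System.out.println(result + ", closed=" + template.lastTable.closed);
        try {
            template.execute("MyTable", table -> { table.put(""); return null; });
        } catch (FakeDataAccessException e) {
            System.out.println("translated: " + e.getCause().getMessage());
        }
    }
}

// Hypothetical stand-in for a table handle.
class FakeTable {
    final List<String> puts = new ArrayList<>();
    boolean closed = false;

    void put(String rowKey) {
        if (rowKey.isEmpty()) throw new IllegalArgumentException("empty row key");
        puts.add(rowKey);
    }

    void close() { closed = true; }
}

// Hypothetical stand-in for TableCallback.
interface FakeTableCallback<T> {
    T doInTable(FakeTable table) throws Exception;
}

// Unchecked exception playing the role of a Spring DAO exception.
class FakeDataAccessException extends RuntimeException {
    FakeDataAccessException(String message, Throwable cause) { super(message, cause); }
}

// Hypothetical stand-in for the template.
class FakeTemplate {
    FakeTable lastTable; // kept only so the demo can inspect the table afterwards

    <T> T execute(String tableName, FakeTableCallback<T> action) {
        FakeTable table = new FakeTable();      // 1. "look up" the table
        lastTable = table;
        try {
            return action.doInTable(table);     // 2. run the caller's logic
        } catch (Exception e) {                 // 3. translate to an unchecked exception
            throw new FakeDataAccessException("operation on " + tableName + " failed", e);
        } finally {
            table.close();                      // 4. clean up on success and on failure
        }
    }
}
```

The point of the pattern is that the caller only writes step 2; the other steps live in one place.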

In addition, HbaseTemplate offers ready-made methods for some common operations, so you can call them directly without writing your own callback:

// read each row from 'MyTable'
List<String> rows = template.find("MyTable", "SomeColumn", new RowMapper<String>() {
  @Override
  public String mapRow(Result result, int rowNum) throws Exception {
    return result.toString();
  }
});

(Readers familiar with Spring and Hibernate will readily see the parallel between the two snippets above and Spring's HibernateCallback and HibernateTemplate.)
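
The row-mapping idea behind find can be sketched in a few lines of plain Java: walk the scanned rows, hand each one to the mapper together with its index, and collect the results. MiniRowMapper and MiniFinder are hypothetical names invented for this sketch, and starting rowNum at 0 is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Runs the sketch with plain strings standing in for HBase Result objects.
class RowMapperSketch {
    public static void main(String[] args) throws Exception {
        List<String> scanned = Arrays.asList("row-a", "row-b");
        List<String> mapped = MiniFinder.find(scanned, (row, rowNum) -> rowNum + ":" + row);
        System.out.println(mapped);
    }
}

// Hypothetical stand-in for RowMapper: turns one row into one object.
interface MiniRowMapper<R, T> {
    T mapRow(R row, int rowNum) throws Exception;
}

// Hypothetical stand-in for the find(...) convenience method.
class MiniFinder {
    static <R, T> List<T> find(Iterable<R> scanned, MiniRowMapper<R, T> mapper) throws Exception {
        List<T> out = new ArrayList<>();
        int rowNum = 0; // assumption: row numbering starts at 0
        for (R row : scanned) {
            out.add(mapper.mapRow(row, rowNum++));
        }
        return out;
    }
}
```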

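Besides HbaseTemplate, the org.springframework.data.hadoop.hbase package supports automatically binding HBase tables to the current thread through the HbaseInterceptor and HbaseSynchronizationManager classes. In other words, each class that performs DAO operations on HBase can be wrapped by HbaseInterceptor, so any table in use is bound to the current thread, and later uses of that table need no further lookup (which again gives an effect similar to an HBase connection pool). When the call ends, the table is closed automatically.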
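
The thread-binding idea can be illustrated with a ThreadLocal in plain Java. This is only a sketch under my own assumptions: BoundTable and TableRegistry are hypothetical stand-ins for a table handle and for the roles played by HbaseSynchronizationManager and HbaseInterceptor, not the real SHDP classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Runs the sketch: two uses of the same table inside one intercepted call.
class ThreadBoundSketch {
    public static void main(String[] args) {
        boolean same = TableRegistry.around(() ->
                TableRegistry.getTable("MyTable") == TableRegistry.getTable("MyTable"));
        System.out.println("same handle reused: " + same);
        System.out.println("lookups performed: " + TableRegistry.LOOKUPS.get());
    }
}

// Hypothetical stand-in for a table handle.
class BoundTable {
    final String name;
    boolean closed = false;
    BoundTable(String name) { this.name = name; }
    void close() { closed = true; }
}

// Hypothetical registry that binds tables to the current thread.
class TableRegistry {
    private static final ThreadLocal<Map<String, BoundTable>> BOUND =
            ThreadLocal.withInitial(HashMap::new);
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    // Returns the table bound to this thread, looking it up only on first use.
    static BoundTable getTable(String name) {
        return BOUND.get().computeIfAbsent(name, n -> {
            LOOKUPS.incrementAndGet();
            return new BoundTable(n);
        });
    }

    // Interceptor-like wrapper: closes and unbinds the tables when the call ends.
    static <T> T around(Supplier<T> call) {
        try {
            return call.get();
        } finally {
            Map<String, BoundTable> tables = BOUND.get();
            tables.values().forEach(BoundTable::close);
            BOUND.remove();
        }
    }
}
```

Each thread gets its own map, so no handle is shared between threads.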

Reference: the official Spring for Hadoop documentation

    Original author: 爱听歌的嘻嘻
    Original URL: https://segmentfault.com/a/1190000004371405