Phoenix Query Testing: Lessons Learned

February 21, 2023 · Source: Jeffbond

## 1. Background

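A well-chosen index can greatly speed up queries, so the Phoenix query test cases include a comparison of query performance with and without indexes. We ran into a few problems during testing, which are recorded here.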

## 2. Problems and Solutions

### 2.1. Creating an index fails with the following error:

// Statement used to create the index:
0: jdbc:phoenix:localhost> CREATE INDEX ind_1 ON TESTINPUT(ff1);

// Error:
Error: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server tableName=IND_1 (state=42Y88,code=1029)
java.sql.SQLException: ERROR 1029 (42Y88): Mutable secondary indexes must have the hbase.regionserver.wal.codec property set to org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the hbase-sites.xml of every region server tableName=IND_1
    at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:396)
    at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
    at org.apache.phoenix.schema.MetaDataClient.createIndex(MetaDataClient.java:1162)
    at org.apache.phoenix.compile.CreateIndexCompiler$1.execute(CreateIndexCompiler.java:95)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:322)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:314)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:312)
    at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1435)
    at sqlline.Commands.execute(Commands.java:822)
    at sqlline.Commands.sql(Commands.java:732)
    at sqlline.SqlLine.dispatch(SqlLine.java:808)
    at sqlline.SqlLine.begin(SqlLine.java:681)
    at sqlline.SqlLine.start(SqlLine.java:398)
    at sqlline.SqlLine.main(SqlLine.java:292)

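Cause: Phoenix supports two kinds of indexes, mutable and immutable. An index built on a mutable table is a mutable index; an index built on an immutable table is an immutable index. A mutable index is updated together with the data whenever rows are inserted, changed, or deleted. An immutable index is meant for tables whose rows are written once and never modified, so Phoenix only has to add index entries for newly written rows and never has to rewrite existing entries. The statement above creates a mutable index, which requires extra configuration in hbase-site.xml (immutable indexes are supported by default and need no extra configuration).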
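
If the rows in the table really are written once and never updated in place, an alternative worth trying is to declare the table immutable so that its indexes are immutable indexes. The following is only a sketch under that assumption; it reuses the TESTINPUT table and ind_1 index from the failing statement above, and it was not part of the original test.

```sql
-- Assumption: rows in TESTINPUT are written once and never updated in place.
-- Marking the table immutable makes indexes on it immutable indexes,
-- which do not need the IndexedWALEditCodec setting.
ALTER TABLE TESTINPUT SET IMMUTABLE_ROWS=true;

-- The same statement that failed with ERROR 1029 above.
CREATE INDEX ind_1 ON TESTINPUT(ff1);
```

If rows in the table do get updated, this shortcut does not apply and the hbase-site.xml changes are still required.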

Solution: add the following configuration on the HMaster and HRegionServer nodes respectively, then restart the HBase cluster.

- HMaster

<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
  <name>hbase.master.loadbalancer.class</name>
  <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
</property>

- HRegionServer

<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
  <name>hbase.region.server.rpc.scheduler.factory.class</name>
  <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
  <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
</property>
<property>
  <name>hbase.rpc.controllerfactory.class</name>
  <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
  <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
</property>
<property>
  <name>hbase.coprocessor.regionserver.classes</name>
  <value>org.apache.hadoop.hbase.regionserver.LocalIndexMerger</value>
</property>


### 2.2. Querying 1 billion rows fails with the following error:

16/11/29 10:33:50 WARN client.ScannerCallable: Ignore, probably already closed
org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '1132' does not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:221)
at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:206)

…

org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions:
Tue Nov 29 10:33:50 CST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60321: row ‘��s,d’ on table ‘TEST11’ at region=TEST11,\x11\x00\x00\x00\x00\x00\x00\x00\x00,1479985615575.c3adb68acea8d88d223bffd3acc16c2e., hostname=node-20-105,60020,1480385981798, seqNum=1244662

…

Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=18173, waitTime=60001, operationTimeout=60000 expired.
at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1197)
…


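- Cause:

Some queries take a long time to return results and are killed by HBase's timeout mechanism.

- Approach:

Increase the timeouts by adding the following configuration to hbase-site.xml: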

<property>
<name>hbase.rpc.timeout</name>
<value>600000</value>
</property>

<property>
<name>hbase.client.operation.timeout</name>
<value>600000</value>
</property>

<property>
<name>hbase.client.scanner.timeout.period</name>
<value>600000</value>
</property>

<property>
<name>hbase.regionserver.lease.period</name>
<value>600000</value>
</property>

<property>
<name>phoenix.query.timeoutMs</name>
<value>600000</value>
</property>

<property>
<name>phoenix.query.keepAliveMs</name>
<value>600000</value>
</property>

<property>
<name>hbase.client.ipc.pool.type</name>
<value>RoundRobinPool</value>
</property>
<property>
<name>hbase.client.ipc.pool.size</name>
<value>10</value>
</property>


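The configuration did take effect, but the same error was still reported. Every setting suggested online has been applied and the timeout problem remains unsolved. Once more machines are added and query time drops, queries over 1 billion rows should no longer time out.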



## 3. Features

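- Immutable indexes are supported by default and need no extra configuration; mutable indexes require the configuration shown above.
- Creating an immutable table: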

CREATE TABLE TABLENAME (pk BIGINT PRIMARY KEY, col1 INTEGER) IMMUTABLE_ROWS=true;

- An index can be created in the following ways:

CREATE INDEX ind_name ON TABLENAME(COLUMN1);
CREATE INDEX ind_name ON TABLENAME(COLUMN1,COLUMN2);
CREATE INDEX ind_name ON TABLENAME(COLUMN1) INCLUDE(COLUMN2);

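- When a query runs, the Phoenix query optimizer picks a suitable index. Use EXPLAIN to inspect the plan.

- A secondary index is not used unless every column referenced by the query is either an indexed column or a covered column.
- Do not include the primary key when creating an index, otherwise the index will not be used; an index can be created on the primary key alone.
- When the WHERE clause contains the primary key, a range scan is used, because the table is already stored in primary key order.
- Primary keys are sorted automatically on insert and stay ordered once the insert completes (if the table has a single region, the order is global; if it has several regions, rows are ordered within each region but not globally).
- Creating an index on one or more columns produces an index table made up of those indexed columns, with the primary key column appended as the last column. In other words, the index table is also a table, just with fewer columns than the original.
- The first column of the index table is sorted.
- An upsert into with a primary key that already exists replaces the existing record for that primary key with the new one.
- Phoenix does not support the update statement, but upsert into tablename(id,columnname) values(id,newvalue) achieves the same result.
- Local index: the regions of the index table are on the same region server as the regions of the table (the index table must have the same number of regions as the table).
- Global index: the regions of the index table are not necessarily on the same region server as the regions of the table (the index table must have the same number of regions as the table).
- When several local indexes are created on one table, HBase actually stores only a single index table. This is not the case for global indexes.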
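
Several of the points above can be tried out in sqlline with a small script. This is a minimal sketch, not taken from the original tests: the table DEMO_ORDERS, its columns, and the index names are hypothetical, and the mutable index configuration from section 2.1 is assumed to be in place. The plans that EXPLAIN prints depend on the Phoenix version, so no output is shown.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS DEMO_ORDERS (
    ID BIGINT NOT NULL PRIMARY KEY,
    CUSTOMER VARCHAR,
    STATUS VARCHAR,
    AMOUNT INTEGER
);

-- Global index: CUSTOMER is the indexed column, AMOUNT is a covered column.
CREATE INDEX IDX_DEMO_CUSTOMER ON DEMO_ORDERS (CUSTOMER) INCLUDE (AMOUNT);

-- Every referenced column is indexed or covered,
-- so the plan is expected to scan IDX_DEMO_CUSTOMER.
EXPLAIN SELECT AMOUNT FROM DEMO_ORDERS WHERE CUSTOMER = 'alice';

-- STATUS is neither indexed nor covered,
-- so the plan is expected to scan DEMO_ORDERS itself.
EXPLAIN SELECT STATUS FROM DEMO_ORDERS WHERE CUSTOMER = 'alice';

-- A filter on the primary key should show up as a range scan on the data table.
EXPLAIN SELECT * FROM DEMO_ORDERS WHERE ID > 100 AND ID < 200;

-- UPSERT doubles as UPDATE: the second statement reuses primary key 1
-- and overwrites AMOUNT for that row.
UPSERT INTO DEMO_ORDERS (ID, CUSTOMER, STATUS, AMOUNT) VALUES (1, 'alice', 'new', 10);
UPSERT INTO DEMO_ORDERS (ID, AMOUNT) VALUES (1, 25);
SELECT ID, CUSTOMER, AMOUNT FROM DEMO_ORDERS WHERE ID = 1;

-- Local index: same statement with the LOCAL keyword.
CREATE LOCAL INDEX IDX_DEMO_STATUS ON DEMO_ORDERS (STATUS);
```

Comparing the two EXPLAIN results for the CUSTOMER filter is a quick way to confirm whether a given query is actually served by the index before running a full performance comparison.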

## 4. References
- https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing
- http://phoenix.apache.org/language/index.html#create_index
- http://blog.csdn.net/jiangshouzhuang/article/details/52387718



    Original author: Jeffbond
    Original article: https://www.jianshu.com/p/a3c24638b498