Installing Phoenix
Requirements
Hadoop, ZooKeeper, and HBase must already be installed.
Download
Official download page: http://phoenix.apache.org/download.html
Find the release that matches your HBase version, upload it to the master node, then extract and configure it:
$ tar -zxvf apache-phoenix-4.14.0-HBase-1.1-bin.tar.gz
Environment variables (append to your shell profile, e.g. ~/.bashrc):
# For Phoenix
export PHOENIX_HOME=/home/hadoop/phoenix/apache-phoenix-4.14.0-HBase-1.1-bin
export PHOENIX_CLASSPATH=$PHOENIX_HOME
export PATH=$PHOENIX_HOME/bin:$PATH
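Reload the profile so the variables take effect, then sanity-check (assuming the lines above were added to ~/.bashrc):
$ source ~/.bashrc
$ echo $PHOENIX_HOME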
- Copy the server jar
Go to the Phoenix installation directory, find phoenix-4.14.0-HBase-1.1-server.jar, and copy it to $HBASE_HOME/lib on every HBase node (see the sketch below).
==Restart HBase==
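A minimal sketch of the copy-and-restart step, assuming three HBase nodes named node1, node2, node3 with the same $HBASE_HOME layout and passwordless SSH (adjust host names to your cluster):
$ cd $PHOENIX_HOME
$ for h in node1 node2 node3; do scp phoenix-4.14.0-HBase-1.1-server.jar $h:$HBASE_HOME/lib/; done
$ $HBASE_HOME/bin/stop-hbase.sh
$ $HBASE_HOME/bin/start-hbase.sh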
- Start Phoenix
Run the sqlline.py startup script, passing the ZooKeeper quorum as its argument:
$ cd $PHOENIX_HOME/bin
$ sqlline.py zk01,zk02,zk03:2181
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:172.23.5.48,172.23.5.47,172.23.5.46:2181 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:172.23.5.48,172.23.5.47,172.23.5.46:2181
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/phoenix/apache-phoenix-4.14.0-HBase-1.1-bin/phoenix-4.14.0-HBase-1.1-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
18/06/26 14:25:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 4.14)
Driver: PhoenixEmbeddedDriver (version 4.14)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
133/133 (100%) Done
Done
sqlline version 1.2.0
At this point we are inside the Phoenix terminal (sqlline).
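Besides SQL, sqlline accepts !-prefixed meta commands; for example, list the visible tables and exit:
0: jdbc:phoenix:zk01> !tables
0: jdbc:phoenix:zk01> !quit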
Basic usage
Following an example from the official site, we will create a table, load data, and run queries.
The installation package ships with sample data in the examples directory (the listing below is taken from inside it, which is also why the commands later use ../bin):
drwxr-xr-x 2 hadoop hadoop 36 Jun 5 05:36 pig
-rw-r--r-- 1 hadoop hadoop 149 Jun 5 05:36 STOCK_SYMBOL.csv
-rw-r--r-- 1 hadoop hadoop 183 Jun 5 05:36 STOCK_SYMBOL.sql
-rw-r--r-- 1 hadoop hadoop 2091 Jun 5 05:36 WEB_STAT.csv
-rw-r--r-- 1 hadoop hadoop 563 Jun 5 05:36 WEB_STAT_QUERIES.sql
-rw-r--r-- 1 hadoop hadoop 296 Jun 5 05:36 WEB_STAT.sql
WEB_STAT.sql
CREATE TABLE IF NOT EXISTS WEB_STAT (
HOST CHAR(2) NOT NULL,
DOMAIN VARCHAR NOT NULL,
FEATURE VARCHAR NOT NULL,
DATE DATE NOT NULL,
USAGE.CORE BIGINT,
USAGE.DB BIGINT,
STATS.ACTIVE_VISITOR INTEGER
CONSTRAINT PK PRIMARY KEY (HOST, DOMAIN, FEATURE, DATE)
);
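In this DDL, the USAGE and STATS prefixes name HBase column families, while the unprefixed NOT NULL columns make up the composite row key. In queries the family prefix is dropped, as in this illustrative example:
SELECT HOST, CORE, ACTIVE_VISITOR FROM WEB_STAT LIMIT 3;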
WEB_STAT.csv
NA,Salesforce.com,Login,2013-01-01 01:01:01,35,42,10
EU,Salesforce.com,Reports,2013-01-02 12:02:01,25,11,2
EU,Salesforce.com,Reports,2013-01-02 14:32:01,125,131,42
NA,Apple.com,Login,2013-01-01 01:01:01,35,22,40
NA,Salesforce.com,Dashboard,2013-01-03 11:01:01,88,66,44
WEB_STAT_QUERIES.sql
-- Average CPU and DB usage by Domain
SELECT DOMAIN, AVG(CORE) Average_CPU_Usage, AVG(DB) Average_DB_Usage
FROM WEB_STAT
GROUP BY DOMAIN
ORDER BY DOMAIN DESC;
-- Sum, Min and Max CPU usage by Salesforce grouped by day
SELECT TRUNC(DATE,'DAY') DAY, SUM(CORE) TOTAL_CPU_Usage, MIN(CORE) MIN_CPU_Usage, MAX(CORE) MAX_CPU_Usage
FROM WEB_STAT
WHERE DOMAIN LIKE 'Salesforce%'
GROUP BY TRUNC(DATE,'DAY');
-- list host and total active users when DB usage is more than 10X the core CPU usage
SELECT HOST, SUM(ACTIVE_VISITOR) TOTAL_ACTIVE_VISITORS
FROM WEB_STAT
WHERE DB > (CORE * 10)
GROUP BY HOST;
Run the whole flow with psql.py (still inside the examples directory); it executes the DDL, loads the CSV into the table matching its file name, then runs the queries:
$ ../bin/psql.py zk01,zk02,zk03 WEB_STAT.sql WEB_STAT.csv WEB_STAT_QUERIES.sql
no rows upserted
Time: 2.51 sec(s)
csv columns from database.
CSV Upsert complete. 39 rows upserted
Time: 0.146 sec(s)
DOMAIN AVERAGE_CPU_USAGE AVERAGE_DB_USAGE
---------------------------------------- ---------------------------------------- ----------------------------------------
Salesforce.com 260.727 257.636
Google.com 212.875 213.75
Apple.com 114.111 119.556
Time: 0.093 sec(s)
DAY TOTAL_CPU_USAGE MIN_CPU_USAGE MAX_CPU_USAGE
----------------------- ---------------------------------------- ---------------------------------------- ----------------------------------------
2013-01-01 00:00:00.000 35 35 35
2013-01-02 00:00:00.000 150 25 125
2013-01-03 00:00:00.000 88 88 88
2013-01-04 00:00:00.000 26 3 23
2013-01-05 00:00:00.000 550 75 475
2013-01-06 00:00:00.000 12 12 12
2013-01-08 00:00:00.000 345 345 345
2013-01-09 00:00:00.000 390 35 355
2013-01-10 00:00:00.000 345 345 345
2013-01-11 00:00:00.000 335 335 335
2013-01-12 00:00:00.000 5 5 5
2013-01-13 00:00:00.000 355 355 355
2013-01-14 00:00:00.000 5 5 5
2013-01-15 00:00:00.000 720 65 655
2013-01-16 00:00:00.000 785 785 785
2013-01-17 00:00:00.000 1590 355 1235
Time: 0.05 sec(s)
HO TOTAL_ACTIVE_VISITORS
-- ----------------------------------------
EU 150
NA 1
Time: 0.036 sec(s)
Logging in to the HBase shell, you will find several new tables: Phoenix's SYSTEM.* metadata tables plus WEB_STAT:
hbase(main):001:0> list
TABLE
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.LOG
SYSTEM.MUTEX
SYSTEM.SEQUENCE
SYSTEM.STATS
WEB_STAT
Query the table with SQL:
0: jdbc:phoenix:zk01> select * from WEB_STAT limit 10;
+-------+-----------------+------------+--------------------------+-------+------+-----------------+
| HOST | DOMAIN | FEATURE | DATE | CORE | DB | ACTIVE_VISITOR |
+-------+-----------------+------------+--------------------------+-------+------+-----------------+
| EU | Apple.com | Mac | 2013-01-01 01:01:01.000 | 35 | 22 | 34 |
| EU | Apple.com | Store | 2013-01-03 01:01:01.000 | 345 | 722 | 170 |
| EU | Google.com | Analytics | 2013-01-13 08:06:01.000 | 25 | 2 | 6 |
| EU | Google.com | Search | 2013-01-09 01:01:01.000 | 395 | 922 | 190 |
| EU | Salesforce.com | Dashboard | 2013-01-06 05:04:05.000 | 12 | 22 | 43 |
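Writes go through UPSERT rather than INSERT, and since autocommit is on (see the connection banner above) each statement is committed immediately. A hypothetical example row:
0: jdbc:phoenix:zk01> UPSERT INTO WEB_STAT VALUES ('NA', 'Example.com', 'Login', CURRENT_DATE(), 10, 20, 5);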
Scanning the same table from the HBase shell shows how Phoenix encodes row keys and values:
hbase(main):006:0> scan 'WEB_STAT',{LIMIT=>5}
ROW COLUMN+CELL
EUApple.com\x00Mac\x00\x80\x00\x01;\xF3\xA04\xC8 column=STATS:\x80\x0D, timestamp=1530008199259, value=\x80\x00\x00"
EUApple.com\x00Mac\x00\x80\x00\x01;\xF3\xA04\xC8 column=USAGE:\x00\x00\x00\x00, timestamp=1530008199259, value=x
EUApple.com\x00Mac\x00\x80\x00\x01;\xF3\xA04\xC8 column=USAGE:\x80\x0B, timestamp=1530008199259, value=\x80\x00\x00\x00\x00\x00\x00#
EUApple.com\x00Mac\x00\x80\x00\x01;\xF3\xA04\xC8 column=USAGE:\x80\x0C, timestamp=1530008199259, value=\x80\x00\x00\x00\x00\x00\x00\x16
Mapping an existing HBase table
First create a table from the HBase shell:
> create 't1', 'f'
> put 't1', 'row1', 'f:q', '1'
> put 't1', 'row2', 'f:q', '2'
> put 't1', 'row3', 'f:q', '3'
> put 't1', 'row4', 'f:q', '4'
> put 't1', 'row5', 'f:q', '5'
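A quick scan verifies that the rows landed:
> scan 't1'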
Then create a matching table in Phoenix.
t1, row, f, and q must be wrapped in double quotes: Phoenix upper-cases unquoted identifiers, which would not match the lowercase HBase names.
$ ./sqlline.py localhost
CREATE TABLE IF NOT EXISTS "t1" (
"row" VARCHAR NOT NULL,
"f"."q" VARCHAR
CONSTRAINT PK PRIMARY KEY ("row")
);
Now the t1 table can be queried with SQL.
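For example (note: on Phoenix 4.10 and later, mapping a pre-existing HBase table may also require disabling column encoding, e.g. adding COLUMN_ENCODED_BYTES = 0 to the DDL, so that Phoenix looks up the raw qualifier names):
SELECT * FROM "t1";
SELECT "q" FROM "t1" WHERE "row" = 'row1';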