Hive/Impala: Load Balancing Impala and HiveServer2 with HAProxy

February 15, 2023 | Source: BillowX

Installing HAProxy

1. Pick one node in the cluster and install the HAProxy service with yum:

yum -y install haproxy

2. Start or stop the HAProxy service, and register it to start at boot:

service haproxy start
service haproxy stop
chkconfig haproxy on

Impala Configuration

Back up the haproxy.cfg file in /etc/haproxy, create a new haproxy.cfg, and add the following configuration:

#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    #option http-server-close
    #option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000


listen stats
    bind 0.0.0.0:1080
    mode http
    option httplog
    maxconn 5000
    stats refresh 30s
    stats uri /stats

listen impalashell
    bind 0.0.0.0:25003
    mode tcp
    option tcplog
    balance leastconn
    server hadoop1 hadoop1:21000 check
    server hadoop2 hadoop2:21000 check
    server hadoop3 hadoop3:21000 check

listen impalajdbc
    bind 0.0.0.0:25004
    mode tcp
    option tcplog
    balance leastconn
    server hadoop1 hadoop1:21050 check
    server hadoop2 hadoop2:21050 check
    server hadoop3 hadoop3:21050 check

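This configuration sets up HAProxy's HTTP stats page plus load balancing for impala-shell and Impala JDBC connections. Note that the defaults section sets timeout client and timeout server to 1m, so a connection that stays idle longer than that is closed by HAProxy; raise both values if sessions need to stay open longer.
After saving the configuration, restart HAProxy: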

service haproxy restart
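
The balance leastconn directive used in the listen blocks sends each new connection to the server that currently holds the fewest active connections, which suits long-lived sessions such as impala-shell or JDBC. The following is only a toy model of that idea, not HAProxy's implementation: it ignores weights, queues, and health checks, and breaks ties by list order (an assumption made for this sketch).

```python
from typing import Dict, List


def pick_leastconn(active: Dict[str, int]) -> str:
    """Return the server with the fewest active connections.

    Ties go to the server listed first. This is a simplification:
    the real balancer also considers weights and queued connections.
    """
    return min(active, key=active.get)


def simulate(servers: List[str], new_connections: int) -> Dict[str, int]:
    """Assign long-lived connections one at a time and return the final counts."""
    active = {name: 0 for name in servers}
    for _ in range(new_connections):
        active[pick_leastconn(active)] += 1
    return active


if __name__ == "__main__":
    # Server names match the listen blocks in this article.
    print(simulate(["hadoop1", "hadoop2", "hadoop3"], 7))
```

Because the choice is made once per TCP connection, every query issued over an already open session stays on the same Impala Daemon.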

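Open http://{hostname}:1080/stats in a browser to view the stats page.

Impala Shell Test

Connect from several terminals at the same time and run SQL statements to check whether HAProxy spreads the sessions across different Impala Daemon nodes.
Use impala-shell to connect to port 25003 of the HAProxy service: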

impala-shell -i hadoop1:25003

Open the first terminal, connect, and run a query:

impala-shell -i hadoop1:25003
...
select * from my_first_table;
...
+----+------+
| id | name |
+----+------+
| 1  | john |
| 2  | tom  |
| 3  | jim  |
+----+------+
Fetched 3 row(s) in 7.20s

At the same time, open a second terminal, connect, and run the same query:

impala-shell -i hadoop1:25003
...
select * from my_first_table;
...
+----+------+
| id | name |
+----+------+
| 1  | john |
| 2  | tom  |
| 3  | jim  |
+----+------+
Fetched 3 row(s) in 7.20s

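In this test the SQL from the two terminals did not run on the same Impala Daemon, which shows that connections are being load balanced across the Impala Daemon service. The per-server session counts on the HAProxy stats page offer a second way to confirm this.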
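
One way to check the distribution without a browser is HAProxy's CSV export of the stats page, reachable by appending ;csv to the stats URI (here that would be http://hadoop1:1080/stats;csv, assuming hadoop1 is the HAProxy host). The sketch below is a minimal parser that pulls the current session count (scur) and status for each server; the function names are made up for this example.

```python
import csv
import io
from typing import Dict
from urllib.request import urlopen


def parse_stats_csv(text: str) -> Dict[str, Dict[str, str]]:
    """Map "proxy/server" to its current session count and status."""
    # HAProxy prefixes the CSV header line with "# "; strip it for DictReader.
    reader = csv.DictReader(io.StringIO(text.lstrip("# ")))
    result = {}
    for row in reader:
        # FRONTEND and BACKEND rows are aggregates, not individual servers.
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        key = f'{row["pxname"]}/{row["svname"]}'
        result[key] = {"scur": row["scur"], "status": row["status"]}
    return result


def fetch_stats(url: str) -> Dict[str, Dict[str, str]]:
    """Download and parse the CSV export from the given stats URL."""
    with urlopen(url, timeout=5) as resp:
        return parse_stats_csv(resp.read().decode("utf-8"))


# Example call (needs a reachable HAProxy, so it is left commented out):
# for name, info in fetch_stats("http://hadoop1:1080/stats;csv").items():
#     print(name, info["scur"], info["status"])
```

With two impala-shell sessions open, the impalashell rows should show their sessions spread over more than one server.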

Impala JDBC Access

Change the URL to the HAProxy host and the port configured for Impala JDBC load balancing:

jdbc:impala://hadoop1:25004

HiveServer2 Load Balancing

Edit /etc/haproxy/haproxy.cfg and append the following configuration to the end of the file:

listen hivejdbc
    bind 0.0.0.0:25005
    mode tcp
    option tcplog
    balance leastconn
    server hadoop1 hadoop1:10000 check
    server hadoop2 hadoop2:10000 check

Restart the HAProxy service:

service haproxy restart

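If the cluster uses Kerberos authentication, open the Hive configuration page in CM (Cloudera Manager) and search for: HiveServer2 Load Balancer
Set the value to: hadoop1:25005
Save the configuration, return to the CM home page, and restart the affected services as prompted.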
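
With Kerberos enabled, clients usually have to name the HiveServer2 service principal in the JDBC URL. The sketch below builds such a URL under the assumption that the principal's host part is the HAProxy host (hadoop1 here), which is the intended effect of the HiveServer2 Load Balancer setting. The realm EXAMPLE.COM and the default database are placeholders that do not come from the original article, so check the actual principal in your cluster before relying on this.

```python
from typing import Optional


def hive_jdbc_url(host: str, port: int, database: str = "default",
                  realm: Optional[str] = None) -> str:
    """Build a HiveServer2 JDBC URL that points at the load balancer.

    If a Kerberos realm is given, append a principal of the form
    hive/<host>@<realm>. Using the proxy host in the principal is an
    assumption of this sketch.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if realm:
        url += f";principal=hive/{host}@{realm}"
    return url


if __name__ == "__main__":
    print(hive_jdbc_url("hadoop1", 25005))
    print(hive_jdbc_url("hadoop1", 25005, realm="EXAMPLE.COM"))
```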

Beeline Test

Use Beeline to connect to port 25005 of the HAProxy service:

[root@hadoop1 ~]# beeline 
beeline> !connect jdbc:hive2://hadoop1:25005
...
Enter username for jdbc:hive2://hadoop1:25005: hive
Enter password for jdbc:hive2://hadoop1:25005:

Hive JDBC Connection

Change the URL to the HAProxy host and the port configured for Hive JDBC load balancing:

jdbc:hive2://hadoop1:25005
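
Before pointing applications at the new URLs, it can help to confirm that HAProxy is actually listening on the three balanced ports from this article (25003, 25004, 25005). The following is a minimal sketch using only the Python standard library; the helper names are invented for this example, and a successful connect only shows that the proxy accepts connections, not that a backend behind it is healthy.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe(endpoints: Iterable[Tuple[str, int]]) -> Dict[str, bool]:
    """Check each (host, port) pair and report the result keyed by "host:port"."""
    return {f"{host}:{port}": port_open(host, port) for host, port in endpoints}


# Example call (hadoop1 is the HAProxy host in this article):
# for name, ok in probe([("hadoop1", 25003), ("hadoop1", 25004),
#                        ("hadoop1", 25005)]).items():
#     print(name, "open" if ok else "unreachable")
```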

    Original author: BillowX
    Original URL: https://www.jianshu.com/p/b9840bea1ba8