A proxy server is a server or application that acts as an intermediary for requests from clients seeking resources from other servers/applications.
A client connects to the proxy server and requests a service or resource that is available from a different server. The proxy server evaluates the request and forwards it to the intended server, service or application. The server’s response is returned to the proxy server, which in turn returns it to the client.
Cloudera Manager currently has no built-in proxy or load balancing feature, so we have to use external proxy software of our choice.
In this section, we are going to use HAProxy as our proxy software.
HIVESERVER2:
Before installing HAProxy and configuring it for HiveServer2, make sure that HiveServer2 (HS2) is running on more than one host. If only one HS2 instance is available, add another HS2 role on a different server so that we make full use of the proxy setup.
Now we have HS2 instances running on two hosts, master and standby, on port 10000 (the default port).
Log in to the server on which you want to set up HAProxy. This server should not be running an HS2 instance, i.e. the HS2 instances and the proxy server should be on different hosts.
[root@server3 ~]# yum install haproxy
(If a specific version is required, install haproxy-version instead.)
Go to the haproxy config file,
[root@server3 ~]# vi /etc/haproxy/haproxy.cfg
# Lines starting with the '#' sign below are comments for your understanding.
# You needn't include them in the configuration file.
listen hiveserver2 :10000
# haproxy will listen on port 10000 for hiveserver2 client requests.
mode tcp
option tcplog
balance leastconn
# tcp – connection mode between haproxy and the hive servers
# leastconn – requests will be sent to the server with the fewest connections
server server1 master:10000
server server2 standby:10000
# first field 'server' indicates that the line defines a server
# second field is a label for your server. You can give any name
# third field should contain the FQDN of the HiveServer2 host with the port number
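To make the leastconn policy a bit more concrete, here is a small shell sketch of the idea. It is only an illustration and not how HAProxy is implemented: the function name pick_leastconn and the connection counts are made up, and the input is assumed to be one "name active-connections" pair per line.

```shell
#!/bin/sh
# Hypothetical helper: print the server with the fewest active connections.
# Input on stdin: lines of "<name> <active_connections>".
# On a tie, this sketch simply keeps the first server it saw.
pick_leastconn() {
    awk 'NF >= 2 && (best == "" || $2 + 0 < min) { best = $1; min = $2 + 0 }
         END { if (best != "") print best }'
}

# Made-up example: server1 has 7 open connections, server2 has 3,
# so the next request would go to server2.
printf 'server1 7\nserver2 3\n' | pick_leastconn
```

With long-lived sessions such as JDBC connections, picking the least loaded server tends to spread clients more evenly than simply rotating through the servers.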
Now the configuration is done, start the haproxy service and also enable it to start automatically during server startup.
# service haproxy start
# chkconfig haproxy on
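Before touching Cloudera Manager, it can help to confirm that haproxy is actually accepting connections. The sketch below is one way to check this from any host; the helper name port_open is hypothetical, and it assumes that bash and the coreutils timeout command are installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# server3 and 10000 are the proxy host and port used in this post.
if port_open server3 10000; then
    echo "haproxy is accepting connections on server3:10000"
else
    echo "cannot connect to server3:10000"
fi
```

A successful check only shows that the proxy port is open. It does not prove that the HiveServer2 instances behind it are healthy.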
Now go to CM – Hive – Configuration, search for ‘load balancer’, and provide the haproxy server details and the port (10000) on which haproxy is listening for HiveServer2.
Click save changes and deploy the client configuration.
To test the proxy connection, connect to HiveServer2 via JDBC, using the haproxy server in the URI.
# beeline -u 'jdbc:hive2://server3:10000/default'
If the connection succeeds, the proxy setup for HiveServer2 is working.
IMPALA
For Impala, we have to set up the proxy for the Impala daemons. The configuration is similar to the one for HS2.
Make sure that Impala daemons are running on more than one host.
In the haproxy config file, add the lines below,
[root@server3 ~]# vi /etc/haproxy/haproxy.cfg
listen impala :21000
# haproxy will listen on port 21000 for impala client requests.
mode tcp
option tcplog
balance leastconn
server server1 master:21050
server server2 standby:21050
server server3 slave1:21050
# 21050 is the default Impala daemon port for JDBC/ODBC clients.
# You can change it as per your Impala configuration
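With two listen sections in the same file, it is easy to mix up ports. As a quick sanity check, a sketch like the following prints each section with its backends. The function name list_backends is hypothetical, and the parsing assumes the simple layout used in this post (a listen line followed by its server lines). The example writes a sample config to a temporary file so that it can be run anywhere.

```shell
#!/bin/sh
# Hypothetical helper: print "<section> <backend host:port>" for every
# `server` line that belongs to a `listen` section of an HAProxy config file.
list_backends() {
    awk '$1 == "listen" { section = $2; next }
         $1 == "frontend" || $1 == "backend" || $1 == "defaults" || $1 == "global" { section = ""; next }
         $1 == "server" && section != "" { print section, $3 }' "$1"
}

# Sample config mirroring the two sections from this post.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
listen hiveserver2 :10000
    mode tcp
    balance leastconn
    server server1 master:10000
    server server2 standby:10000
listen impala :21000
    mode tcp
    balance leastconn
    server server1 master:21050
    server server2 standby:21050
    server server3 slave1:21050
EOF
list_backends "$cfg"
rm -f "$cfg"
```

On the proxy host itself you would run list_backends /etc/haproxy/haproxy.cfg instead of using the temporary file.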
Once the haproxy configuration is done, restart the haproxy service so that the new section takes effect, and then update the Impala load balancer property.
CM – Impala – configuration – search ‘load balancer’
Impala Load balancer —> server3:21000
To test the proxy connection, use the jdbc connection string as below.
jdbc:impala://server3:21000
Problem Scenarios:
- Set up proxy load balancing for HiveServer2
- Configure haproxy to load balance the Impala daemons
Thus we covered the proxy setup for HS2 and impala.
—
Will the load balancer configuration be provided?
listen impala :21000
#haproxy will listen in port 21000 for impala client requests.
mode tcp
option tcplog
balance leastconn
server server1 master:21050
server server2 standby:21050
server server3 slave1:21050
I expect them to give the load balancer server name:port, the instance name:port of each backend, and the load balancing mode (roundrobin, leastconn).
This configuration format is a standard one.