Using ZooKeeper as a load balancer for H/A
To provide high availability or load balancing for INDEXIMA HIVE2 servers (the master and secondary masters), the INDEXIMA CLUSTER provides a function called dynamic service discovery, in which multiple masters register themselves with ZooKeeper. Data visualization clients then connect through ZooKeeper rather than directly to one of the INDEXIMA masters.
INDEXIMA masters register themselves with ZooKeeper by adding a znode, which contains the actual hostname and port of the HIVE2 server instance.
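As a rough illustration of what service discovery does on the client side, the sketch below parses znode names and picks one registered master at random. It assumes the znode naming used by Apache HiveServer2 (`serverUri=<host>:<port>;version=...;sequence=...`); the names that INDEXIMA masters actually write may differ. The helper names `parse_znode` and `pick_master`, the port 10000, and the sample znode names are hypothetical.

```python
import random

def parse_znode(name):
    """Split a znode name of the assumed form 'key=value;key=value' into a dict."""
    fields = {}
    for part in name.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key] = value
    return fields

def pick_master(znode_names, rng=random):
    """Return (host, port) of one registered master, chosen at random."""
    uris = [parse_znode(name).get("serverUri") for name in znode_names]
    uris = [uri for uri in uris if uri]
    if not uris:
        raise LookupError("no master is registered under the namespace")
    host, _, port = rng.choice(uris).rpartition(":")
    return host, int(port)

# Hypothetical children of the namespace znode, for illustration only.
children = [
    "serverUri=master0.co.internal:10000;version=1.0;sequence=0000000001",
    "serverUri=master1.co.internal:10000;version=1.0;sequence=0000000002",
]
print(pick_master(children))
```

Because the choice is random, repeated connections spread across the registered masters, and a master that stops running is no longer offered once its znode disappears.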
Enabling dynamic service discovery
Update the following properties in the hive-site.xml configuration file to enable dynamic service discovery and to configure ZooKeeper.
|Property||Value||Description|
|hive.server2.support.dynamic.service.discovery||true||Enable dynamic discovery|
|hive.server2.zookeeper.namespace||Indexima||Parent node in ZooKeeper when supporting dynamic discovery|
|hive.zookeeper.quorum||master1:2181, master2:2181...||Comma-separated list of the ZooKeeper servers that form the ensemble|
|hive.zookeeper.client.port||2181||Port of the ZooKeeper servers to talk to|
hive.zookeeper.quorum is used in the connection string by JDBC/ODBC clients instead of the URI of a specific HIVE2 server instance.
hive.zookeeper.client.port is used when the ZooKeeper servers specified in hive.zookeeper.quorum do not contain port numbers.
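The port fallback described above can be sketched as follows. This is only an illustration of the rule, not the code the server runs; `normalize_quorum` is a hypothetical helper, and IPv6 addresses are not handled.

```python
def normalize_quorum(quorum, client_port=2181):
    """Append client_port to every quorum entry that has no port of its own."""
    servers = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if ":" not in entry:
            # No explicit port: fall back to hive.zookeeper.client.port.
            entry = f"{entry}:{client_port}"
        servers.append(entry)
    return ",".join(servers)

print(normalize_quorum("master1:2181, master2, master3"))
# prints master1:2181,master2:2181,master3:2181
```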
Comment out or remove the property hive.server2.thrift.bind.host, which defaults to 0.0.0.0, to avoid registering the znode with that IP address rather than the actual hostname.
Typical example:
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>indexima</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>master0.co.internal:2181,master1.co.internal:2181,master2.co.internal:2181,master3.co.internal:2181</value>
</property>
Any modification of the conf/hive-site.xml file inside the Galactica directory requires a restart of the INDEXIMA CLUSTER.
Data Visualization Client Connections
JDBC/ODBC clients need to use the following connection string to connect to the Indexima masters:
A comma-separated list of the ZooKeeper servers on the Indexima masters that form the ensemble, such as master0:2181, master1:2181, master2:2181...
The namespace on ZooKeeper under which HiveServer2 znodes are added. The namespace value is configured in
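For illustration, the sketch below assembles a connection string from the two parts described above: the ZooKeeper ensemble and the namespace (the hive.server2.zookeeper.namespace value from the table earlier on this page). It assumes the URL format documented for the Apache Hive JDBC driver in ZooKeeper service discovery mode; check your INDEXIMA driver documentation for the exact form it expects. `build_jdbc_url` is a hypothetical helper.

```python
def build_jdbc_url(zookeeper_ensemble, namespace):
    """Build a service-discovery URL in the Apache Hive JDBC format (assumed)."""
    ensemble = ",".join(
        server.strip() for server in zookeeper_ensemble.split(",") if server.strip()
    )
    return (
        f"jdbc:hive2://{ensemble}/"
        f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )

print(build_jdbc_url("master0:2181, master1:2181, master2:2181", "indexima"))
```

Note that the URL names the ZooKeeper servers, not a HIVE2 server instance: the driver resolves an actual master from the znodes registered under the namespace.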
If you plan to use the synchronize keyword for a remote data source via ZooKeeper in a Hortonworks ecosystem, you need to change two JAR files. The procedure is described in the chapter Synchronize Hive database with INDEXIMA dataspace.
Checking issues with ZooKeeper
Keep in mind that adding ZooKeeper to the connection path of the BI client introduces additional overhead in the INDEXIMA CLUSTER. A healthy, well-sized ZooKeeper will have minimal impact on your configuration. However, if your clients are struggling to access the INDEXIMA server, verify the number of current connections to ZooKeeper by using:
ss -anop | grep 2181 | wc -l
When the number of connections is too high (e.g. 4000), an application or a server may be flooding the ZooKeeper servers with connections that may never close, making ZooKeeper unstable.
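The same check can be scripted when it has to run regularly. The sketch below counts sockets in `ss` output whose local or peer address uses port 2181; unlike a plain `grep 2181`, it ignores lines where 2181 only appears inside another number. The helper names and the sample lines are hypothetical, and the 4000 threshold is simply the figure mentioned above, not a documented limit.

```python
def count_port_connections(ss_output, port=2181):
    """Count sockets whose local or peer address uses the given port."""
    suffix = f":{port}"
    count = 0
    for line in ss_output.splitlines():
        fields = line.split()
        # An address column looks like '10.0.0.5:2181' or '[::1]:2181'.
        if any(field.endswith(suffix) for field in fields):
            count += 1
    return count

def is_overloaded(connection_count, threshold=4000):
    """Flag a connection count at or above the threshold."""
    return connection_count >= threshold

# Shortened, made-up `ss -an` lines, for illustration only.
sample = """\
tcp ESTAB 0 0 10.0.0.5:2181 10.0.0.9:51234
tcp ESTAB 0 0 10.0.0.5:2181 10.0.0.9:51236
tcp ESTAB 0 0 10.0.0.5:10000 10.0.0.7:42181
"""
n = count_port_connections(sample)
print(n, is_overloaded(n))
# prints 2 False
```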
Keep in mind a few facts about ZooKeeper:
- ZooKeeper is quite a disk-intensive application. Make sure to have enough resources on the node where you deploy it.
- The ZooKeeper data directory contains snapshot and transaction log files. Purge them periodically. Make sure the built-in log4j log rotation of ZooKeeper is configured to run automatically.
- ZooKeeper server lists used by clients in their connection strings must match the lists contained on each ZooKeeper server. Unintended behaviors might occur if client and server lists don't match.
- Prevent ZooKeeper from swapping by choosing an adequate Java heap size.
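Two of the points above lend themselves to a quick automated check. The sketch below reads zoo.cfg content, reports whether ZooKeeper's autopurge task is enabled (it runs only when `autopurge.purgeInterval` is a positive number of hours), and compares the hosts in a client connection string with the `server.N` entries. The helper names, the sample configuration, and its paths are hypothetical.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines of a zoo.cfg file, ignoring blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def autopurge_enabled(settings):
    """Return True when autopurge.purgeInterval is a positive number of hours."""
    return int(settings.get("autopurge.purgeInterval", "0")) > 0

def server_lists_match(client_quorum, server_hosts):
    """Compare the hosts in a client connection string with the servers' own list."""
    client_hosts = {
        entry.strip().split(":")[0]
        for entry in client_quorum.split(",")
        if entry.strip()
    }
    return client_hosts == set(server_hosts)

# Hypothetical zoo.cfg content, for illustration only.
cfg = parse_zoo_cfg("""
dataDir=/var/lib/zookeeper
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
server.1=master0:2888:3888
server.2=master1:2888:3888
server.3=master2:2888:3888
""")
servers = [value.split(":")[0] for key, value in cfg.items() if key.startswith("server.")]
print(autopurge_enabled(cfg))
print(server_lists_match("master0:2181,master1:2181,master2:2181", servers))
```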