Load balancing

Load balancing the core engine

Indexima core engine nodes are all master nodes, listening on the Hive port (10000 by default) for inbound queries. Note that the master/slave mode of Indexima, with a single master and several slave nodes, has been deprecated.

Indexima is a distributed SQL engine. Queries sent to a node of the cluster will be distributed to every node of the cluster during execution.

Under normal workload, sending every query to a single Indexima node is acceptable, because execution is distributed in any case. Nevertheless, the node receiving the query carries extra work (schematically, parsing the query and aggregating the results). To optimize workload distribution, it is advised to place a network load balancer in front of the Indexima cluster so that queries are distributed across all Indexima nodes.
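As an illustration of how queries can be spread across nodes, here is a minimal Python sketch of round-robin distribution. It is not Indexima code and does not replace a real network load balancer. The host names are hypothetical placeholders, and the class name RoundRobinBalancer is introduced here for the example only. Port 10000 is the default Hive port mentioned above.

```python
from itertools import cycle

# Hypothetical node addresses; replace with the hosts of your own cluster.
NODES = [
    ("indexima-node-1.example.internal", 10000),
    ("indexima-node-2.example.internal", 10000),
    ("indexima-node-3.example.internal", 10000),
]


class RoundRobinBalancer:
    """Hands out node addresses in a fixed rotating order."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # cycle() repeats the node list endlessly, one node per call.
        self._rotation = cycle(list(nodes))

    def next_node(self):
        """Return the (host, port) pair that should receive the next query."""
        return next(self._rotation)


balancer = RoundRobinBalancer(NODES)

# Six consecutive queries visit each of the three nodes twice, in order.
picked = [balancer.next_node() for _ in range(6)]
for host, port in picked:
    print(f"route next query to {host}:{port}")
```

With this rotation, the extra parsing and aggregation work of the receiving node falls on each node in turn instead of always on the same one.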

One of the nodes of the cluster is always elected as 'primary master'. The primary master is in charge of some orchestration activities between the nodes but has no other special role. If the primary node fails, another node is automatically elected as primary.

Load balancing the console

The Indexima console is a lightweight web server, used to administer and monitor an Indexima cluster. The console is not required for the cluster to run and execute queries. The console can communicate with any node of the cluster (no need to target the primary node). For those reasons, it is not required to load balance access to the console, nor the communication between the console and the core engine.

Load balancing solutions for the core engine

Indexima is compatible with most load balancing solutions (ZooKeeper, AWS NLB, Nginx, HAProxy, F5 appliances, ...).
Indexima can run in HTTP transport mode and use an HTTP load balancer (e.g. AWS ALB) to load balance the SQL requests. See the documentation on hive transportMode for more details.
Indexima is compatible with any routing algorithm (round-robin, least outstanding requests, ...).
Sticky sessions on the load balancer are only required if you rely on the USE SCHEMA statement to avoid specifying the schema in subsequent queries.
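To illustrate what sticky sessions achieve, the following Python sketch maps each client to a fixed node by hashing a client identifier, so that a USE SCHEMA statement and the queries that follow it reach the same node. This is only one possible affinity scheme. Real load balancers typically offer their own mechanisms (for example source-IP or cookie-based stickiness). The host names, the client identifier and the function name sticky_node are assumptions made for the example.

```python
import hashlib

# Hypothetical node addresses; replace with the hosts of your own cluster.
NODES = [
    ("indexima-node-1.example.internal", 10000),
    ("indexima-node-2.example.internal", 10000),
    ("indexima-node-3.example.internal", 10000),
]


def sticky_node(client_id, nodes):
    """Return the same node for the same client identifier on every call."""
    if not nodes:
        raise ValueError("at least one node is required")
    # A stable hash (unlike Python's built-in hash()) gives the same
    # result across processes and restarts.
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# The client identifier could be a source IP address or a session token.
first = sticky_node("10.0.0.12", NODES)
second = sticky_node("10.0.0.12", NODES)
print("same node for both requests:", first == second)
```

Note that a plain modulo mapping reassigns many clients when the node list changes. If queries always specify the schema explicitly, no affinity is needed and any routing algorithm can be used.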