The file "galactica.conf", located in your /<indexima_install_folder>/galactica/conf folder, configures the Indexima cluster for various environments and security mechanisms, and lets you tune the Indexima Data Engine for optimum performance. "galactica.conf" also helps with troubleshooting by enabling several levels of debugging.

Parameter modifications in the "galactica.conf" file are not dynamically applied. They require restarting the Indexima cluster to be taken into account.

Some parameters are marked as dynamic, meaning that they can be changed afterward by a query. To do so, prefix the "galactica.conf" parameter and its new value with the HSQL command SET_ followed by one space.

# static parameter
result.max_size = 512
# dynamic parameter
SET_ result.max_size = 256

Modifications done using such a dynamic method do not require restarting the Indexima cluster. However, all changes are lost if the Indexima cluster is restarted. In this case, altered parameters revert to the values specified in "galactica.conf" or to the default values when not specified.

Dynamic parameter changes are preserved when issuing a cluster INIT command.
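The precedence rules above (file values at startup, dynamic overrides until the next restart) can be sketched in Python. This is only an illustration, not Indexima code: the function names parse_conf and effective_settings are hypothetical, and the parser assumes a simple "key = value" syntax with "#" comments, as in the example above.

```python
def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'key = value' line
        settings[key.strip()] = value.strip()
    return settings


def effective_settings(conf_text, dynamic_overrides, restarted):
    """Return the values in effect, following the rules described above."""
    settings = parse_conf(conf_text)
    if not restarted:
        # Dynamic changes (SET_ ...) are lost at the next cluster restart.
        settings.update(dynamic_overrides)
    return settings


conf = """
# static parameter
result.max_size = 512
"""
overrides = {"result.max_size": "256"}  # as set by: SET_ result.max_size = 256

print(effective_settings(conf, overrides, restarted=False))
print(effective_settings(conf, overrides, restarted=True))
```

Before a restart the dynamic value applies; after a restart the value from "galactica.conf" is back in effect.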

Parameters

All possible parameters for "galactica.conf" are listed below, with their default values and a short description.

General

Nodes

Parameter | Type | Choices/Default | Description | Dynamic command

nodes

required

string

List of Indexima active nodes separated by a comma ','

Prefer FQDN instead of IP address.

The first node in the list is the Indexima master. It will host the Hive2 server and will be the entry point for all Indexima clients.

example: Single node cluster
               nodes = localhost
               Multinode cluster. node-1.intra is Master
               nodes = node-1.intra, node-2.intra, node-3.intra

No

nodes.requested

integer

0

When using dynamic mode, this parameter replaces the nodes parameter, in which you have to list the IP of all the Indexima nodes of the cluster.

In nodes.requested, just specify the number of nodes in your cluster.

No

nodes.connect.min-nodes

integer

2

Minimum number of node members that must be started in an Indexima cluster before starting transactions with Indexima clients.

This option lets you start using an Indexima cluster without waiting for all its node members to be up and running.

No
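As an illustration of how the nodes value described above is structured, the following Python sketch splits the comma-separated list and treats the first entry as the master. The function name split_nodes is hypothetical, and trimming whitespace around each name is an assumption.

```python
def split_nodes(nodes_value):
    """Split a 'nodes' value into (master, workers).

    The first node in the list is the Indexima master, as documented above.
    """
    names = [name.strip() for name in nodes_value.split(",") if name.strip()]
    if not names:
        raise ValueError("nodes must list at least one host")
    return names[0], names[1:]


master, workers = split_nodes("node-1.intra, node-2.intra, node-3.intra")
print(master)   # node-1.intra
print(workers)  # ['node-2.intra', 'node-3.intra']
```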

Other

Parameter | Type | Choices/Default | Description | Dynamic command

warehouse

required

string

Absolute path for storage for HyperIndex and Data Spaces.

No

warehouse.protocol

required

string
  • HDFS
  • LOCAL
  • S3 

Storage protocol to address the Indexima warehouse.
example: warehouse.protocol = HDFS

No

warehouse.shared

Deprecated since 2021.1

string
  • true
  • false

When true, warehouse.shared indicates that the Indexima warehouse is sitting on local storage that is shareable with all members of the Indexima cluster.
example: 

One unique filesystem is mounted on each Indexima node member (e.g. NFS):

warehouse.shared = true

The warehouse is sitting on the filesystem of each Indexima node and is not visible from the other nodes of the Indexima cluster:

warehouse.shared = false

The warehouse is hosted on HDFS or S3:

warehouse.shared = true

No

cores

required

integer

8

Number of CPU cores used by the Indexima Data Engine. Indexima only supports the same number of cores on every node. If your nodes have different numbers of cores, you must set this parameter to the lowest core count among them.
example: cores = 8

No

readers

required

integer

4

Maximum number of threads used for read queries (typically SELECT).

It must be a multiple of the number of cores.

It is recommended to set it to cores / 2.

example: readers = 4

No

loaders

required

integer

4

Maximum number of threads used for load queries (typically LOAD DATA). This is also the maximum number of load queries that can run in parallel.

It must be a multiple of the number of cores.

It is recommended to set it to cores / 2.

example: loaders = 4

No

queries

required

integer

32

Maximum total number of threads used for querying
example: queries = 32
No

Storage related parameters

General

Parameter | Type | Choices/Default | Description | Dynamic command

partitions

required

integer 

Number of shards (a type of horizontal partitions) across potentially multiple instances of the schema.

It is highly recommended to set partitions according to the formula: partitions = cores * number of nodes.

You can oversize it if you want to anticipate a cluster size change.

You cannot undersize it.

If you change the number of partitions, all your tables must be dropped and recreated, and all data reloaded.
example:
For a 4-node cluster with 8 cores per node:
               partitions = 32   

No

dimension.partitions

integer

8

Number of partitions used for dimension tables (tables without HyperIndex).
example:
For a 4-node cluster with 8 cores per node:
               dimension.partitions = 8

No

export.partition.size.mb

integer

10000

Defines the maximum size (in MB) of the files generated during an export.

No

pages.oneFilePerColumn

boolean

true

When true, Indexima creates one file per column to take advantage of the Indexima K-Store technology. When using local/shared file systems or cloud file systems such as S3, this parameter must be set to true.

When false, Indexima forces K-Store to store all columns in a single file to limit the number of open files. This is used when the Indexima warehouse is located on an HDFS file system.

No
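The sizing rule given above for partitions (partitions = cores * number of nodes, oversizing allowed, undersizing not allowed) can be checked with a short Python sketch. The function names are hypothetical and the code only restates the documented formula.

```python
def recommended_partitions(cores, node_count):
    """Recommended value: cores per node times the number of nodes."""
    if cores < 1 or node_count < 1:
        raise ValueError("cores and node_count must be positive")
    return cores * node_count


def partitions_ok(partitions, cores, node_count):
    """True when 'partitions' is not undersized for the cluster."""
    return partitions >= recommended_partitions(cores, node_count)


# 4 nodes with 8 cores per node, as in the example above
print(recommended_partitions(8, 4))  # 32
print(partitions_ok(64, 8, 4))       # True: oversized for future growth
print(partitions_ok(16, 8, 4))       # False: undersized
```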

S3

Parameter | Type | Choices/Default | Description | Dynamic command
warehouse.s3.compatible | boolean | false | If you are using your own S3-compatible (non-AWS) server, set this parameter to true. Otherwise, S3 connections are treated as genuine AWS S3. | No
warehouse.s3.endpoint | string | | When warehouse.s3.compatible is set to true, you must provide the endpoint URL of your S3 server. Example: warehouse.s3.endpoint = https://my_minio:9000 | No
warehouse.s3.bucket | string | | Name of the bucket to be used by Indexima when using an S3-compatible connection. | No
warehouse.s3.https | boolean | false | Set to true if your S3-compatible server uses an encrypted HTTPS connection. | No
warehouse.s3.cert.check | boolean | false | When set to true, the S3 connection fails if the SSL certificate used is insecure. If you are using a self-signed certificate, leave this parameter set to false. | No
s3.max.retry | integer | 40 | Number of retries on S3 failure. | No
warehouse.s3.key | string | | Amazon S3 directory where files are stored for S3-compatible file systems. | No

Connectivity toward Indexima

Parameter | Type | Choices/Default | Description | Dynamic command

node.port

integer

19999

Port used by the Indexima Galactica engine to communicate with the other nodes of the Indexima Cluster.

No

webui.port


9999

Port used by the Indexima Monitor Console. This webserver aggregates logs, queries history, and hosts Indexima Analyzer, among other features.
For more information about the Indexima Monitor, refer to the Indexima Monitor Documentation

No

webui.ssl.enable

boolean

false

When true, the Indexima Monitor Console uses the secured HTTPS (SSL) protocol for connections.

webui.ssl.keystore.location and webui.ssl.keystore.password must be previously set up to enable this feature.

When false, the standard HTTP protocol is used.

No

webui.ssl.keystore.location

string

Location of the SSL certificate keystore used to secure access to the Indexima Monitor Console with HTTPS.
example: webui.ssl.keystore.location = /etc/indexima/keystore/mycluster.indexima.co.keystore

No

webui.ssl.keystore.password

string

Define the password associated with the Indexima Keystore. 

No

monitor.api.key

string

Service API key between the Indexima cluster and the Developer Console.

heartbeat

integer

300 000

Time in milliseconds for a full round-trip heartbeat packet between the Indexima master and each worker node. The default value is 5 minutes. 

No

Indexima Connectivity toward external datasources

Parameter | Type | Choices/Default | Description | Dynamic command

jdbc.timeout.seconds

integer

60

Timeout value of the JDBC driver connection, in seconds. This is the maximum time allowed for the data source to give a response, not the maximum time the query can take.

No

jdbc.load.fetch.size

integer

1000

Number of rows fetched with each database round trip for a query when importing data from an external JDBC source.

Yes

jdbc.query.timeout.seconds

integer

60

Timeout for JDBC statement execution, in seconds.

Yes

jdbc.create.field.restriction

boolean

true

When set to true, Indexima replaces some characters on the fly to avoid wrong interpretation of incoming data.

Character substitution chart

Character | Substituted by
space ' ' | underscore _
single quote ' | underscore _
dot . | no character
comma , | no character
parentheses ( and ) | no character
slash / | dash -
Yes
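The substitution chart above can be expressed as a small Python function. This is only a sketch of the documented mapping, not Indexima's implementation: the function name substitute_field_name is hypothetical, and applying the substitution to field names is an assumption based on the parameter name.

```python
# Mapping taken from the substitution chart: None means the character is dropped.
_SUBSTITUTIONS = str.maketrans({
    " ": "_",
    "'": "_",
    ".": None,
    ",": None,
    "(": None,
    ")": None,
    "/": "-",
})


def substitute_field_name(name):
    """Apply the character substitutions listed in the chart."""
    return name.translate(_SUBSTITUTIONS)


print(substitute_field_name("unit price (EUR/kg)"))  # unit_price_EUR-kg
```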
 

Logs and history

Parameter | Type | Choices/Default | Description | Dynamic command

log.dir

string

/home/indexima/log

Path to the folder where the Indexima logs are written

No

log.insert | integer | 1 000 000 | During a "load" command, log a message every N inserts. | No
log.level | string | INFO | Log level of the cluster. | No
history.count | integer | 500 | | No
history.flush | integer | 10 | Maximum number of queries before flushing the history to disk. | No
history.dir | string | /home/indexima/history | Query history directory. | No
history.export | string | /home/indexima/history-export | Export directory for the query history in CSV format. | No
hive.log.threshold | string | ERROR | Hive log threshold. | No
hive.log.dir | string | /home/indexima/history-export | Hive log directory. | No

Memory Usage Parameters

Tables & Indexes

Parameter | Type | Choices/Default | Description | Dynamic command
index.memory.max_size.mb | integer | 1024 | Maximum size (in MB) of each index, per node. This parameter limits the size of a HyperIndex to prevent running out of memory if the cardinality of the selected index is too high. | Yes
table.memory.max_size.mb | integer | 4096 | Maximum size (in MB) of a dataspace (table), per node. This parameter prevents running out of memory when using multiple large tables. | Yes
limited.memory.max_size.mb | integer | 1024 | Maximum size (in MB) of a limited table. | Yes
table.prefetch.lastdays | | 2 | Preload indexes that have been used during the last N days. | Yes

Big indexes

Parameter | Type | Choices/Default | Description | Dynamic command
big.index.enable | boolean | false | Enable the big index feature. | Yes
big.index.limit.mb | long | 5000 | Memory limit for big index behaviour. | Yes
big.index.max.active.partitions | integer | 2 | Maximum number of active partitions allowed for queries on a big index. | No
big.index.swap.path | string | swap | Default location for index data swap. | Yes

Queries

Parameter | Type | Choices/Default | Description | Dynamic command
query.max | long | 100 000 | Maximum number of bytes of a SQL statement. | No
result.max_size.mb | long | 256 | Maximum size (in MB) of the computed results of a query. | Yes
global.result.max_size.mb | long | 512 | Global maximum size (in MB) available to store the computed results of ALL queries at a given moment. As a guideline, set this parameter to the number of nodes in the cluster times result.max_size.mb. The minimum recommended value is 512 MB. | Yes
select.timeout.ms | long | -1 | Timeout of query execution, in milliseconds. If the query lasts longer than the timeout, the system stops it with the error Timeout. | Yes
join.memory.max_size.mb | integer | 12 | Maximum size (in MB) allocated for one line join computation. | Yes
cognos.limit.select | integer | -1 | Add an implicit limit on SELECT queries without GROUP BY for Cognos usage. The value -1 means disabled. |
queries | integer | 32 | Number of threads used for querying. | No
queries.high-cost | integer | 5 | Number of high-cost queries that may run at the same time. | No
queries.high-cost.frozen | integer | 5 | Maximum number of frozen high-cost queries. | Yes
queries.hybrid.ratio | double | 0.25 | Ratio of hybrid queries allowed to run concurrently. | No
query.high-cost.memory.mb | long | 128 | Memory size threshold (in MB) above which a query becomes high cost. | Yes
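The guideline given above for global.result.max_size.mb (number of nodes times the per-query result limit, with a recommended minimum of 512 MB) can be written as a one-line calculation. The Python sketch below is illustrative; the function name is hypothetical.

```python
def recommended_global_result_max_mb(node_count, result_max_size_mb, floor_mb=512):
    """Guideline: nodes * per-query result limit, never below the 512 MB minimum."""
    return max(floor_mb, node_count * result_max_size_mb)


print(recommended_global_result_max_mb(4, 256))  # 1024
print(recommended_global_result_max_mb(1, 256))  # 512 (recommended minimum applies)
```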

Cache

Parameter | Type | Choices/Default | Description | Dynamic command
cache.master.mb | integer | 0 | Amount of RAM (in MB) allocated to cache Hive 2 queries addressed by Indexima Data Hub, for better performance. | Yes
cache.master.min_exec.ms | integer | 1000 | Threshold above which a query result is put into the cache. | Yes

Load

Parameter | Type | Choices/Default | Description | Dynamic command
insert.queue.mem.size.mb | integer | 256 | Maximum insert queue memory size (in MB) during a load. | Yes

Miscellaneous

Parameter | Type | Choices/Default | Description | Dynamic command
memory.coef | float | 0.85 | Coefficient of the maximum heap memory used for the HyperIndexes. | No

Security

General

Parameter | Type | Choices/Default | Description | Dynamic command

impersonation

boolean

false

When impersonation = true, the Indexima account is used to grant access to shared storage on each node of the Indexima cluster

No

webui.authenticate

boolean

false

When webui.authenticate = false, there is no access control for the Indexima MONITOR located at <Your_URL>:9999.

When webui.authenticate = true and hive-site.xml contains the address and access rules of an LDAP server, access to the Indexima MONITOR is granted only if the user specified at login time in the Indexima ADMIN console is authenticated by the LDAP service.

No

session.users

string
Allow users listed in session.users to connect to the Indexima Data Engine without Kerberos authentication.
example: session.users = user1, user2, user3
No

session.passwords

string

List of passwords for the users in session.users. Each password is assigned to the user at the same position in the session.users list.
example: session.passwords = pass1, pass2, pass3

In the example above, user1 has password "pass1", user2 has password "pass2", and user3 has password "pass3".

No

users.in.admin.role

string

List of users who have full rights to administer and run the Indexima cluster. Indexima distinguishes two types of users: administrators (listed in this parameter) and regular users (all the others).
example: users.in.admin.role = user1, user2

If you use Ranger, you must also include the system user that runs the Indexima process.

No
webui.rights | boolean | false | Enables Monitor roles. | No
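The positional pairing between session.users and session.passwords described above can be illustrated with a short Python sketch. It is not Indexima code: the function name pair_credentials is hypothetical, and trimming whitespace around each entry is an assumption.

```python
def pair_credentials(users_value, passwords_value):
    """Pair each user with the password at the same list position."""
    users = [u.strip() for u in users_value.split(",")]
    passwords = [p.strip() for p in passwords_value.split(",")]
    if len(users) != len(passwords):
        raise ValueError("session.users and session.passwords must have the same length")
    return dict(zip(users, passwords))


print(pair_credentials("user1, user2, user3", "pass1, pass2, pass3"))
# {'user1': 'pass1', 'user2': 'pass2', 'user3': 'pass3'}
```

Checking that both lists have the same length catches a missing password before users are silently paired with the wrong entry.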

Ranger

Parameter | Type | Choices/Default | Description | Dynamic command

privilege.driver.name

string
Java class name of the Indexima plugin driver
example: privilege.driver.name = io.galactica.ranger.client.RangerIndeximaDriver
No

privilege.driver.property.servicetype

string
Set the Ranger property servicetype to be used with the Indexima Ranger plug-in. This property is used to display the service name of the plug-in in the Ranger GUI.
example: privilege.driver.property.servicetype = indexima
No

privilege.driver.property.appid

string
Set the Ranger property appid to be used with the Indexima Ranger plug-in. This property identifies the application ID
example: privilege.driver.property.appid = indexima
No

SSL

Parameter | Type | Choices/Default | Description | Dynamic command

node.ssl.enable

boolean

false

When true, data on the private Indexima network is encrypted using the Inter-Node SSL keystore.
No

node.ssl.keystore.location

string
Indicates the path of the Inter-Node SSL keystore location
example: node.ssl.keystore.location = /path/to/my_keystore
No

node.ssl.keystore.password

string
Specify the password associated with the Inter-Node SSL keystore defined at node.ssl.keystore.location
example: node.ssl.keystore.password = my_keystore_password
No

Misc

Parameter | Type | Choices/Default | Description | Dynamic command
allow.create.if-select | boolean | false | Table creation is allowed only if SELECT on this table is granted. | No

Analyzer

Parameter | Type | Choices/Default | Description | Dynamic command

analyzer.hits

integer

3

Lets you define the default number of required hits when you enter the Analyzer page.

example: analyzer.hits = 5

UI

analyzer.days

integer

30

Lets you define the default number of days when you enter the Analyzer page.

example: analyzer.days = 10

UI

analyzer.cardinality

integer

4

Lets you define the default cardinality when you enter the Analyzer page.

example: analyzer.cardinality = 6

UI

sampling.external.max | integer | 100 000 000 | Maximum number of lines used for sampling in the Analyzer if the table is external. | UI
sampling.internal.max | integer | 10 000 | Maximum number of lines used for sampling in the Analyzer if the table is not external. | UI
analyzer.evaluation.cardinality.small | integer | 10 (percent) | Cardinality for a small-level index. If the cardinality computed for an index is below this value, the index is considered small. | No
analyzer.evaluation.cardinality.medium | integer | 20 (percent) | Cardinality for a medium-level index. If the cardinality computed for an index is below this value, the index is considered medium. | No
analyzer.evaluation.cardinality.big | integer | 40 (percent) | Cardinality for a big-level index. If the cardinality computed for an index is below this value, the index is considered big. Otherwise, the index is considered dangerous and should be avoided. | No
analyzer.cache.size.bytes | long | 1 000 000 | Cache size for evaluating cardinalities of expressions during the analysis. | No
analyzer.mode | string | INCREMENTAL | | Yes
analyser.history.duration.days | integer | 30 | | Yes
analyser.default-merge-policy | string | MAX_INDEX/COEF | Default policy used to merge indexes. COEF: the user can set a coefficient between 0 and 100 in the Analyzer to get few or many HyperIndex suggestions. MAX_INDEX: at most analyser.default-max-expected-indexes indexes are suggested. | Yes
analyser.default-max-expected-indexes | integer | 8 | Maximum targeted index count when using the MAX_INDEX merge policy. | Yes
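The three analyzer.evaluation.cardinality.* thresholds above define four levels for an index. The Python sketch below shows one way to read them, using the documented defaults; the function name classify_index is hypothetical and the comparison is assumed to be strict ("below this value").

```python
def classify_index(cardinality_percent, small=10, medium=20, big=40):
    """Classify an index from its computed cardinality (in percent)."""
    if cardinality_percent < small:
        return "small"
    if cardinality_percent < medium:
        return "medium"
    if cardinality_percent < big:
        return "big"
    return "dangerous"  # should be avoided


for value in (5, 10, 39.9, 40):
    print(value, classify_index(value))
```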

YARN

Parameter | Type | Choices/Default | Description | Dynamic command

yarn.resourcemanager.hostname

string

value in yarn-site.xml

Hostname of the Yarn Resource Manager to call in order to create the Indexima Yarn Application

example: yarn.resourcemanager.hostname = localhost

No

yarn.memory.mb

integer

1024

Memory (in MB) allocated to run the Indexima data engine. This value must be greater than the Java heap (GALACTICA_MEM) + 20%.

example: yarn.memory.mb = 40000 (40 GB)

No

yarn.memory.master.mb

integer

256

Memory size (in MB) of the master container. This value must be as low as possible but greater than the value set by Hadoop; otherwise YARN raises an error.

example: yarn.memory.master.mb = 128

No

yarn.dir

string

hdfs://localhost:8020/user/<USER>/indexima, where <USER> is the value of the USER environment variable

Define the HDFS directory used to share the application binaries and configuration files when deploying an Indexima cluster in Hadoop.

example: yarn.dir = hdfs://localhost:8020/tmp/indexima

No

yarn.name

string

Indexima

Name of the Indexima cluster

example: yarn.name = indexima_prod

No

yarn.kerberos

boolean

false

Must be true if the Indexima data engine is running in a Hadoop cluster secured by Kerberos

example: yarn.kerberos = true

No

yarn.relax.locality

boolean

false

Defines the YARN relax locality policy during deployment.

example: yarn.relax.locality = true

No

yarn.racks

string
Defines the rack to use during deployment

No
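The constraint stated above for yarn.memory.mb (greater than the Java heap GALACTICA_MEM plus 20%) can be checked before deployment. The Python sketch below assumes that "+ 20%" means 1.2 times the heap and that both values are expressed in MB; the function name is hypothetical.

```python
def yarn_memory_is_sufficient(yarn_memory_mb, heap_mb):
    """True when yarn.memory.mb is strictly greater than heap + 20%.

    Integer arithmetic avoids floating-point rounding on the 1.2 factor.
    """
    return yarn_memory_mb * 100 > heap_mb * 120


# Assumed example: a 32000 MB Java heap needs more than 38400 MB in YARN.
print(yarn_memory_is_sufficient(40000, 32000))  # True
print(yarn_memory_is_sufficient(38400, 32000))  # False: must be strictly greater
```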

High Availability

Parameter | Type | Choices/Default | Description | Dynamic command
nodes.connect.restart.delay.seconds | integer | 10 | When a new node connects to the cluster, delay in seconds before the cluster re-initializes itself. Set to 0 to disable automatic re-initialization (dynamic cluster only). | Yes

high-availability

Deprecated since 1.7.11

integer

1

Enables high availability and defines the number of Indexima masters participating in the Indexima cluster. Each master also hosts the Indexima Hive2 server. A node is considered failed when it no longer communicates when solicited for queries. The remaining Indexima masters take over and continue to provide the Indexima Hive2 service. A load balancer such as Zookeeper is required to provide dynamic path recovery.

High availability housekeeping commands such as PING or INIT can be issued from the Indexima MONITOR at <your_server>:9999 or by using the start_node.sh script, for both standalone and Hadoop deployments.

example: high-availability = 2
               Node1 will be the primary master
               Node2 and Node3 will be secondary masters

This parameter is deprecated. If you want to activate high availability, use the dynamic mode High-Availability

No

Spill To Disk

Parameter | Type | Choices/Default | Description | Dynamic command

spill.enable

boolean

false

Enable the spill-to-disk function

Yes

spill.memory.size.mb

long

256

Memory size (in MB) used to process queries before spilling to disk. When the size of the query elements exceeds this value, queries start spilling to disk.

Yes

spill.disk.path

string

/tmp/indexima-cache

Folder on the disk of each node where elements will be spilled.

Yes
request.result.max.chunk.size | integer | 10000 | Maximum number of lines in each chunk when a large output result is spilled to disk. | Yes

External Tables

General

Parameter | Type | Choices/Default | Description | Dynamic command

external.field.max_size

integer

127

When the size of a field name or alias name exceeds external.field.max_size, the name is renamed to be shortened.

Yes

Synchronization

Parameter | Type | Choices/Default | Description | Dynamic command
external.synchronize.consistency.enable | boolean | false | Whether to execute a synchronize before adding a new index. | Yes

Automatic Synchronization

Only available for Snowflake datasource

Parameter | Type | Choices/Default | Description | Dynamic command
external.synchronize.check.cron | string | empty | Cron expression for the table synchronization check; more precise than external.synchronize.check.rate. | Yes
external.synchronize.check.rate | integer | 0 | Number of seconds between external table synchronization checks. | Yes
external.synchronize.check.user | string | admin | The Indexima user who runs the SYNCHRONIZE during the automatic update. | No

Smart Tables

Smart Tables Process parameters

Parameter | Type | Choices/Default | Description | Dynamic command
analyser.smart.metrics.days | integer | 15 | Number of sliding days from which queries are picked up for analysis. | Yes
analyser.smart.optimizer | string | Last month tuning | Default optimizer to use. Can be overridden for each table. Values are defined in optimize_index.json. | Yes
analyser.smart.scheduling.cron | string | <empty> | Defines when the analysis and index creation process starts. | Yes
analyser.smart.scheduling.duration.minutes | integer | 120 | Maximum duration of the smart tables analysis. | Yes
analyser.smart.threads | integer | 2 | Number of threads used for running smart tables indexation. | No
analyser.smart.max.indexes | integer | 20 | Maximum number of indexes for a smart table. | Yes

Smart Tables weight parameters

Parameter | Type | Choices/Default | Description | Dynamic command
analyser.smart.threshold.slow.ms | integer | 2000 | Speed threshold (in ms) at which a request is considered slow. | No
analyser.smart.weight | integer | 1 | Default weight for score computation. | Yes
analyser.smart.weight.bucket | integer | 1 | Weight of the K-Store query for score computation. | Yes
analyser.smart.weight.speed | integer | 1 | Weight of the speed ratio for score computation. | Yes
analyser.smart.weight.delegate | integer | 1 | Weight of queries delegated to the underlying table for score computation. | Yes
analyser.smart.weight.size | integer | 1 | Weight of the table size ratio for score computation. | Yes
analyser.smart.weight.traffic | integer | 1 | Weight of the traffic ratio for score computation. | Yes

Miscellaneous

Parameter | Type | Choices/Default | Description | Dynamic command
notification.check.cron | string | empty | Example: SET_ notification.check.cron = "0 0/15 * * * ?" (every 15 minutes) | Yes

nodes.connect.timeout.seconds

integer

120

Number of seconds to wait before starting the Indexima cluster with a missing node member.

This option allows starting a standalone Indexima cluster without having to start all workers before the master.

No

powerbi.impersonate.field

string

empty

Defines a field name. If that field name is used in the WHERE clause of a query as field='value', the string given as the value is considered to be the actual user executing the query.

The field name can be any name, provided it does not already exist as an actual field of the table.

To be used in a query, this field needs to exist. It is therefore compulsory to add the virtual field to the table.

Example: powerbi.impersonate.field = idx_powerbi_impersonate

ALTER TABLE mytable ADD COLUMNS (idx_powerbi_impersonate as 'X');

When executing this query:

SELECT * FROM mytable WHERE idx_powerbi_impersonate ='test';

it is as if a user named 'test' executed the query, and the condition idx_powerbi_impersonate ='test' is treated as 1=1.

No

timestamp.precision | string | HOUR/MINUTE/SECOND/DAY | Default precision of timestamp fields at table creation. | Yes
history.max | integer | 20 000 | Maximum number of queries loaded for the previous-day display in the web UI. | No
error.max | integer | 10 000 | Maximum number of errors during a load. | Yes