galactica.conf

The "galactica.conf" file, located in your /<indexima_install_folder>/galactica/conf folder, configures the Indexima cluster for various environments and security mechanisms, and lets you tune the Indexima Data Engine for optimum performance. It also helps with troubleshooting by enabling several levels of debugging.

Parameter modifications in the "galactica.conf" file are not dynamically applied. They require restarting the Indexima cluster to be taken into account.

Some parameters are marked as dynamic, meaning that they can also be changed afterward by a query. To do so, use the HSQL command SET_, followed by one space, the "galactica.conf" parameter name, and its new value.

# static parameter
result.max_size = 512
# dynamic parameter
SET_ result.max_size = 256

Changes made this way do not require restarting the Indexima cluster. However, they are all lost when the Indexima cluster is restarted: altered parameters then revert to the values specified in "galactica.conf", or to their default values when not specified.

Dynamic parameter changes are preserved when issuing a cluster INIT command.
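
The precedence described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not Indexima code: the function name `effective_value` and the three dictionaries are hypothetical.

```python
def effective_value(name, dynamic_overrides, conf_values, defaults):
    """Return the value in effect for a parameter.

    Order of precedence, as described above:
    1. a dynamic change made with SET_ since the last restart,
    2. the value written in galactica.conf,
    3. the default value.
    """
    for source in (dynamic_overrides, conf_values, defaults):
        if name in source:
            return source[name]
    raise KeyError(name)


# Values taken from the example above.
conf_values = {"result.max_size": 512}        # written in galactica.conf
dynamic_overrides = {"result.max_size": 256}  # changed at runtime with SET_

running = effective_value("result.max_size", dynamic_overrides, conf_values, {})

# A restart discards every dynamic change, so the file value applies again.
after_restart = effective_value("result.max_size", {}, conf_values, {})
```

Here `running` is the dynamically set value, and `after_restart` falls back to the value from the file.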

Parameters

Below are listed all possible parameters for "galactica.conf", with their default value and a quick description.

General

Nodes

Parameter	Type	Choices/Default	Description	Dynamic command
nodes required	string		List of active Indexima nodes, separated by a comma ','. Prefer FQDN instead of IP address. The first node in the list is the Indexima master. It hosts the Hive2 server and is the entry point for all Indexima clients. example: Single-node cluster: nodes = localhost Multinode cluster (node-1.intra is the master): nodes = node-1.intra, node-2.intra, node-3.intra	No
nodes.requested	integer	0	When using dynamic mode, this parameter replaces the nodes parameter, in which you have to list the IP of all the Indexima nodes of the cluster. In nodes.requested, just specify the number of nodes in your cluster.	No
nodes.connect.min-nodes	integer	2	Minimum number of member nodes that must be started before the Indexima cluster begins transactions with Indexima clients. This option lets you start using an Indexima cluster without waiting for all its member nodes to be up and running.	No

Other

Parameter	Type	Choices/Default	Description	Dynamic command
warehouse required	string		Absolute path of the storage for HyperIndex and Data Spaces. HDFS path in Hadoop: warehouse = hdfs://your_hdfs_server:8020/tmp/galactica Local path when Indexima is set up in stand-alone mode: warehouse = /home/indexima Amazon URI or S3 path when the Amazon cloud is used: warehouse = s3a://your_bucket/your_object_id	No
warehouse.protocol required	string	HDFS LOCAL S3	Storage protocol to address the Indexima warehouse. example: warehouse.protocol = HDFS	No
warehouse.shared Deprecated since 2021.1	string	true false	When true, warehouse.shared indicates that the Indexima warehouse is sitting on local storage that is shareable with all members of the Indexima cluster. example: One unique filesystem is mounted on each Indexima node member (ie. NFS): warehouse.shared = true The warehouse is sitting on the filesystem of each Indexima node and is not visible from the other nodes of the Indexima cluster: warehouse.shared = false The warehouse is hosted on HDFS or S3: warehouse.shared = true	No
cores required	integer	8	Number of CPU cores used by the Indexima Data Engine. Indexima only supports the same number of cores on every node. If your nodes have different numbers of cores, set this parameter to the lowest core count among them. example: cores = 8	No
readers required	integer	4	Maximum number of threads used for reading queries (typically SELECT). It must be a multiple of the number of cores. It is recommended to set it to cores / 2. example: readers = 4	No
loaders required	integer	4	Maximum number of threads used for loading queries (typically LOAD DATA). This is also the maximum number of load queries that can run in parallel. It must be a multiple of the number of cores. It is recommended to set it to cores / 2. example: loaders = 4	No
queries required	integer	32	Maximum total number of threads used for querying. example: queries = 32	No
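
As a rough illustration of the recommendations in the table above, the following Python sketch derives cores, readers and loaders from the core count of each node. It assumes that "lowest common denominator" means the smallest core count among the nodes; the function name `suggest_thread_settings` is hypothetical and is not part of Indexima.

```python
def suggest_thread_settings(cores_per_node):
    """Suggest cores, readers and loaders from the core count of each node."""
    if not cores_per_node:
        raise ValueError("at least one node is required")
    # Every node must use the same number of cores, so take the smallest one
    # (assumption: this is what "lowest common denominator" means).
    cores = min(cores_per_node)
    # Recommended value for readers and loaders: cores / 2.
    half = max(1, cores // 2)
    return {"cores": cores, "readers": half, "loaders": half}


# Hypothetical cluster: two 8-core nodes and one 16-core node.
settings = suggest_thread_settings([8, 8, 16])
```

For this hypothetical cluster, the sketch yields cores = 8, readers = 4 and loaders = 4, which matches the examples in the table.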

Storage related parameters

General

Parameter	Type	Choices/Default	Description	Dynamic command
partitions required	integer		Number of shards (a type of horizontal partition) across potentially multiple instances of the schema. It is highly recommended to set partitions according to the formula: partitions = cores * number of nodes. You can oversize it to anticipate a cluster size change, but you cannot undersize it. If you change the number of partitions, all your tables must be dropped and recreated, and all data reloaded. example: For a 4-node cluster with 8 cores per node: partitions = 32	No
dimension.partitions	integer	8	Number of partitions used for dimension tables (tables without HyperIndex). example: For a 4-node cluster with 8 cores per node: dimension.partitions = 8	No
export.partition.size.mb	integer	10000	Maximum size, in MB, of the files generated during an export.	No
pages.oneFilePerColumn	boolean	true	When true, Indexima creates one file per column to take advantage of the Indexima K-Store technology. So when using local/shared file systems or cloud file-systems such as S3, this parameter must be set to true. When false, Indexima forces K-Store to store all columns in a single file to limit the number of open files. This is used when the Indexima warehouse is located on an HDFS file system.	No
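
The partitions formula given in the table above (partitions = cores * number of nodes) can be sketched as follows. The function name and the `planned_nodes` argument are hypothetical; they only illustrate oversizing for a planned cluster growth.

```python
def recommended_partitions(cores, nodes, planned_nodes=None):
    """Apply partitions = cores * number of nodes.

    planned_nodes optionally oversizes the result for a future, larger
    cluster. The result is never smaller than cores * nodes, because
    partitions cannot be undersized.
    """
    if cores < 1 or nodes < 1:
        raise ValueError("cores and nodes must be positive")
    target_nodes = max(nodes, planned_nodes or nodes)
    return cores * target_nodes


current = recommended_partitions(cores=8, nodes=4)                    # 4 nodes, 8 cores each
oversized = recommended_partitions(cores=8, nodes=4, planned_nodes=6)  # growth to 6 nodes
```

Oversizing up front avoids having to drop, recreate and reload every table when the cluster grows later.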

S3

Parameter	Type	Choices/Default	Description	Dynamic command
warehouse.s3.compatible	boolean	false	Set this parameter to true if you are using your own S3-compatible server instead of AWS. Otherwise, S3 connections are considered genuine AWS S3.	No
warehouse.s3.endpoint	string		When warehouse.s3.compatible is set to true, you must provide the endpoint URL of your S3 server. example: warehouse.s3.endpoint = https://my_minio:9000	No
warehouse.s3.bucket	string		Name of the bucket to be used by Indexima when using an S3-compatible connection.	No
warehouse.s3.https	boolean	false	Set to true if your S3-compatible server uses an encrypted HTTPS connection.	No
warehouse.s3.cert.check	boolean	false	When set to true, the S3 connection fails if the SSL certificate in use is insecure. If you are using a self-signed certificate, leave this parameter set to false.	No
s3.max.retry	integer	40	Number of retries on S3 failure.	No
warehouse.s3.key	string		Amazon S3 directory where files are stored, for S3-compatible file systems.	No

Connectivity toward Indexima

Parameter	Type	Choices/Default	Description	Dynamic command
node.port	integer	19999	Port used by the Indexima Galactica engine to communicate with the other nodes of the Indexima Cluster.	No
webui.port		9999	Port used by the Indexima Monitor Console. This web server aggregates logs and query history, and hosts Indexima Analyzer, among other features. For more information about the Indexima Monitor, refer to the Indexima Monitor documentation.	No
webui.ssl.enable	boolean	false	When true, the Indexima Monitor Console is served over the secured HTTPS (SSL) protocol. `webui.ssl.keystore.location` and `webui.ssl.keystore.password` must be set beforehand to enable this feature. When false, the standard HTTP protocol is used.	No
webui.ssl.keystore.location	string		Specify the location of the SSL certificates Keystore to secure the Indexima Monitor console access with HTTPS example: webui.ssl.keystore.location = /etc/indexima/keystore/mycluster.indexima.co.keystore	No
webui.ssl.keystore.password	string		Define the password associated with the Indexima Keystore.	No
monitor.api.key	string		Service API key between Indexima cluster and Developer Console.
heartbeat	integer	300000	Time in milliseconds for a full round-trip heartbeat packet between the Indexima master and each worker node. The default value is 5 minutes.	No

Indexima Connectivity toward external datasources

Parameter	Type	Choices/Default	Description	Dynamic command
jdbc.timeout.seconds	integer	60	Timeout value of the JDBC driver connection in seconds. This is the maximum time allowed for the data source to give a response, not the maximum time the query can take.	No
jdbc.load.fetch.size	integer	1000	Number of rows fetched with each database round trip for a query when importing data from an external JDBC source.	Yes
jdbc.query.timeout.seconds	integer	60	Timeout for JDBC statement execution, in seconds.	Yes
jdbc.create.field.restriction	boolean	true	When set to true, Indexima replaces some characters on the fly to avoid wrong interpretation of incoming data. The replacements are listed in the character substitution chart.	Yes

Character substitution chart

Character	Substituted by
space ' '	underscore `_`
simple quote `'`	underscore `_`
dot `.`	No character
comma `,`	No character
parenthesis `(` and `)`	No character
slash `/`	dash `-`
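
To make the character substitution chart concrete, here is a minimal Python sketch that applies the same replacements to a string. It only mirrors the chart; it is not Indexima's implementation, and both the function name `restrict_field_name` and the assumption that the substitution applies to field names are illustrative.

```python
# Replacement table copied from the character substitution chart.
SUBSTITUTIONS = str.maketrans({
    " ": "_",   # space        -> underscore
    "'": "_",   # simple quote -> underscore
    ".": "",    # dot          -> no character
    ",": "",    # comma        -> no character
    "(": "",    # parenthesis  -> no character
    ")": "",
    "/": "-",   # slash        -> dash
})


def restrict_field_name(name):
    """Apply the chart's substitutions to a name (hypothetical helper)."""
    return name.translate(SUBSTITUTIONS)


cleaned = restrict_field_name("unit price (EUR/kg)")
```

With the chart above, the hypothetical input `unit price (EUR/kg)` becomes `unit_price_EUR-kg`.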

Logs and history

Parameter	Type	Choices/Default	Description	Dynamic command
log.dir	string	/home/indexima/log	Path to the folder where the Indexima logs are written	No
log.insert	integer	1000000	During a "load" command, log a message every N inserts	No
log.level	string	INFO	Log level of the cluster	No
history.count	integer	500		No
history.flush	integer	10	Max number of queries before flushing history to disk	No
history.dir	string	/home/indexima/history	Query history directory	No
history.export	string	/home/indexima/history-export	Export directory for the query history in CSV format	No
hive.log.threshold	string	ERROR	Hive log threshold	No
hive.log.dir	string	/home/indexima/history-export	Hive log directory	No

Memory Usage Parameters

Tables & Indexes

Parameter	Type	Choices/Default	Description	Dynamic command
index.memory.max_size.mb	integer	1024	Maximum size (in MB) of each index, per node. This parameter limits the size of a HyperIndex to prevent running out of memory if the cardinality of the selected index is too high.	Yes
table.memory.max_size.mb	integer	4096	Maximum size (in MB) of a dataspace (table), per node. This parameter prevents running out of memory when using multiple large tables.	Yes
limited.memory.max_size.mb	integer	1024	Maximum size (in MB) of a limited table.	Yes
table.prefetch.lastdays		2	Preload indexes that have been used during the last N days	Yes

Big indexes

Parameter	Type	Choices/Default	Description	Dynamic command
big.index.enable	boolean	FALSE	Enable big index feature	Yes
big.index.limit.mb	long	5000	Memory limit for big index behaviour	Yes
big.index.max.active.partitions	integer	2	Max active partition allowed for queries on a big index	No
big.index.swap.path	string	swap	Default location for index data swap	Yes

Queries

Parameter	Type	Choices/Default	Description	Dynamic command
query.max	long	100 000	Maximum number of bytes of a SQL Statement	No
result.max_size.mb	long	256	Define the maximum size in MB of the queries computed results	Yes
global.result.max_size.mb	long	512	Global maximum size, in MB, used to store the computed results of ALL queries at a given moment. As a guideline, set this parameter to the number of nodes in the cluster multiplied by result.max_size.mb. The minimum recommended value is 512 MB.	Yes
select.timeout.ms	long	-1	Timeout, in milliseconds, for query execution. If a query runs longer than the timeout, the system stops it with a Timeout error.	Yes
join.memory.max_size.mb	integer	12	Maximum size (in MB) allocated for one line join computation.	Yes
cognos.limit.select	Integer	-1	Add an implicit limit on SELECT queries without GROUP BY for Cognos usage. Value -1 means disabled.
queries	integer	32	Number of threads used for querying	No
queries.high-cost	integer	5	Number of high cost queries that may run at the same time	No
queries.high-cost.frozen	integer	5	Maximum number of frozen high cost queries	Yes
queries.hybrid.ratio	double	0.25	Ratio of hybrid queries allowed to run concurrently	No
query.high-cost.memory.mb	long	128	Memory size threshold for a query to become high cost	Yes
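
The sizing guideline for global.result.max_size.mb (number of nodes times the per-query result size, with a recommended minimum of 512 MB) can be written as a small Python helper. The function name is hypothetical, and the default of 256 is the documented default of result.max_size.mb.

```python
def recommended_global_result_max_size_mb(nodes, result_max_size_mb=256):
    """Guideline: nodes * result.max_size.mb, never below 512 MB."""
    if nodes < 1 or result_max_size_mb < 1:
        raise ValueError("nodes and result_max_size_mb must be positive")
    return max(512, nodes * result_max_size_mb)


single_node = recommended_global_result_max_size_mb(nodes=1)
four_nodes = recommended_global_result_max_size_mb(nodes=4)
```

A single node with the default per-query size stays at the 512 MB floor, while a 4-node cluster would be sized at 1024 MB.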

Cache

Parameter	Type	Choices/Default	Description	Dynamic command
cache.master.mb	integer	0	Define a certain amount of RAM in MB allocated to cache Hive 2 queries addressed by Indexima Data Hub for better performance	Yes
cache.master.min_exec.ms	integer	1000	Execution time threshold, in milliseconds, above which a query result is put into the cache	Yes

Load

Parameter	Type	Choices/Default	Description	Dynamic command
insert.queue.mem.size.mb	integer	256	Maximum memory size (in MB) of the insert queue during a load	Yes

Miscellaneous

Parameter	Type	Choices/Default	Description	Dynamic command
memory.coef	float	0.85	Coefficient of max heap memory used for the Hyperindexes	No

Security

General

Parameter	Type	Choices/Default	Description	Dynamic command
impersonation	boolean	false	When `impersonation = true`, the Indexima account is used to grant access to shared storage on each node of the Indexima cluster	No
webui.authenticate	boolean	false	When `webui.authenticate = false`, there is no access control for the Indexima MONITOR located at `<Your_URL>:9999`. When `webui.authenticate = true` and hive-site.xml contains the address and access rules of an LDAP server, access to the Indexima MONITOR is granted only if the user specified at login time in the Indexima ADMIN console is authenticated by the LDAP service.	No
session.users	string		Allow the users listed in `session.users` to connect to the Indexima Data Engine without Kerberos authentication. example: session.users = user1, user2, user3	No
session.passwords	string		List of passwords for the users in `session.users`. Each password is attributed to the user at the same position in the `session.users` list. example: session.passwords = pass1, pass2, pass3 In the example above, user1 has password "pass1", user2 has password "pass2", and user3 has password "pass3".	No
users.in.admin.role	string		List of users that have full rights to administer and run the Indexima cluster. Indexima has 2 types of users: administrators (listed with this parameter) and users (all the others). example: users.in.admin.role = user1, user2 If Ranger is used, you must also include the system user that runs the Indexima process.	No
webui.rights	boolean	false	Monitor Roles enabler	No
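
The positional pairing between session.users and session.passwords described above can be illustrated with a short Python sketch. The helper name is hypothetical, and trimming whitespace and rejecting lists of different lengths are assumptions made for the illustration, not documented Indexima behaviour.

```python
def pair_session_credentials(users_value, passwords_value):
    """Pair each user with the password at the same list index."""
    users = [u.strip() for u in users_value.split(",")]
    passwords = [p.strip() for p in passwords_value.split(",")]
    if len(users) != len(passwords):
        # Assumption: a mismatch is a configuration mistake worth reporting.
        raise ValueError("session.users and session.passwords differ in length")
    return dict(zip(users, passwords))


# Values taken from the examples in the table above.
credentials = pair_session_credentials(
    "user1, user2, user3",
    "pass1, pass2, pass3",
)
```

With the example values, `credentials` maps user1 to pass1, user2 to pass2 and user3 to pass3.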

Ranger

Parameter	Type	Description	Dynamic command
privilege.driver.name	string	Java class name of the Indexima plugin driver example: privilege.driver.name = io.galactica.ranger.client.RangerIndeximaDriver	No
privilege.driver.property.servicetype	string	Set the Ranger property `servicetype` to be used with the Indexima Ranger plug-in. This property will be used to display Service name of the plugin in Ranger GUI. example: privilege.driver.property.servicetype = indexima	No
privilege.driver.property.appid	string	Set the Ranger property `appid` to be used with the Indexima Ranger plug-in. This property identifies the application ID example: privilege.driver.property.appid = indexima	No

SSL

Parameter	Type	Choices/Default	Description	Dynamic command
node.ssl.enable	boolean	false	When true, encrypt data on the private Indexima network, enabling Inter-Node SSL keystore	No
node.ssl.keystore.location	string		Indicates the path of the Inter-Node SSL keystore location example: node.ssl.keystore.location = /path/to/my_keystore	No
node.ssl.keystore.password	string		Specify the password associated with the Inter-Node SSL keystore defined at `node.ssl.keystore.location`. example: node.ssl.keystore.password = my_keystore_password	No

Misc

Parameter	Type	Choices/Default	Description	Dynamic command
allow.create.if-select	boolean	false	Table creation is allowed only if select on this table is granted.	No

Analyzer

Parameter	Type	Choices/Default	Description	Dynamic command
analyzer.hits	integer	3	Lets you define the default number of required hits when you enter the Analyzer page. example: analyzer.hits = 5	UI
analyzer.days	integer	30	Lets you define the default number of days when you enter the Analyzer page. example: analyzer.days = 10	UI
analyzer.cardinality	integer	4	Lets you define the default cardinality when you enter the Analyzer page. example: analyzer.cardinality = 6	UI
sampling.external.max	integer	100 000 000	The maximum number of lines to use for the sampling in the Analyzer if the table is external.	UI
sampling.internal.max	integer	10 000	The maximum number of lines to use for the sampling in the Analyzer if the table is not external.	UI
analyzer.evaluation.cardinality.small	integer	10 (percent)	Cardinality for a small-level index. If the cardinality computed for an index is below this value, the index is considered small.	No
analyzer.evaluation.cardinality.medium	integer	20 (percent)	Cardinality for a medium-level index. If the cardinality computed for an index is below this value, the index is considered medium.	No
analyzer.evaluation.cardinality.big	integer	40 (percent)	Cardinality for a big-level index. If the cardinality computed for an index is below this value, the index is considered big. Otherwise, the index is considered dangerous and should be avoided.	No
analyzer.cache.size.bytes	long	1 000 000	Cache size for evaluating cardinalities of expressions during the analysis	No
analyzer.mode	string	INCREMENTAL		Yes
analyser.history.duration.days	integer	30		Yes
analyser.default-merge-policy	string	MAX_INDEX/COEF	Default policy used to merge indexes (COEF, MAX_INDEX). COEF: the user can set a coefficient between 0 and 100 in the analyzer to get fewer or more HyperIndex suggestions. MAX_INDEX: a maximum of `analyser.default-max-expected-indexes` indexes will be suggested.	Yes
analyser.default-max-expected-indexes	integer	8	Maximum targeted index count when using the MAX_INDEX merge policy.	Yes
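
The three analyzer.evaluation.cardinality thresholds in the table above define four index levels. The Python sketch below shows one way to read them, assuming the thresholds are checked from smallest to largest; the function name `classify_index` is hypothetical and the defaults are the documented ones.

```python
def classify_index(cardinality_percent, small=10, medium=20, big=40):
    """Classify an index from its computed cardinality, in percent.

    small, medium and big stand for analyzer.evaluation.cardinality.small,
    .medium and .big.
    """
    if cardinality_percent < small:
        return "small"
    if cardinality_percent < medium:
        return "medium"
    if cardinality_percent < big:
        return "big"
    # At or above the "big" threshold, the index should be avoided.
    return "dangerous"


levels = [classify_index(c) for c in (5, 15, 30, 40)]
```

With the default thresholds, cardinalities of 5, 15, 30 and 40 percent are classified as small, medium, big and dangerous respectively.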

YARN

Parameter	Type	Choices/Default	Description	Dynamic command
yarn.resourcemanager.hostname	string	value in yarn-site.xml	Hostname of the Yarn Resource Manager to call in order to create the Indexima Yarn Application example: yarn.resourcemanager.hostname = localhost	No
yarn.memory.mb	integer	1024	Memory (in MB) allocated to run the Indexima data engine. This value must be greater than the Java heap (GALACTICA_MEM) + 20%. example: yarn.memory = 40000 (40 GB)	No
yarn.memory.master.mb	integer	256	Memory size (in MB) of the master container. This value must be as low as possible, but greater than the value set by Hadoop, otherwise YARN raises an error. example: yarn.memory.master = 128	No
yarn.dir	string	hdfs://localhost:8020/user/<USER>/indexima, where <USER> is the value of the USER environment variable	HDFS directory used to share the application binaries and configuration files when deploying an Indexima cluster in Hadoop. example: yarn.dir = hdfs://localhost:8020/tmp/indexima	No
yarn.name	string	Indexima	Name of the Indexima cluster example: yarn.name = indexima_prod	No
yarn.kerberos	boolean	false	Must be true if the Indexima data engine is running in a Hadoop cluster secured by Kerberos. example: yarn.kerberos = true	No
yarn.relax.locality	boolean	false	Defines the YARN relax locality policy during deployment. example: yarn.relax.locality = true	No
yarn.racks	string		Defines the rack to use during deployment	No
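
The rule stated for yarn.memory.mb (strictly greater than the Java heap plus 20%) can be checked with a small Python helper. This is a sketch under assumptions: the heap value is taken to be expressed in MB, and both function names and the 32000 MB heap are hypothetical.

```python
def minimum_yarn_memory_mb(heap_mb, overhead_percent=20):
    """Smallest whole number of MB strictly greater than heap + 20%."""
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    return heap_mb + (heap_mb * overhead_percent) // 100 + 1


def is_yarn_memory_sufficient(yarn_memory_mb, heap_mb, overhead_percent=20):
    """True when yarn_memory_mb is greater than the heap plus its overhead."""
    return yarn_memory_mb * 100 > heap_mb * (100 + overhead_percent)


# Hypothetical heap of 32000 MB, checked against the 40000 MB example value.
lowest = minimum_yarn_memory_mb(32000)
fits = is_yarn_memory_sufficient(40000, 32000)
```

For this hypothetical heap, any value from 38401 MB upward satisfies the rule, so the 40000 MB example leaves some headroom.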

High Availability

Parameter	Type	Choices/Default	Description	Dynamic command
nodes.connect.restart.delay.seconds	integer	10	When a new node connects to the cluster, delay in seconds before the cluster re-initializes itself. 0 to disable automatic re-initialization. (dynamic cluster only)	Yes
high-availability Deprecated since 1.7.11	integer	1	Enables high availability and defines the number of Indexima masters participating in the Indexima cluster. Each master also hosts the Indexima Hive2 server. A node is considered failed when it no longer communicates when solicited for queries. The remaining Indexima masters take over to continue providing the Indexima Hive2 service. A load balancer such as Zookeeper is required to provide dynamic path recovery. High availability housekeeping such as PING or INIT can be issued from the Indexima MONITOR at <your_server>:9999 or by using the start_node.sh script, for both stand-alone and Hadoop deployments. example: high-availability = 2 Node1 will be the primary master; Node2 and Node3 will be secondary masters. This parameter is deprecated. If you want to activate high availability, use the dynamic mode High-Availability.	No

Spill To Disk

Parameter	Type	Choices/Default	Description	Dynamic command
spill.enable	boolean	false	Enable spill to disk function	Yes
spill.memory.size.mb	long	256	Memory size taken to process queries before being spilled on disk. When the size of query elements is greater than this value, queries will start spilling on disk.	Yes
spill.disk.path	string	/tmp/indexima-cache	Folder on the disk of each node where elements will be spilled.	Yes
request.result.max.chunk.size	Integer	10000	Maximum number of lines in each chunk when a large output result is spilled to disk.	Yes

External Tables

General

Parameter	Type	Choices/Default	Description	Dynamic command
external.field.max_size	integer	127	When a field name or alias name is longer than external.field.max_size, it is renamed to a shortened name.	Yes

Synchronization

Parameter	Type	Choices/Default	Description	Dynamic command
external.synchronize.consistency.enable	boolean	false	Whether to execute a SYNCHRONIZE before adding a new index.	Yes

Automatic Synchronization

Only available for Snowflake datasource

Parameter	Type	Choices/Default	Description	Dynamic command
external.synchronize.check.cron	string	empty	Cron expression for table synchronization check, more precise than `external.synchronize.check.rate`	Yes
external.synchronize.check.rate	integer	0	Number of seconds between external table synchronization check	Yes
external.synchronize.check.user	string	admin	The Indexima user who will run the SYNCHRONIZE during the automatic update	No

Smart Tables

Smart Tables Process parameters

Parameter	Type	Choices/Default	Description	Dynamic command
analyser.smart.metrics.days	integer	15	Number of sliding days over which queries are picked up for analysis.	Yes
analyser.smart.optimizer	string	Last month tuning	Default optimizer to use. Can be overridden for each table. Values are defined in optimize_index.json.	Yes
analyser.smart.scheduling.cron	string	<empty>	Define when the process of analysis and index creation starts	Yes
analyser.smart.scheduling.duration.minutes	integer	120	Maximum duration of smart tables analysis	Yes
analyser.smart.threads	integer	2	Number of threads used for running smart tables indexation.	No
analyser.smart.max.indexes	integer	20	Maximum number of indexes for a smart table.	Yes

Smart Tables weight parameters

Parameter	Type	Choices/Default	Description	Dynamic command
analyser.smart.threshold.slow.ms	integer	2000	Speed threshold where a request is considered slow	No
analyser.smart.weight	integer	1	Default weight for score computation	Yes
analyser.smart.weight.bucket	integer	1	Weight of the K-Store query for score computation	Yes
analyser.smart.weight.speed	integer	1	Weight of the speed ratio for score computation	Yes
analyser.smart.weight.delegate	integer	1	Weight of query delegated to the underlying table for score computation	Yes
analyser.smart.weight.size	integer	1	Weight of the table size ratio for score computation	Yes
analyser.smart.weight.traffic	integer	1	Weight of the traffic ratio for score computation	Yes

Miscellaneous

Parameter	Type	Choices/Default	Description	Dynamic command
notification.check.cron	string	empty	Cron expression for the notification check. example: SET_ notification.check.cron = "0 0/15 * * * ?" // every 15 minutes	Yes
nodes.connect.timeout.seconds	integer	120	Number of seconds to wait before starting the Indexima cluster with a missing node member. This option allows starting a stand-alone Indexima cluster without taking the precaution to start all workers before the master.	No
powerbi.impersonate.field	string	empty	Defines a field name. If that field name is used in the WHERE clause of a query as field='value', the string given as the value is considered to be the actual user executing the query. The field name can be any name, provided it does not already exist as an actual field of the table. To be used in a query, this field must exist, so it is compulsory to add it to the table as a virtual field. Example: `powerbi.impersonate.field = idx_powerbi_impersonate` SQL `ALTER TABLE mytable ADD COLUMNS (idx_powerbi_impersonate as 'X');` When executing the query SQL `SELECT * FROM mytable WHERE idx_powerbi_impersonate ='test';` it is as if a user named 'test' executed the query, and `idx_powerbi_impersonate ='test'` is considered as `1=1`.	No
timestamp.precision	string	HOUR/MINUTE/SECOND/DAY	Default precision of timestamp fields at table creation	Yes
history.max	integer	20 000	Maximum number of queries loaded for previous day display in webUI	No
error.max	integer	10 000	Maximum number of errors during a load	Yes