Indexima integrates with the Apache Atlas data catalog and with any data catalog that exposes the same interface as Atlas. This integration lets an organization's metadata management and governance tools build a catalog of the data assets connected to Indexima.

Installation & Deployment

Download and deploy the JAR file

To use the Apache Atlas integration, the Atlas library must be deployed to each Indexima node.
Download the appropriate version of the library, indexima-atlas-lib-[VERSION].jar, from https://download.indexima.com/release and deploy it to the /galactica/ext directory on each Indexima node.
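The deployment step can be scripted. The following is a minimal sketch, assuming the JAR has already been downloaded locally and that each node is reachable over SSH with scp; the host names and the helper names (build_deploy_commands, deploy) are hypothetical and not part of Indexima.

```python
import shlex
import subprocess


def build_deploy_commands(jar_path, nodes, ext_dir="/galactica/ext"):
    """Return one scp command (as an argument list) per Indexima node."""
    return [["scp", jar_path, f"{node}:{ext_dir}/"] for node in nodes]


def deploy(jar_path, nodes, dry_run=True):
    """Print the copy commands, or run them when dry_run is False."""
    for command in build_deploy_commands(jar_path, nodes):
        if dry_run:
            # Show what would be executed without touching any node.
            print(shlex.join(command))
        else:
            # check=True raises CalledProcessError if a copy fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical node names; keep [VERSION] until the real version is known.
    deploy("indexima-atlas-lib-[VERSION].jar", ["indexima-node-1", "indexima-node-2"])
```

The dry run is the default so that the list of target nodes can be reviewed before anything is copied.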

Configure file atlas-application.properties

Please refer to atlas-application.properties to configure this file.

Restart the Indexima service after adding the Atlas library and the atlas-application.properties file so that the service loads Atlas.

Authentication Configuration

Indexima can authenticate to Atlas using the FILE, LDAP, or KERBEROS authentication mechanism.

Rights

Whichever user connects to Atlas (a dedicated user or the Indexima user that runs the application), make sure this user has WRITE rights in Atlas.


FILE and LDAP authentication

For FILE and LDAP authentication, the user that connects to Atlas is provided with the parameters atlas.user and atlas.password. The parameter atlas.enable activates or deactivates the Atlas integration (see galactica.conf).

Example of Atlas activation with a FILE or LDAP connection: execute the following commands in the Indexima console

SET_ atlas.user=[ATLAS_ADMIN_USER];
SET_ atlas.password=[ATLAS_PASSWORD];
SET_ atlas.enable=true;


As with any dynamic parameter, dynamically set Atlas parameters are stored in the warehouse, in the galactica_ext.conf file. The Atlas password is automatically encrypted when set.

Please note that any change to atlas.user or atlas.password must be followed by SET_ atlas.enable=true for the change to take effect.

Note: If the Atlas parameters are added directly to galactica.conf (not recommended for dynamic parameters), the Atlas password must not be encrypted.

Kerberos authentication

For Kerberos authentication, add atlas.authentication.method.kerberos=true to the atlas-application.properties file, then enable the Atlas integration with the following command in the Indexima console:

SET_ atlas.enable=true;

Data Catalog Feed

Initialization

After enabling the Atlas integration as described in the previous section, start the Indexima cluster normally (without any Atlas-specific command). Once the cluster is up and running, you can test the Atlas integration by creating a new schema and checking that this schema is correctly synchronised with Atlas.
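The synchronisation check can also be done from a script instead of the Atlas UI. The following is a minimal sketch, assuming Atlas exposes its standard v2 REST basic-search endpoint, that HTTP basic authentication is accepted (FILE or LDAP setups, not Kerberos), and that an Indexima schema appears as a hive_db entity. The base URL, credentials, schema name, and function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_search_url(atlas_base_url, schema_name, type_name="hive_db", limit=10):
    """Build an Atlas v2 basic-search URL for the given schema name."""
    params = urllib.parse.urlencode(
        {"typeName": type_name, "query": schema_name, "limit": limit}
    )
    return atlas_base_url.rstrip("/") + "/api/atlas/v2/search/basic?" + params


def schema_in_results(search_response, schema_name):
    """Return True if a returned entity carries the schema's name."""
    for entity in search_response.get("entities") or []:
        attributes = entity.get("attributes") or {}
        # Compare case-insensitively, since catalog names may be lowercased.
        if str(attributes.get("name", "")).lower() == schema_name.lower():
            return True
    return False


def schema_exists_in_atlas(atlas_base_url, user, password, schema_name):
    """Query Atlas with HTTP basic auth and look for the schema."""
    request = urllib.request.Request(build_search_url(atlas_base_url, schema_name))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Accept", "application/json")
    with urllib.request.urlopen(request, timeout=30) as response:
        return schema_in_results(json.load(response), schema_name)


if __name__ == "__main__":
    # Hypothetical Atlas address, credentials, and test schema name.
    found = schema_exists_in_atlas(
        "http://atlas.example.com:21000", "atlas_user", "atlas_password", "atlas_test"
    )
    print("schema found in Atlas" if found else "schema not found in Atlas")
```

Propagation may not be instantaneous, so a negative result shortly after creating the schema is worth retrying before concluding that the integration is broken.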

When the Atlas synchronisation is operational, run ./start-node.sh --import-atlas on an Indexima node to send all previously created objects to Atlas. Please note that this command does not start any node; it only sends a command to the running cluster asking it to trigger a full Atlas synchronisation.

If the integration between Indexima and Atlas has been interrupted, and some objects that exist in Atlas have been deleted from Indexima in the meantime, you can instead run ./start-node.sh --import-atlas-clean to force the deletion in Atlas of any objects already deleted from this Indexima cluster (deletion is based on Indexima cluster name matching). This option is equivalent to the deleteNonExisting flag described here.
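The choice between the two import modes can be wrapped in a small helper. This is a sketch only: the two flags come from the text above, while the function names and the install directory are hypothetical, and the script is assumed to be launched from the directory that contains start-node.sh.

```python
import subprocess


def import_command(clean_deleted=False, script="./start-node.sh"):
    """Return the full-synchronisation command as an argument list."""
    # --import-atlas-clean also removes from Atlas the objects that were
    # deleted from this Indexima cluster; --import-atlas only sends objects.
    flag = "--import-atlas-clean" if clean_deleted else "--import-atlas"
    return [script, flag]


def run_import(clean_deleted=False, cwd=None):
    """Trigger the synchronisation on the running cluster."""
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(import_command(clean_deleted), cwd=cwd, check=True)


if __name__ == "__main__":
    # Hypothetical install directory; adjust to where start-node.sh lives.
    run_import(clean_deleted=False, cwd="/opt/indexima/galactica")
```

Defaulting to the non-destructive mode means that deletions in Atlas happen only when explicitly requested.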

Operations captured

Once the Apache Atlas integration is enabled, any creation of an object in Indexima triggers a call to propagate this object to Atlas. The object's metadata, author, creation timestamp, and lineage are available in Atlas. Indexima objects are modeled as standard Hive objects in Atlas, as described in https://atlas.apache.org/#/HookHive.

The following Hive operations are currently captured:

  • create database/table/view

  • alter database/table/view