Kerberos configuration on Indexima cluster

Install the Kerberos client. For example, on CentOS:

yum install krb5-workstation

Edit the /etc/krb5.conf file:

  • Modify the default domain (default_realm), as in the sketch below
  • Set the admin_server and kdc locations for your realm
  • Possible problems
    • Comment out the renew_lifetime parameter
    • Change default_ccache_name to use the /tmp directory instead of the keyring
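
A minimal /etc/krb5.conf sketch illustrating these changes; the realm, KDC host, and cache path are placeholder values to adapt (the realm matches the DOMAIN.COM placeholder used later in this page):

[libdefaults]
  default_realm = DOMAIN.COM
  # renew_lifetime commented out to avoid ticket-renewal errors
  # renew_lifetime = 7d
  # use a file-based credential cache in /tmp instead of the kernel keyring
  default_ccache_name = FILE:/tmp/krb5cc_%{uid}

[realms]
  DOMAIN.COM = {
    kdc = kdc.domain.com
    admin_server = kdc.domain.com
  }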

HDFS configuration

You will have to choose a user and a keytab to connect to your kerberized cluster. This user needs to be declared as a proxy user in your HDFS configuration.

After modification, you will need to restart your HDFS cluster.

Example with impala as a proxy user (in core-site.xml):

<property>
   <name>hadoop.proxyuser.impala.hosts</name>
   <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.impala.groups</name>
    <value>*</value>
</property>

Galactica configuration modification

jaas.conf

You can create a dedicated keytab for Indexima on your kerberized cluster, or reuse an existing keytab (the examples below use the impala keytab).

Copy this keytab to each machine of the Indexima cluster.

Create a jaas.conf file.

com.sun.security.jgss.initiate {
 com.sun.security.auth.module.Krb5LoginModule required
 principal="impala/ipadress@DOMAIN.COM"
 keyTab="/etc/security/keytabs/impala.keytab"
 useKeyTab=true
 storeKey=true
 debug=true;
};

Galactica-env.sh

Add the following line:

export NODESERVER_JVM_OPTIONS="-Djava.security.auth.login.config=/opt/k/work/indexima/galactica/jaas.conf -Djavax.security.auth.useSubjectCredsOnly=false"
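
The java.security.auth.login.config property points the JVM at the jaas.conf file created above; javax.security.auth.useSubjectCredsOnly=false lets the JGSS layer obtain Kerberos credentials through that JAAS configuration (or the local ticket cache) instead of requiring them on the calling Subject.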

Additional actions to connect to a Kerberized Impala

Execute a manual kinit

Depending on your Impala driver version (for example, 2.5.5.1007), you may need to do a manual kinit with the user you chose to connect to your Impala cluster.

kinit -kt ... (specify your user and keytab)
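
For instance, with the keytab and principal from the jaas.conf example above (adapt both to your environment):

kinit -kt /etc/security/keytabs/impala.keytab impala/ip-address@DOMAIN.COM
klist

klist should then list a valid ticket-granting ticket for this principal.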

Table creation from Impala

create table from_impala from my_impala_table
IN 'jdbc:impala://impala_server_address:impalaPort;AuthMech=1;KrbRealm=XXX.COM;KrbHostFQDN=impala-host-FQDN;KrbServiceName=impala'
(index(id1))
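
In this URL, AuthMech=1 selects Kerberos authentication in the Impala JDBC driver, KrbRealm is your Kerberos realm, and KrbHostFQDN must be the fully qualified host name used in the Impala service principal (impala/FQDN@REALM).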

Table load from Impala

You can load data from Impala with a JDBC load, but it is more efficient to use an HDFS load:

load data inpath 'hdfs://ip-address:8020/user/hive/warehouse/xxx' into table from_impala format parquet
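
To find the HDFS path of the source table, you can ask Impala for its storage location from any Impala client (for example impala-shell, described below); the table name here is the placeholder used above:

DESCRIBE FORMATTED my_impala_table;
-- the "Location:" row of the output gives the hdfs:// path to use in the LOAD statement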

Help for debugging purposes

impala-shell installation

This section is not mandatory but may be useful for debugging purposes.

yum install python-pip gcc gcc-c++ cyrus-sasl-devel
pip install impala-shell

Try to connect to the remote Impala instance

kinit -kt ... (your keytab) ...
impala-shell -k -i impala_server_address

Check if you can browse tables and data from Impala.
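
For example, once connected (my_impala_table is the placeholder table name used earlier in this page):

show databases;
show tables;
select * from my_impala_table limit 10;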