Context

After creating a table based on Hive, the user performs a SYNCHRONIZE command.

Exemple

Drop Table If EXISTS Myschema.Table1;
CREATE TABLE IF NOT EXISTS Myschema.Table1 FROM  HiveMyschema.Table1  IN 'jdbc:hive2://indeximaServeur:port' () ;
SQL

After executing the SYNCHRONIZE command, there is no data left in the Indexima table, whereas there is data in the Hive Table.

Exemple

Synchronize Myschema.Table1 ;
SELECT * FROM Myschema.Table1 limit 10 ;  Returns 0 line
SQL

Symptoms & checks

When checking the logs, we can see the system noticed some partitions.

INFO [RequestRunnners-16-thread-22] io.galactica.u - Find partition Part1=2019-01-24/Part2=2019-01-24 11%3A07%3A00 for Myschema.Table1_0

INFO [RequestRunnners-16-thread-22] io.galactica.u - Find partition Part1=2019-06-13/Part2=2019-06-13 17%3A59%3A37 for Myschema.Table1_0


Did you create partitions with values containing special characters such: ; . : ? (for example, when storing Timestamp in string format: 2019-06-13 17:59:37 )

In Hive, check with Show Partitions Myschema.Table1 that special characters are encoded.

Partitions
Part1=2019-10-29/Part2=2019-10-29 11%3A13%3A59
Part1=2020-01-31/Part2=2020-01-31 10%3A43%3A35 snapshot

Resolution

In the hive cluster, execute the command (admin rights required).



set hive.decode.partition.name=true;
SQL



See the reference: https://issues.apache.org/jira/browse/HIVE-3679

Hive will send partitions with decoded names. Indexima needs those decoded names to load the data.

Best Practice

It is recommended not to create partitions values containing special characters.