Indexima supports loading data from the following formats:

  • CSV
  • JSON
  • PARQUET
  • ORC

Load data directly from HDFS to avoid Hive

You can use the LOAD DATA command to extract data directly from a Hadoop Datanode.

Command

LOAD DATA HDFS

LOAD DATA INPATH 'hdfs://data_node:8020/apps/hive/warehouse/my_hive_data' INTO TABLE my_table;
COMMIT my_table;

This reads every file located in /apps/hive/warehouse/my_hive_data on the data node, loads the data into the Indexima table named "my_table", and then commits it.

The path given to LOAD DATA INPATH designates a folder, not a file. Every data file inside this directory is imported, so all of them must share the same structure and format for the resulting table to be consistent.
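Because every file in the folder is imported, it can help to check beforehand that the files really share one structure. The following is a minimal Python sketch of such a check, not part of Indexima: the helpers column_counts and is_consistent are hypothetical names, and the sketch assumes delimited text files in a locally readable folder (files on HDFS would first have to be made reachable some other way).

```python
import csv
from pathlib import Path


def column_counts(folder, separator=","):
    """Map each regular file in `folder` to the set of column counts found in it."""
    counts = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with path.open(newline="") as handle:
            reader = csv.reader(handle, delimiter=separator)
            # csv.reader yields blank lines as empty lists; ignore them.
            counts[path.name] = {len(row) for row in reader if row}
    return counts


def is_consistent(folder, separator=","):
    """True when every non-blank row of every file has the same column count."""
    all_counts = set()
    for per_file in column_counts(folder, separator).values():
        all_counts |= per_file
    return len(all_counts) <= 1
```

Calling is_consistent on the folder before running the load gives a quick signal. When it returns False, column_counts shows which files deviate. The check only compares column counts, so it does not catch files whose columns differ in type or order.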

Load partitions from Hive table

You can learn more about the LOAD DATA INPATH commands here.

Load data from files directly on the filesystem of the machines running Indexima.

Command

LOAD DATA LOCAL

LOAD DATA LOCAL INPATH '/tmp/my_data' INTO TABLE my_table FORMAT CSV SEPARATOR ',' SKIP 2;
COMMIT my_table;

This will load all data in the files located in the folder /tmp/my_data into the Indexima table default.my_table. The files must be CSV files with a comma separator. The first 2 lines are skipped.
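To see which rows remain once the leading lines are dropped, the files can be previewed outside Indexima. The sketch below is an illustration only: preview_rows is a hypothetical helper, and it assumes that the skip count applies to each file separately, which this page does not state.

```python
import csv
from pathlib import Path


def preview_rows(folder, separator=",", skip=0):
    """Yield (file name, row) pairs, dropping the first `skip` records of each file."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with path.open(newline="") as handle:
            reader = csv.reader(handle, delimiter=separator)
            for index, row in enumerate(reader):
                # Assumption: the skip count restarts for every file.
                if index >= skip:
                    yield path.name, row
```

For the command above, list(preview_rows("/tmp/my_data", separator=",", skip=2)) would list the rows this sketch keeps, which can be compared with the table content after the load.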


More

You can learn more about the LOAD DATA commands here.