Skip to main content
Skip table of contents

LOAD DATA (Files Load)

The LOAD DATA command allows you to write raw data into Indexima tables.

Syntax

Load Data from local disk

SQL
LOAD DATA [LOCAL] 
INPATH '<protocol>://<absolute_path_to_data>'
INTO TABLE <table_name> 
FORMAT CSV|ORC|PARQUET|JSON
SEPARATOR ',|;|\t' 
[SKIP <number_of_row_to_skip> ]
[LIMIT <limitnumber>]
[PATTERN <pattern>]
[PREFIX <prefix>]
[SUFFIX <suffix>]
[NOCHECK]

Parameters

Parameter nameDescriptionPossible values

LOCAL

If used, this keyword tells the Master node to read data from his own filesystem.
When not specified, every node must be able to read the data located in <absolute_path_to_data>
When specified, <protocol> is not mandatory

Empty | LOCAL

protocol

The type of datalake you want to load data from.s3 | adl | gs | hdfs
absolute_path_to_dataThe absolute path to a folder containing all the files you want to load data from. If LOCAL is specified, only the Master node needs to have read access to this path.Any absolute path is readable by the Indexima nodes.
table_nameThe name of the Indexima table you want to load the data into.The exact name of the Indexima table you want to load data into. The structure of the files must match the structure of the table.
number_of_row_to_skipThe number of rows to skip before starting the load. This is used to skip the first or the 2 first rows, which are usually header rows.Any positive number.
limitnumberMaximum Number of rows that will be read from each fileAny positive number
patternOnly files whose name begins with the defined pattern would be loaded. Must end by a star.
prefixsee below" Load partitions from Hive table "
Suffixsupports only CSV format, same concept as prefix applied for missing column at the end of each line
NoCheck

Without this option, INDEXIMA checks if loaded values are compatible with the datatype set when creating the table. INDEXIMA triggers errors when datatypes do not match.
Specifying NOCHECK skips this test and improve dramatically the performance of the loading. This is a good practice to run once LOAD DATA with data check enable. Then NOCHECK can be set for recurrent data loading to speed up the load process


Load Example

Example

SQL
LOAD DATA 
INPATH 'mypath'
INTO TABLE myschema.mytable
FORMAT CSV
SEPARATOR ',' 
SKIP 1
LIMIT 10
PATTERN 'Mypattern*'
PREFIX 'myprefix'
NOCHECK

Load partitions from Hive table

Provide a path of the root partition if you want to load only a set of partitions and not all partitions from a Hive table and add PREFIX 'column_values' to specify the values for the partition columns.

SQL
LOAD DATA INPATH 'hdfs://data_node:8020/apps/hive/warehouse/my_imported_data_orc/year=2018/month=12/day=10'
INTO TABLE my_test 
FORMAT orc 
PREFIX '2018,12,10,'

If the Hive table is in tab-separated format, the separator must be \t so the statement must be PREFIX '2018\t12\t10\t'

Load Example with a suffix

Add SUFFIX 'column_values' to specify the values that will finish the CSV line when there are fewer column than expected (defined by the list of columns in the table creation)

SQL
LOAD DATA INPATH 'hdfs://data_node:8020/apps/hive/warehouse/myORCtable'
INTO TABLE my_test 
FORMAT CSV
SUFFIX 'XX,YY'





JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.