CREATE TABLE

Creates a new table and specifies its characteristics. While creating a table, you optionally specify aspects such as:

Whether the table is internal or external
The columns and associated data types
The columns used for partitioning the data.

The general syntax for creating a table and specifying its columns is as follows:

Syntax

Explicit columns definition

Creating dataspaces - Explicit columns definition

SQL

CREATE [EXTERNAL] [DIMENSION] [TEMPORARY] TABLE [IF NOT EXISTS] [mydb_name.]my_fact
  (col_name data_type
	[ , <col_name> <col_type> ]
	[ , <col_name> AS <expr>]
	[, INDEX [IndexName] (col_name ,Agg(col_name))]
  )
  [PARTITIONED BY (<col_name>, ...)]

Using a Select Statement

Creating dataspaces

SQL

CREATE [DIMENSION] [TEMPORARY] TABLE [IF NOT EXISTS] [mydb_name.]my_fact
 AS
    select_statement

Implicit columns definition (allows synchronizing with database source)

The purpose of this command is to create a dataspace (with partitions) and uses the INDEXIMA command synchronize to load all partitions from the external data source not present the Indexima dataspace.

Such dataspace creation method can be used to collect all column names.

SQL

CREATE [EXTERNAL] [DIMENSION] TABLE [IF NOT EXISTS] [mydb_name.]my_fact
FROM my_hive_table IN 'jdbc:hive2://my_JDBC_database_connection:my_JDBC_port' 
([INDEX [IndexName] (col_name ,Agg(col_name))]) [PROPERTIES ('timestamp.precision' = 'XXX')]

Parameters

Table properties

[DIMENSION]

used to indicate that this dataspace does not have an index.

[EXTERNAL]: Internal and external tables

By default, Indexima creates an "internal" table, where Indexima manages the underlying data files for the table, and physically deletes the data files when you drop the table.
If you specify the EXTERNAL clause, row data would not be duplicated in the indexima warehouse: Only hyperindexes would be stored in the indexima warehouse.

[TEMPORARY]

defines a table that would be deleted at the next restart.

[IF NOT EXISTS]

Instructs INDEXIMA parser to create the dataspace only if this dataspace does not already exist.

[mydb_name.]

mydb_name defines the schema where the table will be created into. When not specified, the table will be created in the default schema.

Columns

Data types

INDEXIMA supported SQL data types are:

STRING,
TINYINT,
SMALLINT,
INT,
BIGINT,
FLOAT,
DOUBLE,
DECIMAL(precision, scale),
TIMESTAMP,
DATE

Computed columns [ , <col_name> AS <expr>]

Any Expression using indexima built-in functions

Index

Aggregation functions

INDEXIMA supports the following aggregations:

AVG,
COUNT,
MAX,
MIN,
SUM,
count(DISTINCT)

[PARTITIONED BY]

When creating a partitioned table, the partition key columns must already be listed in the column name list.

Partitions in Indexima are only logical. They would allow deletion of a dataset of the table without removing all data of the table.

[PROPERTIES]

defines additional properties related to the created table

timestamp.precision

Allow the user to define a timestamp precision that is different from the rest of the cluster (defined in galactica.conf by timestamp.precision)

Fact dataspace vs Dimension dataspace

INDEXIMA differentiates behavior and usage of dataspaces created with or without hyperindex.

FACT dataspace

Dataspaces created with hyperindex are called Fact dataspace.

Hyperindex can be added/deleted. This action can be done manually or suggested by the INDEXIMA ANALYZER fueled with queries.
Fact tables can be created without any hyperindex. Such tables are qualified "Limited" tables. The user would be allowed to load/insert a maximum amount of data.

Data of Hyperindexes of fact dataspaces are equally split into each indexima node.

DIMENSION dataspace

Dimension dataspace don't and won't contain any hyperindex. A Dimension dataspace is used to help to solve queries that contain joins.

Data of a dimension dataspaces is duplicated in each indexima node.

TEMPORARY dataspace

It behaves as a DIMENSION dataspace that has a limited lifetime: this table would not be persisted when a restart of the INDEXIMA cluster occurs.

Example

SQL

CREATE TABLE IF NOT EXISTS indexima_bench.nyc_yellow (
    vendor_name string,
    Trip_Pickup_DateTime TIMESTAMP(minute),
    Trip_Dropoff_DateTime TIMESTAMP(minute),
    Passenger_Count int,
    Trip_Distance double,
    Start_Lon double,
    Start_Lat double,
    Rate_Code string,
    store_and_forward string,
    End_Lon double,
    End_Lat double,
    Payment_Type string,
    Fare_Amt double,
    surcharge double,
    mta_tax double,
    Tip_Amt double,
    Tolls_Amt double,
    Total_Amt double,
    tripduration_minute AS round((unix_timestamp(Trip_Dropoff_DateTime)-unix_timestamp(Trip_Pickup_DateTime))/60),
    speed_miles_per_hour AS if(TripDuration_Minute>0, round(60*Trip_Distance/TripDuration_Minute,3),-1),
    speed_miles_per_minute AS if(TripDuration_Minute>0, round(Trip_Distance/TripDuration_Minute,3),-1),
    start_lon_round3 AS round(Start_Lon,3),
    start_lat_round3 AS round(Start_Lat,3),
    end_lon_round3 AS round(End_Lon,3),
    end_lat_round3 AS round(End_Lat,3),
    index(year(Trip_Pickup_DateTime),month(Trip_Pickup_DateTime),day(Trip_Pickup_DateTime),vendor_name,sum(Total_Amt))
);