Creates a new table and specifies its characteristics. While creating a table, you optionally specify aspects such as:
- Whether the table is internal or external
- The columns and associated data types
- The columns used for partitioning the data.
The general syntax for creating a table and specifying its columns is as follows:
Explicit columns definition
Creating dataspaces - Explicit columns definition
CREATE [EXTERNAL] [DIMENSION] [TEMPORARY] TABLE [IF NOT EXISTS] [mydb_name.]my_fact (col_name data_type [ , <col_name> <col_type> ] [ , <col_name> AS <expr>] [, INDEX [IndexName] (col_name ,Agg(col_name))] ) [PARTITIONED BY (<col_name>, ...)]
Using a Select Statement
CREATE [DIMENSION] [TEMPORARY] TABLE [IF NOT EXISTS] [mydb_name.]my_fact AS select_statement
Implicit columns definition (allows synchronizing with database source)
The purpose of this command is to create a dataspace (with partitions) and uses the INDEXIMA command synchronize to load all partitions from the external data source not present the Indexima dataspace.
Such dataspace creation method can be used to collect all column names.
CREATE [EXTERNAL] [DIMENSION] TABLE [IF NOT EXISTS] [mydb_name.]my_fact FROM my_hive_table IN 'jdbc:hive2://my_JDBC_database_connection:my_JDBC_port' ([INDEX [IndexName] (col_name ,Agg(col_name))]) [PROPERTIES ('timestamp.precision' = 'XXX')]
used to indicate that this dataspace does not have an index.
[EXTERNAL]: Internal and external tables
By default, Indexima creates an "internal" table, where Indexima manages the underlying data files for the table, and physically deletes the data files when you drop the table.
If you specify the EXTERNAL clause, row data would not be duplicated in the indexima warehouse: Only hyperindexes would be stored in the indexima warehouse.
defines a table that would be deleted at the next restart.
[IF NOT EXISTS]
Instructs INDEXIMA parser to create the dataspace only if this dataspace does not already exist.
mydb_name defines the schema where the table will be created into. When not specified, the table will be created in the default schema.
INDEXIMA supported SQL data types are:
- DECIMAL(precision, scale),
Computed columns [ , <col_name> AS <expr>]
Any Expression using indexima built-in functions
INDEXIMA supports the following aggregations:
When creating a partitioned table, the partition key columns must already be listed in the column name list.
Partitions in Indexima are only logical. They would allow deletion of a dataset of the table without removing all data of the table.
defines additional properties related to the created table
Allow the user to define a timestamp precision that is different from the rest of the cluster (defined in galactica.conf by timestamp.precision)
Fact dataspace vs Dimension dataspace
INDEXIMA differentiates behavior and usage of dataspaces created with or without hyperindex.
Dataspaces created with hyperindex are called Fact dataspace.
- Hyperindex can be added/deleted. This action can be done manually or suggested by the INDEXIMA ANALYZER fueled with queries.
- Fact tables can be created without any hyperindex. Such tables are qualified "Limited" tables. The user would be allowed to load/insert a maximum amount of data.
Data of Hyperindexes of fact dataspaces are equally split into each indexima node.
Dimension dataspace don't and won't contain any hyperindex. A Dimension dataspace is used to help to solve queries that contain joins.
Data of a dimension dataspaces is duplicated in each indexima node.
It behaves as a DIMENSION dataspace that has a limited lifetime: this table would not be persisted when a restart of the INDEXIMA cluster occurs.
CREATE TABLE IF NOT EXISTS indexima_bench.nyc_yellow ( vendor_name string, Trip_Pickup_DateTime TIMESTAMP(minute), Trip_Dropoff_DateTime TIMESTAMP(minute), Passenger_Count int, Trip_Distance double, Start_Lon double, Start_Lat double, Rate_Code string, store_and_forward string, End_Lon double, End_Lat double, Payment_Type string, Fare_Amt double, surcharge double, mta_tax double, Tip_Amt double, Tolls_Amt double, Total_Amt double, tripduration_minute AS round((unix_timestamp(Trip_Dropoff_DateTime)-unix_timestamp(Trip_Pickup_DateTime))/60), speed_miles_per_hour AS if(TripDuration_Minute>0, round(60*Trip_Distance/TripDuration_Minute,3),-1), speed_miles_per_minute AS if(TripDuration_Minute>0, round(Trip_Distance/TripDuration_Minute,3),-1), start_lon_round3 AS round(Start_Lon,3), start_lat_round3 AS round(Start_Lat,3), end_lon_round3 AS round(End_Lon,3), end_lat_round3 AS round(End_Lat,3), index(year(Trip_Pickup_DateTime),month(Trip_Pickup_DateTime),day(Trip_Pickup_DateTime),vendor_name,sum(Total_Amt)) );