Context

Every period, there is a process of loading data related to the period.

1. Data owner may want to reload a specific period since data was revised

There may be additional data, functional data fix…

Example

DELETE FROM mytable WHERE Period='PeriodX';
COMMIT mytable;
Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';
SQL

2. Data owner may want to keep rolling X periods of (historical) data

As a consequence, there is a need to delete the x+1th month.

Example

DELETE FROM mytable WHERE Period='PeriodX-n';
COMMIT mytable;
Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';
SQL

What requires to be put in place within Indexima

A. Create a partitioned table with the period as a partition

In order to delete data within a table, it is required to partition the table by the column(s) used for the deletion

CREATE TABLE mytable (
Period String,
ProductID String,
Sales double,
Index (Period, sum(Sales))
PARTITONED BY (Period)
CODE

More Details: Go to CREATE TABLE

Unlike Hive, columns used for partitioning need to be declared as fields in the create table statement.

Easy to use & administrate

  • The analyzer will always suggest Hyper-indexes with partition fields (mandatory columns)
  • Users can’t manually create an index without partition fields

Partitioned tables, the reason why ?  or How deletion works within Indexima

When deleting data (from an Indexima table), the Indexima SQL engine:

  • logically deletes data (ie just having a hidden field storing storing the fact  (a group of) lines are usable not)
    • Users cannot query deleted data
  • does not recompute indexes

As a consequence, fields used to delete data are required to be present in every index. In order to guaranty so, it is required a partitioned table (partitioned by the fields used to delete data)

B. Consider compaction

Deletion through the commands DELETE FROM mytable WHERE .... & COMMIT mytable will result in the logical deletion of rows. As a result, the sizes of the table and Hyperindexes remain the same.

Compact Table through the command ALTER TABLE mytable COMPACT will actually/physically delete lines.

Compaction is useful when ...

  • a partition/period is actually deleted, especially in the context of rolling data

Compaction is less useful when ...

  • the same partition is reloaded
    • Size would still be close to the same since data would be overwritten.

The compaction may take time. It is recommended to use it, once every period (week or month)