Rolling Data / Reload Data
Context
Every period, there is a process of loading data related to the period.
1. Data owner may want to reload a specific period since data was revised
There may be additional data, functional data fix…
Example
DELETE FROM mytable WHERE Period='PeriodX';
COMMIT mytable;
Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';
2. Data owner may want to keep rolling X periods of (historical) data
As a consequence, there is a need to delete the x+1th month.
Example
DELETE FROM mytable WHERE Period='PeriodX-n';
COMMIT mytable;
Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';
What requires to be put in place within Indexima
A. Create a partitioned table with the period as a partition
In order to delete data within a table, it is required to partition the table by the column(s) used for the deletion
CREATE TABLE mytable (
Period String,
ProductID String,
Sales double,
Index (Period, sum(Sales))
PARTITONED BY (Period)
More Details: Go to CREATE TABLE
Easy to use & administrate
- The analyzer will always suggest Hyper-indexes with partition fields (mandatory columns)
- Users can’t manually create an index without partition fields
Partitioned tables, the reason why ? or How deletion works within Indexima
When deleting data (from an Indexima table), the Indexima SQL engine:
- logically deletes data (ie just having a hidden field storing storing the fact (a group of) lines are usable not)
- Users cannot query deleted data
- does not recompute indexes
As a consequence, fields used to delete data are required to be present in every index. In order to guaranty so, it is required a partitioned table (partitioned by the fields used to delete data)
B. Consider compaction
Deletion through the commands DELETE FROM mytable WHERE .... & COMMIT mytable
will result in the logical deletion of rows. As a result, the sizes of the table and Hyperindexes remain the same.
Compact Table through the command ALTER TABLE mytable COMPACT
will actually/physically delete lines.
Compaction is useful when ...
- a partition/period is actually deleted, especially in the context of rolling data
Compaction is less useful when ...
- the same partition is reloaded
- Size would still be close to the same since data would be overwritten.
The compaction may take time. It is recommended to use it, once every period (week or month)