Rolling Data / Reload Data
Every period, there is a process of loading data related to the period.
1. Data owner may want to reload a specific period since data was revised
There may be additional data, functional data fix…
DELETE FROM mytable WHERE Period='PeriodX'; COMMIT mytable; Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';
2. Data owner may want to keep rolling X periods of (historical) data
As a consequence, there is a need to delete the x+1th month.
DELETE FROM mytable WHERE Period='PeriodX-n'; COMMIT mytable; Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';
What requires to be put in place within Indexima
A. Create a partitioned table with the period as a partition
In order to delete data within a table, it is required to partition the table by the column(s) used for the deletion
CREATE TABLE mytable ( Period String, ProductID String, Sales double, Index (Period, sum(Sales)) PARTITONED BY (Period)
More Details: Go to CREATE TABLE
Easy to use & administrate
- The analyzer will always suggest Hyper-indexes with partition fields (mandatory columns)
- Users can’t manually create an index without partition fields
Partitioned tables, the reason why ? or How deletion works within Indexima
When deleting data (from an Indexima table), the Indexima SQL engine:
- logically deletes data (ie just having a hidden field storing storing the fact (a group of) lines are usable not)
- Users cannot query deleted data
- does not recompute indexes
As a consequence, fields used to delete data are required to be present in every index. In order to guaranty so, it is required a partitioned table (partitioned by the fields used to delete data)
B. Consider compaction
Deletion through the commands
DELETE FROM mytable WHERE .... & COMMIT mytable will result in the logical deletion of rows. As a result, the sizes of the table and Hyperindexes remain the same.
Compact Table through the command
ALTER TABLE mytable COMPACT will actually/physically delete lines.
Compaction is useful when ...
- a partition/period is actually deleted, especially in the context of rolling data
Compaction is less useful when ...
- the same partition is reloaded
- Size would still be close to the same since data would be overwritten.
The compaction may take time. It is recommended to use it, once every period (week or month)