Rolling Data / Reload Data
At every time period of the year, you may want to load data related to that time period.
1. The data owner may want to reload a specific period from the time data was last revised
There may be additional data, functional data fixes…
DELETE FROM mytable WHERE Period='PeriodX'; COMMIT mytable; Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';
2. The data owner may also want to keep rolling X periods of historical data
In this case, you need to delete the x+1th month, in order for your data to be accurate.
DELETE FROM mytable WHERE Period='PeriodX-n'; COMMIT mytable; Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';
What requires to be put in place within Indexima
A. Create a partitioned table with the period as a partition
In order to delete data within a table, it is required to partition the table by the column(s) used for the deletion.
CREATE TABLE mytable ( Period String, ProductID String, Sales double, Index (Period, sum(Sales)) PARTITONED BY (Period)
More Details: Go to CREATE TABLE.
Easy to use & administrate
- The analyzer will always suggest Hyper-indexes with partition fields (mandatory columns).
- Users can’t manually create an index without partition fields.
Why partitioned tables ? or how deletion works within Indexima
When deleting data (from an Indexima table), the Indexima SQL engine:
- logically deletes data (i.e. using a hidden field storing which lines are usable and not).
- Users cannot query deleted data.
- does not recompute indexes.
As a consequence, fields used to delete data need to be present in every index. To garantee that, the SQL Indexima Engine needs a partitioned table that is partitioned by the fields used to delete data.
B. Consider compaction
Deletion through the commands
DELETE FROM mytable WHERE .... & COMMIT mytable will result in the logical deletion of rows. As a result, the sizes of the table and Hyperindexes remain the same.
Compact Table through the command
ALTER TABLE mytable COMPACT will physically delete lines.
Compaction is useful when ...
- a partition/period is physically deleted, especially in the context of rolling data.
Compaction is less useful when ...
- the same partition is reloaded.
- The size of the partition remains close to the same since data is overwritten.
The compaction may take time. It is recommended to use it, once every period (week or month).