Skip to main content
Skip table of contents

Rolling Data / Reload Data

Context

Every period, there is a process of loading data related to the period.

1. Data owner may want to reload a specific period since data was revised

There may be additional data, functional data fix…

Example

SQL
DELETE FROM mytable WHERE Period='PeriodX';
COMMIT mytable;
Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';

2. Data owner may want to keep rolling X periods of (historical) data

As a consequence, there is a need to delete the x+1th month.

Example

SQL
DELETE FROM mytable WHERE Period='PeriodX-n';
COMMIT mytable;
Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';

What requires to be put in place within Indexima

A. Create a partitioned table with the period as a partition

In order to delete data within a table, it is required to partition the table by the column(s) used for the deletion

CODE
CREATE TABLE mytable (
Period String,
ProductID String,
Sales double,
Index (Period, sum(Sales))
PARTITONED BY (Period)

More Details: Go to CREATE TABLE

Unlike Hive, columns used for partitioning need to be declared as fields in the create table statement.

Easy to use & administrate

  • The analyzer will always suggest Hyper-indexes with partition fields (mandatory columns)
  • Users can’t manually create an index without partition fields

Partitioned tables, the reason why ?  or How deletion works within Indexima

When deleting data (from an Indexima table), the Indexima SQL engine:

  • logically deletes data (ie just having a hidden field storing storing the fact  (a group of) lines are usable not)
    • Users cannot query deleted data
  • does not recompute indexes

As a consequence, fields used to delete data are required to be present in every index. In order to guaranty so, it is required a partitioned table (partitioned by the fields used to delete data)

B. Consider compaction

Deletion through the commands DELETE FROM mytable WHERE .... & COMMIT mytable will result in the logical deletion of rows. As a result, the sizes of the table and Hyperindexes remain the same.

Compact Table through the command ALTER TABLE mytable COMPACT will actually/physically delete lines.

Compaction is useful when ...

  • a partition/period is actually deleted, especially in the context of rolling data

Compaction is less useful when ...

  • the same partition is reloaded
    • Size would still be close to the same since data would be overwritten.

The compaction may take time. It is recommended to use it, once every period (week or month)

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.