Skip to main content
Skip table of contents

Rolling Data / Reload Data

Context

At every time period of the year, you may want to load data related to that time period.

1. The data owner may want to reload a specific period from the time data was last revised

There may be additional data, functional data fixes…

Example

SQL
DELETE FROM mytable WHERE Period='PeriodX';
COMMIT mytable;
Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';

2. The data owner may also want to keep rolling X periods of historical data

In this case, you need to delete the x+1th month, in order for your data to be accurate.

Example

SQL
DELETE FROM mytable WHERE Period='PeriodX-n';
COMMIT mytable;
Load DATA INPATH '...' INTO mytable PREFIX 'PeriodX';

What requires to be put in place within Indexima

A. Create a partitioned table with the period as a partition

In order to delete data within a table, it is required to partition the table by the column(s) used for the deletion.

CODE
CREATE TABLE mytable (
Period String,
ProductID String,
Sales double,
Index (Period, sum(Sales))
PARTITONED BY (Period)

More Details: Go to CREATE TABLE.

Unlike Hive, columns used for partitioning need to be declared as fields in the create table statement.

Easy to use & administrate

  • The analyzer will always suggest Hyper-indexes with partition fields (mandatory columns).
  • Users can’t manually create an index without partition fields.

Why partitioned tables ? or how deletion works within Indexima

When deleting data (from an Indexima table), the Indexima SQL engine:

  • logically deletes data (i.e. using a hidden field storing which lines are usable and not).
    • Users cannot query deleted data.
  • does not recompute indexes.

As a consequence, fields used to delete data need to be present in every index. To garantee that, the SQL Indexima Engine needs a partitioned table that is partitioned by the fields used to delete data.

B. Consider compaction

Deletion through the commands DELETE FROM mytable WHERE .... & COMMIT mytable will result in the logical deletion of rows. As a result, the sizes of the table and Hyperindexes remain the same.

Compact Table through the command ALTER TABLE mytable COMPACT will physically delete lines.

Compaction is useful when ...

  • a partition/period is physically deleted, especially in the context of rolling data.

Compaction is less useful when ...

  • the same partition is reloaded.
    • The size of the partition remains close to the same since data is overwritten.

The compaction may take time. It is recommended to use it, once every period (week or month).

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.