Fastest way to backup / delete from table


I am trying to back up some data from a table with > 50M rows. The table definition looks like this

# 
entry_id             : int unsigned auto_increment  # unique entry id
---
time=CURRENT_TIMESTAMP : timestamp                  # current timestamp
room                 : varchar(255)                 # Room
id                   : varchar(255)                 # Sensor ID / NAME
value                : double                       # sensor value

My plan is to

  1. Fetch a limited number of KEYS (e.g. fetch('KEY', limit=some_large_number))
  2. Retrieve table entries for those KEYS
  3. Save data chunk on disk
  4. Delete table entries belonging to KEYS
  5. Start over ...

Just fetching, say, 1_000_000 entries is reasonably fast (a few seconds). However, when I go through the full cycle of retrieving primary keys -> fetching data -> deleting those entries, I get stuck at the fetch step (>20 minutes for ~100_000 entries, or a timeout?).
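Roughly, the per-chunk logic I have in mind looks like this (just a sketch; SensorData stands in for my actual table class):

keys = SensorData.fetch('KEY', limit=1_000_000)  # 1. fetch a block of primary keys
block = (SensorData & keys).fetch()              # 2. retrieve the rows for those keys
... # 3. save block to disk
(SensorData & keys).delete()                     # 4. delete the backed-up rows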

What is the most elegant / time-efficient way to perform this task using DataJoint logic?

(DataJoint Python version 0.13.2)


1 Answer

Accepted answer by Dimitri Yatsenko:

If your data are not changing, you can use the limit and offset arguments of fetch to stride through the table in chunks, without restricting or deleting anything along the way. There is no need to retrieve the primary keys first, to restrict the fetch, or to delete each block as you go.

step = 1_000_000                            # rows per chunk
n_chunks = (len(table) + step - 1) // step  # ceiling division: number of chunks
for chunk in range(n_chunks):
    block = table.fetch(limit=step, offset=step * chunk)  # fetch one chunk
    ... # save block

If this is a simple backup, then deletes are not necessary. If you need to delete the data, you can delete it all at once at the end.
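For example, a minimal sketch of that final cleanup, assuming table is the same DataJoint table object as above and that every chunk has already been written to disk:

# once all chunks are safely saved, remove the rows in one pass
table.delete()  # prompts for confirmation unless dj.config['safemode'] is False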