Insert latest 3 records from a huge table(40 million records) which has data from different years

31 views Asked by At

I have data like below.. This is just a sample data gear wise but having more than 10 million records a year which we can insert into another table from this below table at a shot

Id name date
------------------
1  a    2018-01-01
2  b    2018-02-01
3  a    2018-6-01
4  a    2018-07-01
5  a    2018-10-01

6  a    2019-01-01
7  b    2019-02-01
8  a    2019-6-01
9  a    2019-07-01
10 a    2019-10-01
11 a    2020-01-01
12 b    2020-02-01
13 a    2020-6-01
14 a    2020-07-01
15 a    2020-10-01

Tried many approaches as data is huge and we need only 3 latest records from all these years; fetching from this large amount of data is not working using SQL queries.

Expected output:

Id  name date
----------------
15  a    2020-10-01
14  a    2020-07-01
13  a    2020-6-01

7   b    2019-02-01
2   b    2018-02-01
1

There are 1 answers

0
Adrian Maxwell On

This is a query that should run in recent versions of most SQL compliant databases.

select 
  Id
, name
, "date"
from (
    select 
      Id
    , name
    , "date"
    , row_number() over(partition by name order by "date" DESC) as row_num
    from your_table
    ) d
where row_num <= 3

The row_number() function simply supplies a set of integers starting at 1, this occurs within each partition (here it will start at 1 for each name) and the specific row that is given 1 is determined by the order by (here we sort by descending dates so the most recent is 1, next is 2 and next is 3 etc.)

If row_number() isn't available then please tell use which database type and the version.

nb "date" is a reserved word in some implementations of SQL, so you may need to quote that column name to be safe.

Also note that no matter what because you want the "latest 3 rows" and "per name" you cannot avoid sorting through the ~40 million rows. If you need to do this regularly then an index to suit this query may ne worthwhile e.g.

CREATE INDEX idx_your_table_name_date ON your_table(name, "date" DESC);

However, you would need to test if it does actually help or not, or, if the index slows down inserts too much.