I have data like below.. This is just a sample data gear wise but having more than 10 million records a year which we can insert into another table from this below table at a shot
Id name date
------------------
1 a 2018-01-01
2 b 2018-02-01
3 a 2018-6-01
4 a 2018-07-01
5 a 2018-10-01
6 a 2019-01-01
7 b 2019-02-01
8 a 2019-6-01
9 a 2019-07-01
10 a 2019-10-01
11 a 2020-01-01
12 b 2020-02-01
13 a 2020-6-01
14 a 2020-07-01
15 a 2020-10-01
Tried many approaches as data is huge and we need only 3 latest records from all these years; fetching from this large amount of data is not working using SQL queries.
Expected output:
Id name date
----------------
15 a 2020-10-01
14 a 2020-07-01
13 a 2020-6-01
7 b 2019-02-01
2 b 2018-02-01
This is a query that should run in recent versions of most SQL compliant databases.
The
row_number()
function simply supplies a set of integers starting at 1, this occurs within eachpartition
(here it will start at 1 for each name) and the specific row that is given 1 is determined by theorder by
(here we sort by descending dates so the most recent is 1, next is 2 and next is 3 etc.)If
row_number()
isn't available then please tell use which database type and the version.nb "date" is a reserved word in some implementations of SQL, so you may need to quote that column name to be safe.
Also note that no matter what because you want the "latest 3 rows" and "per name" you cannot avoid sorting through the ~40 million rows. If you need to do this regularly then an index to suit this query may ne worthwhile e.g.
CREATE INDEX idx_your_table_name_date ON your_table(name, "date" DESC);
However, you would need to test if it does actually help or not, or, if the index slows down inserts too much.