Problem
I have a starting table that looks like this:
I'm attempting to write a SQL query that will write this data into a similarly formatted table, but with additional adornments indicating when records expired and what records are active. The results would look like this:
identifier | loaddate | loadenddate | activeflag | symbol
723a90699e99ec9e00216910910384bd | 2020-04-01 | 2020-04-07 | 0 | DXB
723a90699e99ec9e00216910910384bd | 2020-04-08 | 2999-12-31 | 1 | DXB CL
Note that there are 1000s of different identifiers, some with one, two, three+ different symbols over various different timeframes.
Requirements
- Anytime an identifier is seen for the first time, it must be created in the final table with today's date as a load date and a 2999-12-31 loadenddate and activeflag=1
- When that identifier is seen on a second day, only add a row if the symbol has changed. If it has, "expire" the previous row by setting the previous row's loadenddate to this new row's loaddate - 1 day and activeflag = 0
- The sql query (or queries) need to be able to re-run on the source table each day going forward as well, so they correctly handle existing data in the destination table as well as the destination table being blank (initial run)
What I've Got So Far
To load initially (and not duplicate), I have the following SQL:
INSERT INTO finaltable(
listinghashkey
symbol,
loaddate,
loadenddate,
activeflag
)
SELECT
s.listinghashkey
s.symbol,
MAX(s.loaddate),
'2999-12-31 00:00:00.0',
1
FROM
startingtable s
LEFT JOIN finaltable f ON s.listinghashkey = f.listinghashkey
WHERE (f.listinghashkey IS NULL)
GROUP BY s.listinghashkey, s.symbol
Converting your initial format to the new format is pretty easy as a gaps-and-islands problem:
This eschews the "is active" flag and the arbitrary future date. It just uses
NULL
for the infinite future. (You can easily tweak the logic for your version; this is just simpler to work with.)If you have this table and you want to add the next load date, then you have can construct the next version in its entirety using
union all
. The idea is to divide the processing into four steps:The SQL looks like:
I prefer to do this as a
select
/replace operation rather than a series ofinsert
/update
steps (or usingmerge
perhaps). But you this is the basic idea.