I'm trying to identify a consecutive sequence in dates, per person
, as well as sum amount
for that sequence. My records
table looks like this:
person start_date end_date amount
1 2015-09-10 2015-09-11 500
1 2015-09-11 2015-09-12 100
1 2015-09-13 2015-09-14 200
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-05 300
2 2015-10-06 2015-10-06 1000
3 2015-04-23 2015-04-23 900
The resulting query should be this:
person sequence_start_date sequence_end_date amount
1 2015-09-10 2015-09-14 800
1 2015-10-05 2015-10-07 2000
2 2015-10-05 2015-10-06 1400
3 2015-04-23 2015-04-23 900
Below, I can use LAG and LEAD to identify the sequence start_date
and end_date
, but I don't have a way to aggregate the amount
. I'm assuming the answer will involve some sort of ROW_NUMBER()
window function that will partition by sequence, I just can't figure out how to make the sequence identifiable to the function.
SELECT
person
,COALESCE(sequence_start_date, LAG(sequence_start_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_start_date"
,COALESCE(sequence_end_date, LEAD(sequence_end_date, 1) OVER (ORDER BY person, start_date)) AS "sequence_end_date"
FROM
(
SELECT
person
,start_date
,end_date
,CASE WHEN LAG(end_date, 1) OVER (PARTITION BY person ORDER BY start_date) + interval '1 day' = start_date
THEN NULL
ELSE start_date
END AS "sequence_start_date"
,CASE WHEN LEAD(start_date, 1) OVER (PARTITION BY person ORDER BY start_date) - interval '1 day' = end_date
THEN NULL
ELSE end_date
END AS "sequence_end_date"
,amount
FROM records
) sq
Even your updated (sub)query still isn't quite right for the data you've presented, which is inconsistent about whether the start date of the second and subsequent rows in a sequence should be equal to their previous rows' end date or one day later. The query can be updated pretty easily to accommodate both, if that's needed.
In any case, you cannot use COALESCE as a window function. Aggregate functions may be used as window functions by providing an
OVER
clause, but not ordinary functions. There are nevertheless ways to apply window function to this task. Here's a way to identify the sequences in your data (as presented):That relies on
MAX()
andMIN()
instead ofCOALESCE()
, and it applies window framing to get the appropriate scope for each of those within each partition. Results:Do note that that does not require an exact match of end date with subsequent start date; all rows for each person that abut or overlap will be assigned to the same sequence. If (
person
,start_date
) cannot be relied upon to be unique, however, then you probably need to order the partitions by end date as well.And now you have a way to identify the sequences: they are characterized by the triple
person, sequence_start_date, sequence_end_date
. (Or actually, you need only the person and one of those dates for identification purposes, but read on.) You can wrap the above query as an inline view of an outer aggregate query to produce your desired result:Of course you need both dates as grouping columns if you're going to select them.