I am working with Snowflake healthcare data focusing on specific medical services. I need to group medical visits together if:
- The medical visits' dates overlap, or
- The medical visits' dates do not overlap, but they overlap with another unrelated visit that encompasses both of them (this connects the medical visits together)
For example:
Claims #2 and #3 do not overlap on their dates, but because Claim #4 covers both of their date ranges, I should group them together as 1 visit.
I am using two datasets:
- Table1 has the specific medical service claims I am interested in
- Table2 has all the claims that I'm working with (Table1 was created based on filtering Table2)
I am trying to compare a member's rows to each other to see if the medical visits' dates overlap or are connected by an unrelated visit's date range.
My tables creation (with dummy data) and logic code are below:
Table with specific medical visits I want to count
CREATE OR REPLACE TEMP TABLE table1 (
ProcedureCode varchar,
MemberID NUMERIC,
StartDate date,
EndDate date,
StartYearMonth varchar,
EndYearMonth varchar);
Insert in dummy data
INSERT INTO table1 (ProcedureCode, MemberID, StartDate, EndDate, StartYearMonth, EndYearMonth)
VALUES
('12345', 1, '7/6/2021', '7/6/2021', 202107, 202107),
('12345', 1, '8/17/2021', '8/17/2021', 202108, 202108),
('12345', 2, '8/3/2021', '8/3/2021', 202108, 202108),
('12345', 2, '8/9/2021', '8/9/2021', 202108, 202108),
('12345', 3, '11/5/2021', '11/5/2021', 202111, 202111),
('12345', 3, '11/11/2021', '11/11/2021', 202111, 202111);
Table with full claims
CREATE OR REPLACE TEMP TABLE table2 (
ProcedureCode varchar,
MemberID NUMERIC,
StartDate date,
EndDate date,
StartYearMonth varchar,
EndYearMonth varchar);
Insert in dummy data
INSERT INTO table2 (ProcedureCode, MemberID, StartDate, EndDate, StartYearMonth, EndYearMonth)
VALUES
('12345', 1, '7/6/2021', '7/6/2021', 202107, 202107),
('12345', 1, '8/17/2021', '8/17/2021', 202108, 202108),
('98765', 2, '8/1/2021', '8/2/2021', 202108, 202108),
('12345', 2, '8/3/2021', '8/3/2021', 202108, 202108),
('12345', 2, '8/9/2021', '8/9/2021', 202108, 202108),
('98765', 2, '8/11/2021', '8/15/2021', 202108, 202108),
('98765', 3, '11/1/2021', '11/15/2021', 202111, 202111),
('12345', 3, '11/5/2021', '11/5/2021', 202111, 202111),
('12345', 3, '11/11/2021', '11/11/2021', 202111, 202111);
My attempt to group claims together based on overlapping dates
CREATE OR REPLACE TEMP TABLE table3 AS
SELECT
a.MemberID,
a.StartDate,
a.EndDate,
LEAD(a.StartDate, 1) OVER (PARTITION BY a.MemberID ORDER BY a.StartDate) AS next_row_start_date,
LEAD(a.EndDate, 1) OVER (PARTITION BY a.MemberID ORDER BY a.EndDate) AS next_row_end_date,
MIN(b.StartDate) AS MinStartDate,
MAX(b.EndDate) AS MaxEndDate,
ROW_NUMBER() OVER (PARTITION BY a.MemberID ORDER BY a.StartDate) AS row_num
FROM table1 A
LEFT JOIN table2 B
ON A.MemberID = B.MemberID
AND A.StartYearMonth = B.StartYearMonth
AND A.EndYearMonth = B.EndYearMonth
GROUP BY
A.MemberID,
a.StartDate,
a.EndDate;
My attempt to flag claims if they should be grouped together, although it is incorrect
CREATE OR REPLACE TEMP TABLE table4 AS
SELECT
A.*,
CASE WHEN
(
((a.StartDate BETWEEN a.MinStartDate AND a.MaxEndDate) AND (a.next_row_start_date BETWEEN a.MinStartDate AND a.MaxEndDate))
OR
((a.EndDate BETWEEN a.MinStartDate AND a.MaxEndDate) AND (a.next_row_end_date BETWEEN a.MinStartDate AND a.MaxEndDate))
)
THEN 1
ELSE 0
END AS flag
FROM table3 A;
The problem is with MemberID 2: For the month of August, they have their first visit from 08/01 - 08/02, and their last visit 08/11 - 08/15. In between those visits, they have two specific medical visits that I'd like to count as 2 visits (they do not overlap, nor have any other unrelated visit that covers their dates). However, my logic is returning a minimum-maximum of 08/01 - 08/15 (which is incorrect); this would cover both specific medical visits and incorrectly group them as 1 episode.
To help visualize the above example's results:
Members 1 and 2 should not have any specific medical visits that overlap, while Member 3 does. The correct output should look like this, with the column FLAG having a 0 for non-overlapping visits, and a 1 for overlapping visits. Just to be clear, Member 3 is flagged as a 1 (and so their two visits should be counted as one visit) because they had an unrelated visit that covered both specific medical visit dates (from the full claims dataset). The other two members do not have that.
Thank you! I know this was a very long post. I appreciate any input/suggestions you may have.

