Sessionize by source and duration in PostgreSQL

22 views Asked by At

For a sample like


create table events  (
  uid int,
  ts timestamp
);
insert into events values 
-- uid session_id
-- 1 1
(1, '2024-02-01 00:10:00'),
(1, '2024-02-02 00:10:00'),
(1, '2024-02-02 00:20:00'),
(1, '2024-02-03 00:20:00'),
(1, '2024-02-04 00:00:00'),
-- 1 2
(1, '2024-02-04 00:15:00'),
(1, '2024-02-05 00:10:00'),
-- 1 3
(1, '2024-02-08 00:20:00'),
(1, '2024-02-08 00:30:00'),
(1, '2024-02-11 00:05:00'),
-- 2 1
(2, '2024-02-05 00:10:00'),
(2, '2024-02-06 00:10:00'),
(2, '2024-02-07 00:20:00'),
-- 2 3
(2, '2024-02-08 00:15:00'),
(2, '2024-02-08 00:20:00')
;

How can we identify and count sessions as is it shown in the comments above each group of inserts?

In other words, I need to identify or at least count the number of unique sessions, where each can be no longer than 72 hours.

For example, (1, '2024-02-04 00:00:00') is still in session_id 1 for uid 1, because the end of the session_id 1, which started at 2024-02-01 00:10:00, will not occur until 2024-02-04 00:10:00. But (1, '2024-02-04 00:15:00') is a new session_id 2 for uid 1.

I've been beating around the gaps-and-islands, windows, frame_clauses, but it just doesn't come together. For me, it either just clomps all timestamps into one huge session by uid, or as in case with range offests, creates non-existing session start timestamps.

It's so simple with procedural language

CREATE OR REPLACE FUNCTION count_unique_sessions() RETURNS INTEGER AS $$
DECLARE
    current_uid INT;
    current_ts TIMESTAMP;
    prev_uid INT;
    session_start TIMESTAMP;
    session_count INTEGER := 0;
    uid_ts RECORD;
BEGIN
    prev_uid := 0;
    session_start := '1990-01-01 00:00:00';
    
    FOR uid_ts IN
        SELECT uid, ts
        FROM events
        ORDER BY uid, ts
    LOOP
        current_uid := uid_ts.uid;
        current_ts := uid_ts.ts;
       
        if prev_uid <> current_uid then
            session_count := session_count + 1;
            session_start := current_ts;
        else
            if current_ts - session_start > interval '72 hours' then
                session_count := session_count + 1;
                session_start := current_ts;
            end if;     
        end if;

        prev_uid := current_uid;        
        
    END LOOP;

   RETURN session_count;
END;
$$ LANGUAGE plpgsql;

But can it be done with SQL only?

0

There are 0 answers