How to find the most consecutive weeks

73 views Asked by At

My data:

INSERT INTO martin_test (id, user_id, created_at) VALUES
  ('1bb20295-fd7b-4918-a496-313e5babd482', 'abc', '2024-01-04 15:54:51')
, ('08565423-3371-4720-abb3-80c7aef8333d', 'abc', '2024-01-11 15:54:51')
, ('b17443fe-5a4f-4b7c-934d-3d2910a65f44', 'abc', '2024-01-18 15:54:51')
, ('3d267dc3-ee86-44e9-b1fe-fd918a64b77c', 'abc', '2024-02-01 15:54:51')
, ('d28d73d3-bc9c-4192-998a-a6bafce604a5', 'abc', '2024-02-08 15:54:51')
, ('401d38f8-d277-4605-af33-b4a9bd2eef25', 'abc', '2024-02-22 15:54:51')
, ('b804fa29-23af-4d93-a5c9-187767fec3c9', 'abc', '2024-02-29 15:54:51')
;

My query:

SELECT *,CASE WHEN gap_weeks > 1 THEN 1 END, avg(CASE WHEN gap_weeks > 1 THEN 7 END) OVER (PARTITION BY user_id ORDER BY week_number) AS grp_avg,
    sum(CASE WHEN gap_weeks > 1 THEN 6 END) OVER (PARTITION BY user_id ORDER BY week_number) AS grp_sum,

           COUNT(CASE WHEN gap_weeks > 1 THEN 1 END) OVER (PARTITION BY user_id ORDER BY week_number) AS grp
    FROM (
      SELECT user_id,
             week_number,
             lag(week_number) OVER (PARTITION BY user_id ORDER BY week_number) AS pre_week_number, 
             week_number - lag(week_number) OVER (PARTITION BY user_id ORDER BY week_number) AS gap_weeks
      FROM (
        select distinct EXTRACT(
              WEEK 
              FROM 
                CAST(created_at AS DATE)
            ) AS week_number ,user_id  from martin_test ) AS a
    ) AS subquery1

Why is abc's grp '0.1.2.3'? I don't know why the next gap_weeks > 1 then grp will be added 1), and how's count & avg & sum (case when) over(PARTITION by) works?
My results

My requirement is:
"Find the most consecutive weeks!"

I found this related blog post.

[Supplementary Notes]
Result2 Sorry,My questions are
1.Why gap_weeks more than 1,the grp increasing is "1" not "3" (see my photo Result2)
2.Why week_number (1&2&3 is a group,5%6 is a group,8&9&10 is a group,12 is a group)

1

There are 1 answers

5
Erwin Brandstetter On BEST ANSWER

I focused on your declared objective:

Find the most consecutive weeks!

Can be done like this:

SELECT dense_rank() OVER (ORDER BY count(*) DESC) AS rank
     , user_id
     , count(*) AS consecutive_weeks
     , concat_ws(' - ', min(week_nr), max(week_nr)) AS week_range
FROM (
   SELECT *
        , week_nr - row_number() OVER (PARTITION BY user_id ORDER BY week_nr) AS grp
   FROM  (
      SELECT DISTINCT user_id, extract(week FROM created_at)::int AS week_nr
      FROM   martin_test
      ) sub
   ) sub1
GROUP  BY user_id, grp
ORDER  BY consecutive_weeks DESC;

fiddle

Consecutive week numbers would increase in lockstep with the result of row_number() over the same partition. If you subtract that row number from actual week numbers (week_nr), consecutive weeks get the same grp number. Then just aggregate.

Works only for weeks of the same year, obviously. Else you need to do more.
Related:

Notes

Not sure what you are trying to achieve with the CASE expressions. I presume you don't need them.

EXTRACT works just fine with timestamp input. No need to cast to date. But consider timestamptz instead of timestamp, if multiple time zones may be involved.

The cast to integer ::int is optional. Cheaper to operate with integer instead of numeric for just week numbers.