Group customer transactions based on the subsequent transaction

75 views Asked by At

Small change in the requirement

Needed Teradata SQL query to create a grouping from the first transaction to its subsequent transactions. If a subsequent transaction is made within 30 days of the previous transaction, consider them all to belongs to the same group. Break the group and start a new one if any transaction is made more than 30 days after the previous transaction.

data:

trans_Id customer_id trans_date
001 1101 2020-11-02
002 1101 2020-11-14
003 1101 2020-11-18
004 1101 2021-12-04
005 1101 2021-01-06
006 1101 2021-01-08
007 1101 2021-02-17
008 1101 2021-03-01
009 1101 2021-03-04
010 1102 2021-03-02
011 1102 2021-03-08
012 1102 2021-04-01
013 1102 2021-04-02
014 1102 2021-04-12
015 1102 2021-04-29
016 1102 2021-06-10
017 1102 2021-06-12

Expected result (expecting grouping like below). from 001 till 004 fall under within 30 day to its previous transaction. 005 has elapsed 30 day from it's previous transaction so new group has to be started.

trans_Id customer_id trans_date Group
001 1101 2020-11-02 1
002 1101 2020-11-14 1
003 1101 2020-11-18 1
004 1101 2021-12-04 1
005 1101 2021-01-06 2
006 1101 2021-01-08 2
007 1101 2021-02-17 3
008 1101 2021-03-01 3
009 1101 2021-03-04 3
010 1102 2021-03-02 1
011 1102 2021-03-08 1
012 1102 2021-04-01 2
013 1102 2021-04-02 2
014 1102 2021-04-12 2
015 1102 2021-04-29 2
016 1102 2021-06-10 3
017 1102 2021-06-12 3
2

There are 2 answers

0
MT0 On BEST ANSWER

From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row pattern matching:

SELECT *
FROM   table_name
MATCH_RECOGNIZE(
  PARTITION BY customer_id
  ORDER BY trans_date
  MEASURES
    MATCH_NUMBER() AS grp
  ALL ROWS PER MATCH
  PATTERN ( within_30_days+ )
  DEFINE
    within_30_days AS trans_date <= FIRST(trans_date) + INTERVAL '30' DAY
)

Which, for the sample data:

CREATE TABLE table_name (trans_id, customer_id, trans_date) AS
  SELECT '001', '1101',  DATE '2020-11-02' FROM DUAL UNION ALL
  SELECT '002', '1101',  DATE '2020-11-14' FROM DUAL UNION ALL
  SELECT '003', '1101',  DATE '2020-11-18' FROM DUAL UNION ALL
  SELECT '004', '1101',  DATE '2021-12-04' FROM DUAL UNION ALL
  SELECT '005', '1101',  DATE '2021-12-05' FROM DUAL UNION ALL
  SELECT '006', '1101',  DATE '2021-12-08' FROM DUAL UNION ALL
  SELECT '007', '1101',  DATE '2021-01-17' FROM DUAL UNION ALL
  SELECT '008', '1101',  DATE '2021-05-01' FROM DUAL UNION ALL
  SELECT '009', '1101',  DATE '2021-05-04' FROM DUAL UNION ALL
  SELECT '010', '1102',  DATE '2021-03-02' FROM DUAL UNION ALL
  SELECT '011', '1102',  DATE '2021-03-08' FROM DUAL UNION ALL
  SELECT '012', '1102',  DATE '2021-04-01' FROM DUAL UNION ALL
  SELECT '013', '1102',  DATE '2021-04-02' FROM DUAL UNION ALL
  SELECT '014', '1102',  DATE '2021-04-12' FROM DUAL UNION ALL
  SELECT '015', '1102',  DATE '2021-04-29' FROM DUAL UNION ALL
  SELECT '016', '1102',  DATE '2021-06-10' FROM DUAL UNION ALL
  SELECT '017', '1102',  DATE '2021-06-12' FROM DUAL;

Outputs:

CUSTOMER_ID TRANS_DATE GRP TRANS_ID
1101 2020-11-02 00:00:00 1 001
1101 2020-11-14 00:00:00 1 002
1101 2020-11-18 00:00:00 1 003
1101 2021-01-17 00:00:00 2 007
1101 2021-05-01 00:00:00 3 008
1101 2021-05-04 00:00:00 3 009
1101 2021-12-04 00:00:00 4 004
1101 2021-12-05 00:00:00 4 005
1101 2021-12-08 00:00:00 4 006
1102 2021-03-02 00:00:00 1 010
1102 2021-03-08 00:00:00 1 011
1102 2021-04-01 00:00:00 1 012
1102 2021-04-02 00:00:00 2 013
1102 2021-04-12 00:00:00 2 014
1102 2021-04-29 00:00:00 2 015
1102 2021-06-10 00:00:00 3 016
1102 2021-06-12 00:00:00 3 017

fiddle

1
JuSun Lee On
Other Way(ORACLE):
with jobdata as (
    select '001' as trans_Id,   '1101' as customer_id,  to_date('2020-11-02', 'YYYY-MM-DD') as trans_date from dual union all
    select '002' as trans_Id,   '1101' as customer_id,  to_date('2020-11-14', 'YYYY-MM-DD') as trans_date from dual  union all
    select '003' as trans_Id,   '1101' as customer_id,  to_date('2020-11-18', 'YYYY-MM-DD') as trans_date from dual  union all
    select '004' as trans_Id,   '1101' as customer_id,  to_date('2021-12-04', 'YYYY-MM-DD') as trans_date from dual  union all
    select '005' as trans_Id,   '1101' as customer_id,  to_date('2021-12-05', 'YYYY-MM-DD') as trans_date from dual  union all
    select '006' as trans_Id,   '1101' as customer_id,  to_date('2021-12-08', 'YYYY-MM-DD') as trans_date from dual  union all
    select '007' as trans_Id,   '1101' as customer_id,  to_date('2021-01-17', 'YYYY-MM-DD') as trans_date from dual  union all
    select '008' as trans_Id,   '1101' as customer_id,  to_date('2021-05-01', 'YYYY-MM-DD') as trans_date from dual  union all
    select '009' as trans_Id,   '1101' as customer_id,  to_date('2021-05-04', 'YYYY-MM-DD') as trans_date from dual  union all
    select '010' as trans_Id,   '1102' as customer_id,  to_date('2021-03-02', 'YYYY-MM-DD') as trans_date from dual  union all
    select '011' as trans_Id,   '1102' as customer_id,  to_date('2021-03-08', 'YYYY-MM-DD') as trans_date from dual  union all
    select '012' as trans_Id,   '1102' as customer_id,  to_date('2021-04-01', 'YYYY-MM-DD') as trans_date from dual  union all
    select '013' as trans_Id,   '1102' as customer_id,  to_date('2021-04-02', 'YYYY-MM-DD') as trans_date from dual  union all
    select '014' as trans_Id,   '1102' as customer_id,  to_date('2021-04-12', 'YYYY-MM-DD') as trans_date from dual  union all
    select '015' as trans_Id,   '1102' as customer_id,  to_date('2021-04-29', 'YYYY-MM-DD') as trans_date from dual  union all
    select '016' as trans_Id,   '1102' as customer_id,  to_date('2021-06-10', 'YYYY-MM-DD') as trans_date from dual  union all
    select '017' as trans_Id,   '1102' as customer_id,  to_date('2021-06-12', 'YYYY-MM-DD') as trans_date from dual 
)
select a.trans_Id, a.customer_id, a.trans_date 
    , dense_rank() over (partition by a.customer_id order by gap_group) as group_id
from (
    select a.*, min(a.trans_date) over (partition by a.customer_id order by customer_id, trans_date) as min_trans_date
        , a.trans_date - min(a.trans_date) over (partition by a.customer_id order by customer_id, trans_date) as gap_day
        , trunc((a.trans_date - min(a.trans_date) over (partition by a.customer_id order by customer_id, trans_date)) / 30) + 1 as gap_group
    from jobdata a
) a       
;