String Aggregration

104 views Asked by At

My source table looks like this:

Account_Number, Code, Transaction_Amount, Comment, Sequence_Number
4321, W, 10.21,,1
4321, C,,'Payment',2
4321, C,,'From Checking Account',3
4321, D,20.00,,4
4321, C,,'Direct Deposit',4
7430, W, 40.00,,5
7430, C,,'From Checking',5

the source is a transaction ledger, W stands for "Withdraw", D stands for "Deposit", and C stand for comment. when every transaction occurs, it could be follow by. 0 or more comment lines. I would like to aggregate all the comments that comes after a "W" or a "D" and add them to the "W" or "D" row. Example:

Account_Number, Code, Transaction_Amount, Comment, Sequence_Number
4321, W, 10.21,'Payment,From Checking Account',1
4321, D,20.00,'Direct Deposit',4
7430, W, 40.00,"From Checking,5
3

There are 3 answers

1
Sowmyadhar Gourishetty On

Here is the Query using the LEAD and STRING_AGG. Assuming some big number for the last Withdraw or Deposit row for a Account_Number.

HERE is the Working Code.

SELECT *
    ,(
        SELECT STRING_AGG(comm.Comment, ', ') Comment
        FROM Transactions comm
        WHERE comm.Account_Number = a.Account_Number
            AND comm.Code = 'C'
            AND comm.Sequence_Number >= a.Sequence_Number
            AND comm.Sequence_Number < a.END_Sequence_Number
        )
FROM (
    SELECT *
        ,ISNULL(LEAD(Sequence_Number) OVER (
                PARTITION BY Account_Number ORDER BY Sequence_Number
                ), 100000000) END_Sequence_Number
    FROM dbo.Transactions t
    WHERE Code IN ('W', 'D')
    ) a

Results: enter image description here

2
GMB On

I understand this as a gaps-and-islands problem, where you want to gather all comments in each gap, and bring them to the preceding island.

SQL Server does not have a window function for string aggregation, which makes things a bit tricky. This might be simpler address with lateral joins:

select t.account_number, t.code, t.transaction_amount, c.comment, t.sequence_number
from mytable t
outer apply (
    select top (1) t1.sequence_number as next_sequence_number
    from mytable t1
    where 
        t1.account_number = t.account_number 
        and t1.sequence_number > t.sequence_number
        and t1.code <> 'C' 
    order by t1.sequence_number
) s
outer apply (
    select string_agg(t1.comment) within group(order by t1.sequence_number) comment
    from mytable t1
    where
        t1.account_number = t.account_number 
        and t1.sequence_number > t.sequence_number
        and (
            t1.sequence_number < s.next_sequence_number 
            or s.next_sequence_number is null
        )
        and t1.code = 'C' 
) c
where t.code <> 'C'

The first lateral join computes the sequence number of the "next" non-comment transaction; the second one aggregates all comments in between the current row and the next one.

Note that string_agg() is available starting SQL Server 2017 only.

2
Gordon Linoff On

I would do this using string_agg() but by assigning groups to each rows. What are the groups? The cumulative number of non-C codes. So:

select account,
       max(case when code <> 'C' then code end) as code,
       sum(amount) as amount,
       string_agg(comment, ',') within group (order by seq) as comments
from (select t.*,
             sum(case when code <> 'C' then 1 else 0 end) over (partition by account order by sequence) as grp
      from t
     ) t
group by account, grp;