I have a transaction table where I am trying to map outflows to inflows based on LIFO (Last In, First Out).
Input dataset
| Id | Date | Type | Amount |
|---|---|---|---|
| 1 | 26-01-2024 | Inflow | 519 |
| 2 | 26-01-2024 | Outflow | 100 |
| 3 | 26-01-2024 | Outflow | 139 |
| 4 | 26-01-2024 | Outflow | 122 |
| 5 | 29-01-2024 | Outflow | 42 |
| 6 | 29-01-2024 | Inflow | 713 |
| 7 | 29-01-2024 | Inflow | 887 |
| 8 | 29-01-2024 | Outflow | 92 |
| 9 | 29-01-2024 | Outflow | 1593 |
| 10 | 29-01-2024 | Outflow | 25 |
Desired output (mapped dataset)
| Inflow_Id | Inflow_Date | Outflow_Id | Outflow_Date | inflow_amount | outflow_amount | map_amount | inflow_left | outflow_left |
|---|---|---|---|---|---|---|---|---|
| 1 | 26-01-2024 | 2 | 26-01-2024 | 519 | 100 | 100 | 419 | 0 |
| 1 | 26-01-2024 | 3 | 26-01-2024 | 519 | 139 | 139 | 280 | 0 |
| 1 | 26-01-2024 | 4 | 26-01-2024 | 519 | 122 | 122 | 158 | 0 |
| 1 | 26-01-2024 | 5 | 29-01-2024 | 519 | 42 | 42 | 116 | 0 |
| 7 | 29-01-2024 | 8 | 29-01-2024 | 887 | 92 | 92 | 795 | 0 |
| 7 | 29-01-2024 | 9 | 29-01-2024 | 887 | 1593 | 795 | 0 | 798 |
| 6 | 29-01-2024 | 9 | 29-01-2024 | 713 | 1593 | 713 | 0 | 85 |
| 1 | 26-01-2024 | 9 | 29-01-2024 | 519 | 1593 | 85 | 31 | 0 |
| 1 | 26-01-2024 | 10 | 29-01-2024 | 519 | 25 | 25 | 6 | 0 |
If you look at the last four rows of the output dataset, outflows 9 and 10 are mapped to inflow 1 because inflows 6 and 7 were exhausted first.
Between inflows 6 and 7, the subsequent outflows are mapped to 7 first (last in, so first out) and only then to 6.
This is a practical asset-allocation problem. Imagine bank account transactions: a customer deposits cash and then wire-transfers that amount to family. It seems logical to tag the wire transfer to the deposit made just before it (Last In, First Out) rather than to the existing balance from an earlier inflow (First In, First Out). Hence LIFO fits this use case better than FIFO.
1) The output format above is essential: it tracks which outflow is tagged to which inflow, and vice versa.
2) Multiple outflows (debits) could be paid from one inflow (credit).
3) Multiple inflows (credits) could fund a single outflow (debit).
4) Hence, map_amount shows how much of an outflow and an inflow were matched, and what was left over to be mapped to a future transaction.
5) sum(outflows) is never greater than sum(inflows): money that is not in the account cannot be spent. An overdraft (negative balance) is technically possible, but let's ignore that scenario, or simply leave such an outflow unassigned.
6) The first transaction is usually an inflow; there needs to be an inflow before anything can be spent. (A quick sanity check for rules 5 and 6 is sketched right after this list.)
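For concreteness, here is a minimal check of rules 5 and 6 in Python, assuming the input sits in a pandas DataFrame with the column names from the input table above (the DataFrame construction is just the sample data):

```python
import pandas as pd

# Sample data from the input table above.
df = pd.DataFrame({
    "Id": range(1, 11),
    "Date": ["26-01-2024"] * 4 + ["29-01-2024"] * 6,
    "Type": ["Inflow", "Outflow", "Outflow", "Outflow", "Outflow",
             "Inflow", "Inflow", "Outflow", "Outflow", "Outflow"],
    "Amount": [519, 100, 139, 122, 42, 713, 887, 92, 1593, 25],
})

# Signed view of the ledger: inflows add to the balance, outflows subtract.
signed = df["Amount"].where(df["Type"] == "Inflow", -df["Amount"])

assert df["Type"].iloc[0] == "Inflow"  # rule 6: an inflow comes first
assert (signed.cumsum() >= 0).all()    # rule 5: balance never goes negative
```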
Logic to be implemented:
For every inflow, the outflows following that inflow should be mapped to it first.
If the outflows after an inflow exceed the inflow amount, the subsequent outflows should be attributed to the previous inflow.
If an inflow has not been fully attributed, it needs to show up as an inflow without an outflow mapping (it could be mapped later as new outflows are added to the table). A non-recursive sketch of this logic follows the list.
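For what it's worth, here is a minimal non-recursive sketch of that logic in Python. LIFO maps naturally to an explicit stack of open inflows, so no recursion is required; the function name and row layout are my own, chosen to match the desired output columns:

```python
def map_lifo(transactions):
    """Map outflows to inflows LIFO-style.

    transactions: iterable of (id, date, type, amount) tuples,
    ordered by transaction id. Returns one row per (inflow, outflow)
    pairing, in the shape of the desired output table.
    """
    stack = []  # open inflows: [id, date, original_amount, remaining]
    rows = []
    for tid, tdate, ttype, amount in transactions:
        if ttype == "Inflow":
            stack.append([tid, tdate, amount, amount])
            continue
        outflow_left = amount
        while outflow_left > 0 and stack:
            inflow = stack[-1]                  # most recent open inflow
            mapped = min(inflow[3], outflow_left)
            inflow[3] -= mapped
            outflow_left -= mapped
            rows.append((inflow[0], inflow[1], tid, tdate,
                         inflow[2], amount, mapped,
                         inflow[3], outflow_left))
            if inflow[3] == 0:
                stack.pop()                     # inflow exhausted, drop it
        # if outflow_left > 0 here, the outflow is unfunded (rule 5)
    # inflows still on the stack are unmatched; they can be emitted as
    # inflow-only rows so future outflows can pick them up
    return rows
```

On the ten sample rows above this reproduces the nine desired output rows, including the three-way split of outflow 9 across inflows 7, 6, and 1.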
The FIFO logic from the link below, using a running total and previous total, seems straightforward: Row pairing according to FIFO logic in SQL Server 2018.
But LIFO seems more complicated: it feels like some kind of recursion is needed, since running totals need to be reset after every inflow and outflows need to be assigned to previous inflows. I would prefer a solution without recursion, but I don't know if that is possible.
I found a SAS implementation that does exactly what I am looking for and would like to port it to SQL/Python: https://communities.sas.com/t5/SAS-Programming/How-to-create-a-Last-in-First-out-LIFO-Algorithm-Using-Hash/td-p/413730
I am trying to implement this on GCP (BigQuery) for a dataset that contains hundreds of transactions per customer, for about 4 million customers; a per-customer application of the sketch above is shown below.
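Since customers are independent and the pass is linear per customer, one option (assuming the table can be pulled into pandas, e.g. via the BigQuery client library, and assuming a customer_id column; both are my assumptions) is to run map_lifo per customer group:

```python
import pandas as pd

# df is assumed to have columns: customer_id, Id, Date, Type, Amount.
out_cols = ["Inflow_Id", "Inflow_Date", "Outflow_Id", "Outflow_Date",
            "inflow_amount", "outflow_amount", "map_amount",
            "inflow_left", "outflow_left"]

result = (
    df.sort_values(["customer_id", "Id"])
      .groupby("customer_id")
      .apply(lambda g: pd.DataFrame(
          map_lifo(g[["Id", "Date", "Type", "Amount"]]
                   .itertuples(index=False, name=None)),
          columns=out_cols))
      .reset_index(level=0)  # bring customer_id back as a column
)
```

Because each customer is processed independently, this also parallelizes trivially (e.g. per-key processing in Dataflow or Spark) if a single machine is too slow for 4 million customers.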
LIFO works for discrete values, so the best way would be to unnest the dataset into one row per unit of value. For continuous values, the PARTITION BY only works if the value is rounded down to an already present entry; this is done in the subselect. The unnesting idea is illustrated below.
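If I read the unnesting idea correctly, the Python sketch below illustrates it: each amount is exploded into one-unit pieces, every outflow unit pops the most recently deposited unmatched inflow unit, and the unit pairs are re-aggregated into map_amount. It is only exact for integer amounts (or amounts rounded to a fixed unit) and runs in O(total amount), so it demonstrates the idea rather than something to run at scale:

```python
from collections import Counter

def map_lifo_unnested(transactions):
    """Unit-level LIFO: explode amounts into 1-unit pieces, match each
    outflow unit to the last unmatched inflow unit, then re-aggregate.

    Returns a Counter mapping (inflow_id, outflow_id) -> map_amount.
    """
    inflow_units = []  # stack holding one inflow_id entry per unit
    pairs = Counter()
    for tid, _tdate, ttype, amount in transactions:
        if ttype == "Inflow":
            inflow_units.extend([tid] * amount)
        else:
            for _ in range(amount):
                if not inflow_units:
                    break  # unfunded outflow unit (rule 5)
                pairs[(inflow_units.pop(), tid)] += 1  # last in, first out
    return pairs
```

Aggregating the unit pairs yields the same (Inflow_Id, Outflow_Id, map_amount) triples as the stack version above; the stack version is effectively the run-length-compressed form of this.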