SQL Server set-based while loop avoiding halloween protection (LAHP) for hierarchy lookthrough halving certain results

188 views Asked by At

I have been researching methods to improve a hierarchical look through (currently carried out using a C# task) on a universe of financial portfolios. The purpose is to determine the ultimate holdings of the top parent portfolio by apportioning out holdings of children portfolios accordingly.

The current set of data takes around 30 minutes to run. This is for about 1000 financial portfolios, and each portfolio can link up to about 4 portfolios (4 deep, with around 40 line items on average each).

The data is stored in SQL Server so this was my first avenue, and initially attempted a CTE approach. The CTE approach shortened the run to around 7 minutes, which is a huge improvement. However when ran against a larger set of data, it then stalls and takes even longer than the C# task.

I then turned to a set-base while loop (LAHP) approach after reading the following article: https://www.simple-talk.com/sql/performance/the-performance-of-traversing-a-sql-hierarchy-/

I don't want to go into too much detail into how it works as this question is already long enough, however it uses a combination of 2 temp tables, and then a while loop to insert into alternating tables, side stepping the Halloween protection.

The current LAHP is even faster than the CTE and seems to be better at handling larger sets of portfolios, and deeper look throughs, however some of the results (financial holding value of the instrument (child/security) within the portfolio) are coming out exactly half of what they should be.

Please see query below, this is used in a stored proc format and then takes each root portfolio as a variable.

    DECLARE @root AS VARCHAR(50) = 'Portfolio1';

    CREATE TABLE #T1
    (
      ID            INT             NOT NULL IDENTITY(1,1) PRIMARY KEY,
      lvl             INT             NOT NULL,
      PortfolioCode       varchar(MAX)    NULL,
      InstrumentCode      varchar(MAX)    NULL,
      LinkedID varchar(MAX)    NULL,
      TradedMarketValue     NUMERIC(28, 8)  NOT NULL

    );
    CREATE NONCLUSTERED INDEX IDX_Pcode ON #T1(ID)
    INCLUDE (lvl)

    CREATE TABLE #T2
    (
      ID              INT             NOT NULL IDENTITY(1,1) PRIMARY KEY,
      lvl           INT         NOT NULL,
      PortfolioCode         varchar(MAX)    NULL,
      InstrumentCode        varchar(MAX)    NULL,
      LinkedID   varchar(MAX)  NULL,
      TradedMarketValue     NUMERIC(28, 8)  NOT NULL
    );
    CREATE NONCLUSTERED INDEX IDX_Pcode ON #T2(ID)
    INCLUDE (lvl)

    DECLARE @lvl AS INT = 0;

    -- insert root node
    INSERT INTO #T1
    (
        lvl, 
        LinkedID, 
        PortfolioCode, 
        InstrumentCode, 
        TradedMarketValue
    )
      SELECT 
          @lvl, 
          I.LinkedID, 
          P.PortfolioCode, 
          I.Code, 
          P.TradedMarketValue
      FROM PortfolioHolding P
      join Instrument I on I.Code = P.InstrumentCode
     WHERE P.PortfolioCode =  @root


    WHILE @@ROWCOUNT > 0
    BEGIN
      SET @lvl += 1;

      -- insert children of nodes in prev level
      IF @lvl % 2 = 1

        INSERT INTO #T2
        (
            lvl, 
            LinkedID, 
            PortfolioCode, 
            InstrumentCode, 
            TradedMarketValue
        )
          SELECT 
          @lvl, 
          I.LinkedID, 
          P.PortfolioCode, 
          I.Code, 

          --calculate ratio of holding and apply to holding
          cast(cast(cast(ISNULL(T.TradedMarketValue, 0) as float) / 
          NULLIF(sum(cast(ISNULL(P.TradedMarketValue, 0) as float)) over         (partition by P.PortfolioCode) , 0)  as float) 
          * ISNULL(P.TradedMarketValue, 0) as float)

          FROM #T1 AS T
            INNER JOIN PortfolioHolding AS P
              ON T.lvl = @lvl - 1
              AND P.PortfolioCode = T.LinkedID
         JOIN Instrument I on I.Code = P.InstrumentCode;

      ELSE

        INSERT INTO #T1
        (
            lvl, 
            LinkedID, 
            PortfolioCode, 
            InstrumentCode, 
            TradedMarketValue
        )
        SELECT 
        @lvl, 
        I.LinkedID, 
        P.PortfolioCode, 
        I.Code, 

        --calculate ratio of holding and apply to holding
         (cast(cast(cast(ISNULL(T.TradedMarketValue, 0) as float) / 
         NULLIF(sum(cast(ISNULL(P.TradedMarketValue, 0) as float)) over         (partition by P.PortfolioCode) , 0) as float) 
        * ISNULL(P.TradedMarketValue, 0) as float))

        FROM #T2 AS T
          INNER JOIN PortfolioHolding AS P
            ON T.lvl = @lvl - 1
            AND P.PortfolioCode = T.LinkedID
        JOIN Instrument I on I.Code = P.InstrumentCode;
    END


    SELECT 
    @root, InstrumentCode ,sum(tradedmarketvalue) As TradedMarketValue
    FROM 
    (SELECT * FROM #T1 UNION ALL SELECT * FROM #T2) AS U
    WHERE LinkedID is null
    group by InstrumentCode

    DROP TABLE #T1, #T2;

As you can see from the query the PortfolioCode (ParentID) holds InstrumentCodes (ChildId), and the LinkedId matches an InstrumentCode to its relevant PortfolioCode.

The TradedMarketValue is summed where LinkedID is null to get all the leaf nodes.

As previously mentioned some of the TradedMarketValues are coming in at half of what they should be. Your help will be greatly appreciated.

1

There are 1 answers

0
Mark Atkinson On BEST ANSWER

Well well well... After spending a good 2 hours creating sample data to post with my question and running through it the answer presented itself!

Please see sample data below:

Instrument Table:

InstrumentCode  LinkedId
Inst1           NULL
Inst2           Port2
Inst3           NULL
Inst4           Port4
Inst5           Port5
Inst6           NULL
Inst7           NULL
Inst8           NULL
Inst9           NULL
Inst10          NULL
Inst11          NULL

Portfolio Table:

PortfolioCode   InstrumentCode  TradedMarketValue
Port1           Inst1           150.00
Port1           Inst2           60.00
Port1           Inst3           45.00
Port2           Inst1           75.00
Port2           Inst4           95.00
Port2           Inst5           100.00
Port3           Inst2           110.00
Port3           Inst6           95.00
Port4           Inst8           145.00
Port4           Inst9           100.00
Port4           Inst7           125.00
Port5           Inst8           150.00
Port5           Inst11          175.00
Port5           Inst10          120.00

So as an example, if we wanted to explode Portfolio 1 Port1, this would end up 3 portfolios deep, first into Port2 then into a 3rd level of Port4and Port5

See diagram:

Working Portfolio Linkage Structure

This whole system works and the query provides the correct results. The problem comes in when two different instruments in one portfolio point to the same LinkedId or 'PortfolioCode'.

See diagram:

enter image description here

What then occurs is this section of the code:

--calculate ratio of holding and apply to holding
          cast(cast(cast(ISNULL(T.TradedMarketValue, 0) as float) / 
          NULLIF(sum(cast(ISNULL(P.TradedMarketValue, 0) as float)) over (partition by P.PortfolioCode) , 0)  as float) 
          * ISNULL(P.TradedMarketValue, 0) as float)

that calculates the ratio of the TradedMarketValue (TMV) of the parent portfolio to the total value of the child portfolio is summing twice, causing the denominator to double.

To resolve this issue, one can change the above section of code to

  --calculate ratio of holding and apply to holding
          cast(cast(cast(ISNULL(T.TradedMarketValue, 0) as float) / 
          NULLIF(sum(cast(ISNULL(P.TradedMarketValue, 0) as float)) over (partition by T.InstrumentCode, P.PortfolioCode) , 0)  as float) 
          * ISNULL(P.TradedMarketValue, 0) as float)

adding an extra variable to partition by (group by) T.InstrumentCode which then stops the above occurring.

I hope this helps someone down the line.