Table self join issue - Need a desired output

84 views Asked by At

I have a table with below structure:

OLD_ID    New_ID    Month_Number
------    ------    ------------
          A         1
A         B         2
B         C         3
C         D         4

I need an output out of this table for an ID 'A' as:

New_ID      Oldest_Id     Latest_ID
------      ---------     ---------
A            A             D
B            A             D
C            A             D
D            A             D

The oldest_Id and Latest_id can be found using month_number.

Please advice, i am new to SQL.

1

There are 1 answers

0
Adam Frederick Wiseman On

This answer assumes Microsoft SQL Server, but with tweaks to the syntax this could work for any DB type provided that it features temporary tables, row counts and user-defined variables (the latter simply for a safety cutoff lest you have a loop). This answer also assumes that the source table is #ExampleTable, the [New_ID] column contains all the IDs, and that older/newer IDs should only be regarded if the respective [Month_Number] values are appropriate, e.g. LTE the "newer" ID's [Month_Number] for older IDs.

Basically what I would envision is to make two temporary tables: the first to list each [ID] along with the first [Old_ID] for the [ID], and the second the same albeit for the last [New_ID]:

-- Create temporary tables and declare cutoff variable
DECLARE @cutoff TINYINT;
SET @cutoff = 0;

CREATE TABLE #ID1 (
    [ID] CHAR(1) NOT NULL
    ,[Oldest_ID] CHAR(1) NOT NULL
    ,[Month_Number] TINYINT NOT NULL
    );
CREATE TABLE #ID2 (
    [ID] CHAR(1) NOT NULL
    ,[Latest_ID] CHAR(1) NOT NULL
    ,[Month_Number] TINYINT NOT NULL
    );

-- Initialise temporary tables
INSERT INTO #ID1 ([ID],[Oldest_ID],[Month_Number])
SELECT [New_ID],[New_ID],[Month_Number]
FROM #ExampleTable;

INSERT INTO #ID2 ([ID],[Latest_ID],[Month_Number])
SELECT [ID],[ID],[Month_Number]
FROM #ID1;

-- Fetch oldest IDs first
WHILE(@@ROWCOUNT != 0 AND @cutoff <= 100)
BEGIN

    SET @cutoff = @cutoff + 1;
    UPDATE oldies
    SET oldies.[Oldest_ID] = d.[OLD_ID]
        ,oldies.[Month_Number] = d.[Month_Number]
    FROM #ID1 AS oldies
    JOIN #ExampleTable d
        ON oldies.[Oldest_ID] = d.[New_ID]
    WHERE d.[Month_Number] <= oldies.[Month_Number]
        AND d.[OLD_ID] IS NOT NULL;

END;

-- Now the... "newies"
WHILE(@@ROWCOUNT != 0 AND @cutoff <= 100)
BEGIN

    SET @cutoff = @cutoff + 1;
    UPDATE newies
    SET newies.[Latest_ID] = d.[New_ID]
        ,newies.[Month_Number] = d.[Month_Number]
    FROM #ID2 AS newies
    JOIN #ExampleTable d
        ON newies.[Latest_ID] = d.[OLD_ID]
    WHERE d.[Month_Number] >= newies.[Month_Number];

END;

-- Finally, do as you wish with #ID1 and #ID2
SELECT d.[New_ID] AS [ID]
    ,f1.[Oldest_ID]
    ,f2.[Latest_ID]
FROM #ExampleTable AS d
JOIN #ID1 AS f1 ON d.[New_ID] = f1.[ID]
JOIN #ID2 AS f2 ON d.[New_ID] = f2.[ID]
;

The records in #ID1 and #ID2 are setup to reference themselves to begin with. Then, for each pass in the loops, the IDs' values are updated with the next step if available. When there is nothing else to do, the iterations cease.

Here is what happens in each pass:

  • Pass #1:   * Old: B -> A; C -> B; D -> C
    • New: A -> B; B -> C; C -> D
  • Pass #2:
    • Old: C -> A; D -> B
    • New: A -> C; B -> D
  • Pass #3:
    • Old: D -> A
    • New: A -> D
  • Pass #4:
    • (no more actions required)

The @cutoff variable is simply to prevent this from looping forever should ID 'A' reference old ID 'B' and ID 'B' reference old ID 'A', to give an example.

This isn't a terribly efficient method, and you could probably do better with some practice: for example, by combining both Oldie/Newie actions in one, but I'll leave that to you. :)