Sort and filter dependencies

344 views Asked by At

I'm optimising the SQL query of a reporting system where user can choose columns but tables are fixed:

-- Dynamically generated
SELECT t1.c4, t4.c8
-- Static
FROM t1
INNER JOIN t2 ON t1.id=t2.t1_id
INNER JOIN t3 ON t1.id=t3.t1_id
INNER JOIN t4 ON t2.id=t4.t2_id
INNER JOIN t5 ON t4.id=t5.t4_id
/*...*/;

I want to remove unnecessary tables and make the entire query dynamically generated:

-- Dynamically generated
SELECT t1.c4, t4.c8
FROM t1
INNER JOIN t2 ON t1.id=t2.t1_id
INNER JOIN t4 ON t2.id=t4.t12_id;

In order to figure out dependencies, I searched in Packagist and found marcj/topsort and topological sorting, which works as advertised:

$sorter = new StringSort();
$sorter->add('t1');
$sorter->add('t2', array('t1'));
$sorter->add('t3', array('t1'));
$sorter->add('t4', array('t2'));
$sorter->add('t5', array('t4'));
print_r($sorter->sort());
Array
(
    [0] => t1
    [1] => t2
    [2] => t3
    [3] => t4
    [4] => t5
)

But what I really need is a method that takes first level tables actually needed and figures out the minimum table set that complies with dependencies, as in:

// Sample interface and expected output
$query->getRequiredTables(array('t1', 't4'));
Array
(
    [0] => t1
    [1] => t2
    [2] => t4
)

I can't just feed ->add() with the only first level tables because I'll get ElementNotFoundException.

Here's where I'm lost. Is even topological sorting the right tool?

1

There are 1 answers

3
Hypuk On

Build dependency tree

First, build dependency tree for your tables. For example, in your case it will be:

          1
         / \
        2   3
        |
        4
        |
        5

Choose one of the nodes as root

Choose any of the tables from your required tables list as root. For example, if you need tables 2 and 3, you can choose 2 as a root and get the following tree:

          2
         / \
        4   1
        |   |
        5   3

Traverse tree and save result

Now traverse the tree from the root and when you hit any of the required tables (3 in our case), add all tables from the stack to the result. For example, when you are in node 3, your stack is [2, 1, 3], you add all those nodes to your result.

Another example

Say, with the same dependency tree user requested tables 1, 3 and 4. In this case you leave 1 as a root and when you get to node 4 your stack is [1, 2, 4] and when you hit node 3 your stack is [1, 3]. Result is [1, 2, 3, 4].

Sample code

Sample code for traversing the tree (not in any particular language, just as an example).

function dfs(node) {
   bool include = requiredTables.contains(node)
   for each child of node {
     if dfs(child) {
       include = true;
     }
   }
   if (include) result.add(node);
   return include;
}