Understanding Depth First Traversal

517 views Asked by At

With the given representation of BST,

/****************tree.h ***********************/
typedef void* (*ProcessItem)(void *, void *);

/**************** BSTNode.c ************************/
typedef struct BinarySearchTreeNode{
  void *key;
  void *value;
  bool visit; // For traversals
  struct BinarySearchTreeNode *parent;
  struct BinarySearchTreeNode *left;
  struct BinarySearchTreeNode *right;
}Node;

typedef struct Tree{ //BST
  Node *root;
  int size;
}Tree;
static void visitFunc(ProcessItem, Node *);

As part of learning, below is the detailed algorithm(in C comments) and corresponding code written for three DFS traversals.


Pre-order traversal

/*
  Pre-order traversal of BST and Binary tree: Node -> Left -> Right
  0) Assuming node pointer is pointing to root node, which is, not NULL.

  1) Visit node, if not visited

  2) From that node,
     If left sub-tree exists, and
     If root node of that left sub-tree never visted,
     then traverse to root node of left sub-tree.
     Go to step 1 and continue until a node (that has visted root of left sub-tree) or (that has no left sub-tree exists)

  3) From that node,
     If right sub-tree exists, and
     It root node of right sub-tree not visited,
     then traverse to root node of right sub-tree.
     Go to step 1 and continue, until a node (that has visited root of right sub-tree) or (that has no right sub-tree exists).

  4) We reach this step, because,
     Either left/right sub-tree exist but visited
           or
     left and right sub trees does not exist

     Reach parent node and go to step-1 for applying pre-order traversal on that node

*/
static void preOrderTraverse(Node *n, ProcessItem action){

  if (n == NULL){
    return; // if tree is empty, then go back
  }

  Node *root = n;                              // |Step-0

  while(root !=  NULL){

    if(root->visit == false){
      visitFunc(action, root);                 // |Step-1: Visit node, if not visited
    }

    if(root->left != NULL &&                   // |Step-2: if left sub-tree exists, and
       (root->left->visit == false) ){         // |Step-2: If root node of that left sub-tree's never visted,
      root = root->left;                       // |Step-2: then traverse to root node of left sub-tree.
      continue;                                // |Step-2: Go-to step 1
    }

    if(root->right != NULL &&                  // |Step-3: if right sub-tree exists,
       (root->right->visit == false) ){        // |Step-3: If root node of right sub-tree not visited,
      root= root->right;                       // |Step-3: then traverse to root node of right sub-tree.
      continue;                                // |Step-3: Go-to step 1
    }

    /*
     If the instruction pointer points to below insruction, then that means,
       Either left/right sub-tree exist but visited
           or
       left and right sub trees are empty

     What can be done now?
     Go to parent's tree root node and apply preOrderTraversal
    */
    root = root->parent;                       // |Step-4 Reach parent node.

  }
}

Post order traversal

/*
  Algorithm: Post-order traversal of BST and Binary tree: Left -> Right -> Node

  0) Assuming node pointer is pointing to root node, which is, not NULL.

  1) From that node,
     If left sub-tree exists, and
     If root node of that left sub-tree never visted,
     then traverse to root node of left sub-tree.
     Repeat step 1 until a node (that has visted root of left sub-tree) or (that has no left sub-tree exists)

  2) From that node,
     If right sub-tree exists, and
     If root node of that right sub-tree never visted,
     then traverse to root node of right sub-tree.
     Go to step 1 and continue until a node (that has visited root of right sub-tree) or (that has no right sub-tree exists)

  3) Visit that node, if not visited

  4) We reach this step, because,
     Either left/right sub-tree exist but all visited
           or
     left and right sub trees does not exist

     Reach parent node and go to step-1 for applying post-order traversal on that node

*/
static void postOrderTraverse(Node *n, ProcessItem action){

  if (n == NULL){
    return; // if tree is empty, then go back
  }

  Node *root = n;                              // |Step-0

  while(root !=NULL){

    while(root->left != NULL &&                // |Step-1: If left sub-tree exists, and
          (root->left->visit == false)){       // |Step-1: If root node of that left sub-tree never visted,
      root=root->left;                         // |Step-1: then traverse to root node of left sub-tree.
    }

    if(root->right != NULL &&                  // |Step-2: If right sub-tree exists, and
       (root->right->visit == false)){         // |Step-2: If root node of that right sub-tree never visted,
      root=root->right;                        // |Step-2: then traverse to root node of right sub-tree.
      continue;
    }

    visitFunc(action, root);                   // |Step-3: Visit node, if not visited

    /*
      If the instruction pointer points to below insruction, then that means,
       Either left/right sub-tree exist but all visited
           or
       left and right sub trees are empty

     What can be done now?
     Go to parent's tree root node and apply postOrderTraversal
    */

    root = root->parent;                       // |Step-4: Reach parent node.
  }
}

In-order traversal

 /*
   Algorithm: In-order traversal of BST and Binary tree: Left -> Node -> Right

   0) Assuming node pointer is pointing to root node, which is, not NULL.

   1) From that node,
      If left sub-tree exists, and
      If root node of left subtree is never visted,
      then traverse to root node of left sub-tree.
      Repeat step1 until a node (that has visited root of left sub-tree) or (has no left sub-tree exists)

   2) Visit that node, if not visited.

   3) From that node,
      If right sub-tree exists,
      If root node of right sub-tree never visited,
      then traverse to root node of right sub-tree
      Goto step-1 and continue until a node (that has visited root of right sub-tree) or (has no right sub-tree exists).

   4) We reach this step, because,
      Either left/right sub-tree exists but visited
         or
      Either left/right sub-tree does not exist.

      What can be done now?
      Reach parent node and go to step-1 for applying In-order traversal on that node.

  */
static void inOrderTraverse(Node *n, ProcessItem action){

  if (n == NULL){
    return; // if tree is empty, then go back
  }

  Node *root = n;                                // |Step-0


  while(root != NULL){

    while(root->left != NULL &&                  // |Step-1: If left sub-tree exists, and
          (root->left->visit == false) ){        // |Step-1: If root node of left subtree is never visted,
      root = root->left;                         // |Step-1: then traverse to root node of right sub-tree
    }

    if(root->visit == false){
      visitFunc(action, root);                   // |Step-2: Visit node, if not visited.
    }

    if(root->right != NULL &&                    // |Step-3: If right sub-tree exists, and
       (root->right->visit == false) ){          // |Step-3: If root node of right sub-tree never visited,
      root = root ->right;                       // |Step-3: then traverse to root node of right sub-tree
      continue;                                  // |Step-3: Go to step 1
    }

    /*
      If instruction pointer reaches below instruction, then,
      Either left/right sub-tree exists but all visited
         or
      Either left/right sub-tree does not exist.
    */

    root = root->parent;                         // |Step-4: Reach parent node
  }
}

where visitFunc is defined in BSTNode.c and processItem comes from the user,

static void visitFunc(ProcessItem processItem, Node *node){

  if(node->visit == TRUE){
    return;
  }
  node->visit=true;
  processItem(node->key, node->value);
  return;
}

Background:

It is pretty obvious, that, above DFS algorithms are available online. But, as per my learning, the level of details required in these algorithms are missing. I would think that, these lack of details have been covered in the above algorithm(in C comments).

With the emphasis on algorithm mentioned(in C comments) and corresponding traversal code using parent pointer but not explicit stack,

My question:

1) Do you see logical flaw in the 4 step algorithm and code for all three traversals?

2) In the above code, Is condition check required before visiting a node?

Note: Beginner in traversal algorithms

2

There are 2 answers

2
M Oehm On

Apparently you want a binary-tree traversal function that isn't recursive and that doesn't use a stack. (You don't say that explicitly in the question, but your comments suggest that.)

Let's look at a postorder traversal. (This answer can easily be expanded to other depth-first traversals, but postorderis the easiest, so we'll just look at this.)

The recursive function is straightforward:

typedef struct Node Node;

struct Node {
    int key;
    Node *left;
    Node *right;
    Node *parent;
};

void postorder_rec(const Node *nd, void (*func)(const Node *nd))
{
    if (nd) {
        postorder_rec(nd->left, func);
        postorder_rec(nd->right, func);
        func(nd);
    };
}

This function is lightweight and will have a good recursion depth. You say that you wouldn't use the recursive version in production code, but in such code, yourconern would be to keep the tree balanced, so that you can have about 1 million nodes with a tree (and therefore recursion) depth of 20.

You have proposed to add a visited flag to your nodes. This approach isn't suitable, because atraversal will alter the state of the tree. You must reset this state before traversal, but to do so, you must traverse the tree. Ouch!

Your suggestion to treat the flag as either "visited" or "unvisited" depending of the state of the tree's root might be a workaround, but it will fail if for some reason you want to stop the traversal in the middle or if you add new nodes between traversals. This approach isn't reliable. Besides, you always need to carry around the extra flag for each node.

There is a solution, though: Your tree nodes have a link to their parent. You can keep a link to the previously visited node and then decide where to go next based on that:

  • Start with a previous node of null.
  • Never visit any child nodes that are null. Otherwise you will not be able to distinguish between null children and the parent of the root node, which is null, too.
  • When the previously visited node is the current node's parent:
        • if there is a left child, visit that next;
        • otherwise, if there is a right child, visit that next;
        • otherwise, call the function and go back to the parent.
  • When the previously visited node is the current node's left child:
        • if there is a right child, visit that next;
        • otherwise, call the function and go back to the parent.
  • When the previously visited node is the current node's right child:
        • call the function and go back to the parent.
  • Stop when the node is null – you have gone up to the root's parent.

These conditions can be simplified and the function looks like this:

void postorder_iter(const Node *nd, void (*func)(const Node *nd))
{
    const Node *prev = NULL;

    while (nd) {
        const Node *curr = nd;

        if (prev == nd->parent && nd->left) {
            nd = nd->left;
        } else if (prev != nd->right && nd->right) {
            nd = nd->right;            
        } else {
            func(nd);
            nd = nd->parent;
        }

        prev = curr;
    }
}

As is, there is no real benefit to the iterative function – the recursive function is straightforward and doesn't need a link to its parent in the nodes.

But this function can be used as a base to an iterator that keeps the state of the traversal:

typedef struct PostIter PostIter;

struct PostIter {
    const Node *nd;
    const Node *prev;
    int count;
};

const Node *postorder_next(PostIter *iter)
{
    if (iter->nd == NULL) return NULL;

    if (iter->count) {
        (iter->prev) = iter->nd;
        iter->nd = iter->nd->parent;
    }

    while (iter->nd) {
        const Node *prev = iter->prev;
        const Node *nd = iter->nd;

        if (prev == nd->parent && nd->left) {
            iter->nd = nd->left;
        } else if (prev != nd->right && nd->right) {
            iter->nd = nd->right;            
        } else {
            iter->count++;
            return nd;
        }

        iter->prev = nd;
    }

    return NULL;
}

Here, the state of the traversal, which was described by the local variables in the iterative version, is bundled into an iterator struct. Instead of calling the function, we now return from the function. The count is a bit of a kludge, but it is needed so that the function knows whether the iterator has already been advanced or not: we don't go to the parent in the first call.

You can now use the iterator like this:

PostIter iter = {head};

while (postorder_next(&iter)) {
    printf("%d -> ", iter.nd->key);
}
puts("nil");

Instead of providing a callback function, which must be defined somewhere else, you can now put the visiting code into the loop body, which can, of course, access variables declared outside the loop. The travesal code has become more complex, but the calling code is now much simpler.

Writing iterative code for pre-order and in-ordertraversal is left as an exercise for the reader. :)

17
Rishikesh Raje On

1) Do you see logical flaw in the 4 step algorithm and code for all three traversals?

I checked the pre-order traversal. It looks OK.

However, as mentioned by M Ohem, the visit field might be set to false initially, but after any traversal it will be set as true. So, you will not be able to traverse the tree a second time.

Tree traversals are typically done using recursive functions, or if you want to use iterative functions, then push and pop to stack is required.

2) In the above code, Is condition check required before visiting a node?

Yes, you must check if a node is visited, before visiting it again, or the traversal will be wrong.

Edit - Further explanation on this point. Let us say we have a tree

      a
    b   c
  d   e
f  g

And the condition, check is not there for pre-order traversal.

You visit, first a, then b, then d, then f, each time reaching the first if condition and then continuing. At root=f, you reach the next if condition, it is false, and you reach the final condition root = root->parent.

Now, root = d. Since the condition check is not there, d is visited again, which is wrong, and then you go to the second if condition and visit f. again root = root->parent and root = d, and d is printed again.

So, you see that the traversal is wrong.