This problem is from the book Cracking the Coding Interview. I have trouble understanding the space complexity of the solution.
Problem:
You have two very large binary trees: T1, with millions of nodes, and T2, with hundreds of nodes. Create an algorithm to decide if T2 is a subtree of T1.
Solution (in Java):
public static boolean containsTree(TreeNode t1, TreeNode t2) {
if (t2 == null)
return true; // The empty tree is a subtree of every tree.
else
return subTree(t1, t2);
}
/* Checks if the binary tree rooted at r1 contains the binary tree
* rooted at r2 as a subtree somewhere within it.
*/
public static boolean subTree(TreeNode r1, TreeNode r2) {
if (r1 == null)
return false; // big tree empty & subtree still not found.
if (r1.data == r2.data) {
if (matchTree(r1,r2)) return true;
}
return (subTree(r1.left, r2) || subTree(r1.right, r2));
}
/* Checks if the binary tree rooted at r1 contains the
* binary tree rooted at r2 as a subtree starting at r1.
*/
public static boolean matchTree(TreeNode r1, TreeNode r2) {
if (r2 == null && r1 == null)
return true; // nothing left in the subtree
if (r1 == null || r2 == null)
return false; // big tree empty & subtree still not found
if (r1.data != r2.data)
return false; // data doesn’t match
return (matchTree(r1.left, r2.left) &&
matchTree(r1.right, r2.right));
}
The book says the space complexity of this solution is O(log(n) +log(m)) where m is the number of nodes in T1 (larger tree) and n number of nodes in T2.
To me, it appears that the solution has O(log(m)*log(n)) space complexity since "subtree" function has log(n) recursive calls and each recursive call executes "matchTree" function which triggers log(m) recursive calls.
Why is this solution O(log(n) + log(m)) complexity?
Since we're not creating any objects on the heap, the space complexity is the size of the stack. So the question is not how many total calls occur, but how big the stack can grow.
containsTree()
can only callsubTree()
,subTree()
can call itself ormatchTree()
, andmatchTree()
can only call itself. So at any point wherematchTree()
has been called, the stack looks like this:This is why you don't multiply the space complexities here: while each call to
subTree()
can callmatchTree()
, those calls tomatchTree()
leave the stack beforesubTree()
continues recursing.Analysis along the lines of the "correct answer"
If the question doesn't specify if the trees are balanced, then a real worst-case analysis would assume they might not be. However, you and the book are assuming they are. We can set aside that question for later by saying the depth of T1 is c, and the depth of T2 is d. c is O(log(m)) if T1 is balanced, and O(m) otherwise. Same thing for T2's d.
Worst case for
matchTree()
is O(d), because the furthest it could recurse would be the height of T2.Worst case for
subTree()
is O(c) for its recursion, because the furthest it could recurse would be the height of T1, plus the cost of callingmatchTree()
, for a total of O(c+d).And
containsTree()
just adds a constant on top of callingsubTree()
, so that doesn't change the space complexity.So if both T1 and T2 are balanced, by replacing c and d you can see that O(log(m)+log(n)) seems reasonable.
Problems with the "correct answer"
Like I said before, it's not right to assume binary trees are balanced until you know for a fact that they are. So a better answer might be O(m+n).
But wait! The question states that the size of T2 is less than the size of T1. That means that n is O(m), and log(n) is O(log(m)). So why have we been wasting time worrying about n?
If the trees are balanced, the space complexity is simply O(log(m)). In the general case where you don't know what's balanced or not, the real answer should be O(m), the size of the larger tree.