Google foobar, part iii: where's my dad?
written on February 5, 2023
categories: challenges
This post is the third installment of the foobar challenge series. You can read the first one to learn more about foobar and see what sort of intro problems the challenge presents if you don't want any real spoilers.
WARNING: Obviously, MAJOR spoilers ahead, for the problems themselves and possible solutions or hints. Even if you're attempting one of these and feel completely stuck, give it another go before reading the rest. Keep reading only if you don't mind getting spoiled!
Problem statement
This one involves my favorite family of data structures, trees! Here's how it goes.
Imagine you are given a perfect binary tree (i.e. a tree in which every internal node has exactly two children and all leaf nodes are at the same depth) that is annotated in post-order traversal. Big fancy words, but at the end of the day it's just something that looks like this:
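       7
   ----|----
   /       \
   3       6
  / \     / \
 1   2   4   5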
If you don't quite get post-order traversal yet, you can jump to the next section where I explain it step by step.
The problem statement is:
Given a number `h` representing the height of the tree, and an array of numbers `q`, for every number `p` in `q`, return its parent if `p` is in the tree, and return `-1` if `p` is not in the tree or if it has no parent (i.e. if `p` is the root node).
So for the tree above, that is for `h = 3`, we can say that for `q = [2, 6, 7, 200]` we should get `[3, 7, -1, -1]`.
Understanding post-order traversal
Let's annotate the previous example by hand to understand how it works.
The algorithm is simple:
- Annotate the left subtree
- Annotate the right subtree
- Annotate the parent (the node itself)
Let's start with an empty tree with 7 nodes like in the example:
       _
   ----|----
   /       \
   _       _
  / \     / \
 _   _   _   _
Each underscore (`_`) represents a node. Of course, we start our algorithm at the root of the tree, with `>` showing our current position in the tree:
      >_
   ----|----
   /       \
   _       _
  / \     / \
 _   _   _   _
Step one is to recurse on the left subtree. So we go down the left subtree:
       _
   ----|----
   /       \
  >_       _
  / \     / \
 _   _   _   _
Again, we apply the rules in order, so first we go down the left subtree:
       _
   ----|----
   /       \
   _       _
  / \     / \
>_   _   _   _
Since there is no left subtree to go down to anymore (we've reached a leaf), we annotate:
       _
   ----|----
   /       \
   _       _
  / \     / \
>1   _   _   _
Step two is to annotate the right subtree:
       _
   ----|----
   /       \
   _       _
  / \     / \
 1  >2   _   _
And step three is to annotate the parent:
       _
   ----|----
   /       \
  >3       _
  / \     / \
 1   2   _   _
Since we're done with the left subtree of the root node, we move on to the right subtree in the exact same fashion. Left subtree:
       _
   ----|----
   /       \
   3       _
  / \     / \
 1   2  >4   _
Right subtree:
       _
   ----|----
   /       \
   3       _
  / \     / \
 1   2   4  >5
Parent:
       _
   ----|----
   /       \
   3      >6
  / \     / \
 1   2   4   5
And now that the right subtree of the root node is done, we annotate the root node itself:
      >7
   ----|----
   /       \
   3       6
  / \     / \
 1   2   4   5
And we're done! That's how the post-order traversal works.
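To see the same procedure in code, here's a minimal sketch (my own illustration, not challenge code) that builds a perfect tree of a given height as nested `(left, value, right)` tuples, annotating in post-order:

```python
def annotate(height):
    """Build a perfect binary tree of the given height, annotated in
    post-order, as nested (left, value, right) tuples."""
    counter = [0]  # next annotation to hand out, boxed so visit() can mutate it

    def visit(h):
        if h == 0:
            return None
        left = visit(h - 1)   # 1. annotate the left subtree
        right = visit(h - 1)  # 2. annotate the right subtree
        counter[0] += 1       # 3. annotate the node itself
        return (left, counter[0], right)

    return visit(height)

print(annotate(3))
# (((None, 1, None), 3, (None, 2, None)), 7, ((None, 4, None), 6, (None, 5, None)))
```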
@@@@@@@@@@
@@@@@@@@@@
SECOND WARNING: If you want to solve this yourself, STOP HERE! Anything past this point is a spoiler of some part of figuring out the solution, or of the solution itself. You've been warned!
@@@@@@@@@@
@@@@@@@@@@
First ideas
So, I was actually dumb enough to try to solve this by constructing the tree. If you've done enough challenges like these (or if you've read the previous posts), you know that well-designed ones give you test cases big enough that solving by construction (which most times is the same as solving by enumeration) is impossible. That's the case here too, as the maximum height was something like 30, which means a tree with a whopping 1,073,741,823 nodes!
But we're getting ahead of ourselves. First of all, let's remind ourselves how to calculate the number of nodes of a perfect binary tree given its height. Since we have exactly two children for every internal node, the number of nodes increases exponentially as the height increases linearly. We have 1 node at height `h` (the root), then 2 at height `h - 1`, then 4 at height `h - 2`, and so on. If we express that as a sum of powers of 2, then for our example tree of height 3 we have `2^0 + 2^1 + 2^2 = 7` nodes. Otherwise put:

$$n(h) = \sum_{i=0}^{h-1} 2^i$$

But there's a summation identity for finite geometric series (when `x != 1`):

$$\sum_{i=0}^{n-1} x^i = \frac{x^n - 1}{x - 1}$$

Applied here, that gives us:

$$n(h) = \frac{2^h - 1}{2 - 1} = 2^h - 1$$
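A quick sanity check of the formula, including the monster from before:
>>> [2 ** h - 1 for h in (1, 2, 3, 30)]
[1, 3, 7, 1073741823]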
Great. Why is that useful? Well, we'll want to know what the number of nodes is, because that's how we know what number the root node is annotated with. We can get some solutions for cheap, without constructing the tree:
solution.py
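(A reconstructed sketch of that early-exit logic; the guard is exactly what's described below, but identifiers besides `solution` are my own.)

```python
def solution(h, q):
    # A perfect binary tree of height h has 2**h - 1 nodes, and the root,
    # annotated last, carries exactly that number.
    num_nodes = 2 ** h - 1
    answers = []
    for p in q:
        if p >= num_nodes:
            # Either the root itself (p == num_nodes, no parent) or not in
            # the tree at all (p > num_nodes): -1 in both cases.
            answers.append(-1)
        else:
            answers.append(None)  # the numbers actually in the tree: see below
    return answers
```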
The reasoning here is pretty simple: since the root node is annotated last, it will have the biggest number out of all the nodes, and its number will always be the number of nodes (since we start annotating at 1). So, if `p` is equal to the number of nodes, then it is the root, which has no parent (so `-1`), and if it is greater than the number of nodes, then clearly it is not in the tree (so `-1` again).
Hm. How do we go further than this with no tree to traverse?
My first thought at this point was to virtually traverse the tree, just by tracking where we are at any point in time. In fact, I got super excited at this point, because I noticed a pattern. Starting at the root, you can go down the right subtree by subtracting 1 (so `7 -> 6 -> 5`), and down the left subtree by dividing by 2 (actually `floor(x / 2)`, so `7 -> 3 -> 1`). Eureka, let's code it up, that's it! Well, not so fast, cowboy.
What if you go right and then left, in the same tree? Well, you go right, so you subtract 1, so you get to `6`; everything's alright so far. And then you go left, so you divide by 2, so `floor(6 / 2) = 3`... whoops, we were supposed to land on `4`. We need something more nuanced than this.
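You can see the breakage with plain arithmetic:
>>> 7 - 1   # go right from the root: we land on 6, as expected
6
>>> 6 // 2  # now go left: we want 4...
3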
Finding ways to go left
We have a solid way of going right in every possible subtree: just subtract 1. This will be true no matter what, because of the very definition of the post-order traversal (the thing that's annotated immediately after the right subtree is the parent, so `parent = right_subtree + 1`, or `right_subtree = parent - 1`). What we need is a way to go left.
Well... there is one type of scenario where we do always know how to go left: the scenario where we're at the parent of two leaf nodes. Since the application of the post-order traversal algorithm is trivial in this case, we know that the left node will be annotated right before the right node. So, if we know the parent `m`, then to go right we subtract 1 (as always), and to go left we subtract 2. Now we're getting somewhere.
Now, if we go one level up from where we were (the parent of leaf nodes), we also know how to go left from there: since the right subtree was annotated right after the left one (again, by definition of the post-order traversal), the last element of the left subtree (the left child, in this case) must equal the first element of the right subtree, minus one. And the first element of the right subtree is already known to us: it's the left leaf node we found before.
I know this might be hard to picture, so let's give a concrete example with our good old 7-node tree. Let's start at the bottom right (I'll explain why later), and assume we know that the value of the parent of the leaves is `6`. Again, as we said, the two children are trivial to compute because they're leaves: `6 - 1 = 5` for the right one, and `6 - 2 = 4` for the left one. Okay, so now that we know these two, we can go up a level, to the root node (`7`). We know that its right child is `6` (as I said, assume that we know from the start), but how do we compute its left child? Well, again, it's the node that gets annotated right before the first node of the right subtree. But the first node of the right subtree to be annotated is the left leaf child, `4`! So the left child of the root (`7`) is `4 - 1 = 3`. Finding `3`'s children is trivial as they are leaves.
This is good, but we don't have a full solution yet: we assumed we knew the parent of the bottom right leaves. But why do we need to start there anyway?
Think about it: the only thing we know at the start is the value of the root, because it's the very last node to get annotated in the tree (technically we also know the values of a couple of other nodes from the very start, for example the leftmost bottom leaf will always be `1`, but that's not really useful). From the root, we know one way to get to the bottom (or at least to a parent of leaf nodes): go right, since that's always something we can do by subtracting 1! So we go right, keeping track of the height, and once we hit the parent of the leaf nodes, we use the algorithm I described before to slowly climb up the tree and uncover its left side.
If you've forgotten the goal of the challenge, I don't blame you. We need to find the parent of a given value `p`, so basically we just need to check if any of the children we come across are equal to `p`, in which case it's trivial to find the parent. We don't even have to worry about whether `p` is present in the tree or not, as we already covered those cases with our `return -1` "guard".
Solution
You can implement this algorithm either recursively or iteratively. I picked recursion because I find it more intuitive, so I ended up with something like this:
solution.py
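(A reconstructed sketch of that listing: the `solution`/`find_parent` split and the `new_left_child` hint match the walkthrough below, while the exact signature and return convention are my assumptions.)

```python
def solution(h, q):
    # A perfect binary tree of height h has 2**h - 1 nodes; the root is
    # annotated last, so its number is exactly the node count.
    num_nodes = 2 ** h - 1
    answers = []
    for p in q:
        if p >= num_nodes:
            # The root (no parent) or not in the tree at all.
            answers.append(-1)
        else:
            result, _ = find_parent(p, num_nodes, h, -1)
            answers.append(result)
    return answers


def find_parent(p, root, height, parent):
    # Returns a (result, new_left_child) pair: `result` is the parent of p
    # if p was found in this subtree (None otherwise), and `new_left_child`
    # is the "hint" that lets our caller's parent compute its left child.

    # The right child is always known: it's annotated just before us.
    right_child = root - 1
    # In the general case we know neither the left child...
    left_child = None
    # ...nor the hint for the level above.
    new_left_child = None

    if height == 2:
        # Parent of two leaves: both children are trivial, and the first
        # node annotated in this subtree is left_child, so the node
        # annotated just before it is our parent's left child.
        left_child = root - 2
        new_left_child = left_child - 1

    # Found p as one of our children (on the way down)?
    if p == right_child or p == left_child:
        return root, new_left_child
    # Found p as the current root itself (on the way "up")?
    if p == root:
        return parent, new_left_child

    if height > 2:
        # Recurse on the right subtree first: its hint is precisely the
        # value of our left child.
        result, left_child = find_parent(p, right_child, height - 1, root)
        if result is not None:
            return result, None  # found it, no need to look left
        if p == left_child:
            return root, None
        # Now that we know the left child, recurse on the left subtree;
        # its hint becomes our own hint for the level above.
        result, new_left_child = find_parent(p, left_child, height - 1, root)
        if result is not None:
            return result, None

    # Nothing found in this subtree; pass the hint up.
    return None, new_left_child
```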
Let's see how it works. First of all, the `solution` function simply calls another function to compute the non-trivial cases (i.e. the cases where the number is in the tree).
The function that does the actual computing is `find_parent`. Since it's recursive, it needs to know the current root node, the current height, and the current parent. Of course, it also needs to know `p` (the number we're looking for) on every level of recursion, to be able to tell when it has found its parent.
At the very beginning of the function, we compute some stuff we already know:
- We know the right child (no matter what the current root is, it will always be `root - 1`)
- We don't necessarily know the left child (in the general case)
- We don't necessarily have any "hints" for what the parent's left child is (the `new_left_child` variable basically implements what we described before, when we discussed figuring out how to go left in the nontrivial case)
Now, if we are at the level of the parents of leaf nodes (which we identify using the current height), then we can also trivially calculate the left child's value, and most importantly we can calculate the value of `new_left_child`, which will help "unlock" the left child for the parent of the current root, as we discussed before.
Next, we simply check if we've found the solution, in which case we're done. The reason we need two checks is that we might find the solution on the way down or on the way up... sort of. If the number we're looking for is one of the children nodes, then we're done: the answer is the current root. However, in many cases, as we go down the tree for the first time, we will not know the left child, so we might miss this check going down. This is why we also need to check "going up" (in reality, we're checking while going down on the nontrivial left branches) whether the number we're looking for is equal to the current root itself (if it is, we have the answer, as we also know the current parent).
For parents of leaf nodes, we're done going down; there's no point in exploring the leaf nodes themselves (we've already checked whether the number we're looking for is in either of them), so if we've gotten to this point without an answer, we simply return. Of course, besides the answer, we also return the `new_left_child` "hint" to help explore the left subtree of the parent.
For every other node, we're not done: we need to explore the right and left subtrees. To do that, we simply recurse: first on the right subtree, which will eventually give us the value of the left child, and then on the left subtree itself (now that we know its value). Of course, if we find the answer in the right subtree, there's no reason to waste time exploring the left one; we're done! This is why there's an early-return check right after the right-subtree recursion.
At the end, we return whatever result and `new_left_child` "hint" we found. Let's test it out:
$ python2
Python 2.7.18 (default, Jul 1 2022, 10:30:50)
[GCC 11.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import solution
>>> solution.solution(3, [1, 6, 4, 3, 7, 200])
[3, 7, 6, 7, -1, -1]
It seems to be working correctly! That's good, but let's make sure it can handle bigger trees. Let's see: since we basically go down the right side of the tree, then slowly explore the left side of the right subtree, then go down the right side of the left subtree and so on, the last trivial subtree to get explored will always be the first one to get annotated (node `3`, with children `1` and `2`). So, if we ask, say, what the parent of `1` is, it should always be `3` and it should always take the longest time to compute.
>>> solution.solution(3, [1])
[3]
>>> solution.solution(4, [1])
[3]
>>> solution.solution(10, [1])
[3]
>>> solution.solution(20, [1])
[3]
>>> solution.solution(25, [1])
[3]
Whew, that last one took something like 5-6 seconds on my computer. That's not great; I'm pretty sure there is a timeout (so that you can't submit a solution that takes ages), and also, even if it took like one second for a height of 30, an input of `(30, [1, 1, 1, 1, 1])` would take 5 seconds. We can of course memoize the result so that we don't have to compute it over and over again, but still, the current performance is not great.
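As an aside, a minimal sketch of such a cache (not part of the final solution) could be a decorator around `find_parent`:

```python
def memoized(f):
    cache = {}  # maps argument tuples to previously computed results

    def wrapper(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]

    return wrapper

find_parent = memoized(find_parent)  # identical (sub)queries are now free
```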
What can we do to improve that? Well, I'm not sure if there are many ways to optimize the algorithm of `find_parent` itself, but we can use the initial observation we made (the fact that we can go left by doing `floor(root / 2)` from the root of the tree if we never go right) to avoid spending time in subtrees that cannot contain the solution. Let's take the tree of height `5` for example:
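Only the top of it matters for the argument; by our going-right and going-left rules, the root is `31`, its right child is `30`, and its left child is `15`:

      31
   ----|----
   /       \
  15       30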
Let's say we're trying to find `13`'s parent. We start at the root, `31`, but here's an interesting property: everything that's less than the left child (`15`) is not going to be in the right subtree (by definition, the right subtree starts at `15 + 1 = 16`), and conversely, everything that's bigger than the left child is not going to be in the left subtree (as the greatest value of the left subtree is its "root").
In the case of `13` itself, since `13 < 15`, we know it's in the left subtree, so we can ignore the entire right subtree. We only have to look in the left one, which reduces the problem to this:
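That left subtree is itself a perfect tree of height `4`, rooted at `15`, with children `7` and `14` (again, lower levels elided):

      15
   ----|----
   /       \
   7       14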
Now, you can notice again that `13 > 7`, so we don't actually need the left subtree here; that means we can ignore it and only focus on the right subtree. We could technically delete the left subtree again, but there's no point, because our algorithm explores the right subtree first, so it won't ever touch the left subtree anyway.
Let's put everything together: while the value we're looking for is less than or equal to the value of the left child of the current root, we can shrink the problem, replacing the tree with its left subtree of height `h - 1` (whose root is `floor(root / 2)`), and we can keep going until the left child of the root is no longer greater than or equal to the value we're given. The one thing to be careful about is to remember the previous root as we shrink: if `p` turns out to be the root of the reduced tree, its parent is that previous root, not `-1`. Putting that in code, we have:
solution.py
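(Again a reconstructed sketch: only `solution` needs to change, shrinking the tree first and then calling the same `find_parent` as before.)

```python
def solution(h, q):
    num_nodes = 2 ** h - 1
    answers = []
    for p in q:
        if p >= num_nodes:
            answers.append(-1)
            continue
        root, height, parent = num_nodes, h, -1
        # While p is at most the left child of the current root, it can only
        # live in the left subtree, which is itself a post-order-annotated
        # perfect tree of height - 1 rooted at floor(root / 2). Remember the
        # old root: it's the parent in case p ends up being the new root.
        while p <= root // 2:
            parent = root
            root //= 2
            height -= 1
        result, _ = find_parent(p, root, height, parent)
        answers.append(result)
    return answers
```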
Let's test it out:
$ python2
Python 2.7.18 (default, Jul 1 2022, 10:30:50)
[GCC 11.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import solution
>>> solution.solution(3, [1, 6, 4, 3, 7, 200])
[3, 7, 6, 7, -1, -1]
>>> solution.solution(30, [1, 6, 4, 3, 7, 200])
[3, 7, 6, 7, 15, 204]
And that's it! Now we compute fast for any height, since we don't necessarily have to deal with the full height of the tree anymore.