Google foobar, part iii: where's my dad?
written on February 5, 2023
categories: challenges
This post is the third installment of the foobar challenge series. You can read the first one to learn more about foobar and see what sort of intro problems the challenge presents if you don't want any real spoilers.
WARNING: Obviously, MAJOR spoilers ahead, for the problems themselves and possible solutions or hints. Even if you're attempting one of these and feel completely stuck, give it another go before reading the rest. Keep reading only if you don't mind getting spoiled!
Problem statement
This one involves my favorite family of data structures, trees! Here's how it goes.
Imagine you are given a perfect binary tree (i.e. a tree in which every internal node has exactly two children and all leaf nodes are at the same depth) that is annotated in post-order traversal. Big fancy words, but at the end of the day it's just something that looks like this:
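       7
   ----|----
   /       \
   3       6
  / \     / \
 1   2   4   5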
If you don't quite get post-order traversal yet, you can jump to the next section where I explain it step by step.
The problem statement is:
Given a number `h` representing the height of the tree, and an array of numbers `q`, for every number `p` in `q`, return its parent if `p` is in the tree, and return `-1` if `p` is not in the tree or if it has no parent (i.e. if `p` is the root node).
So for the tree above, that is for `h = 3`, we can say that for `q = [2, 6, 7, 200]` we should get `[3, 7, -1, -1]`.
Understanding post-order traversal
Let's annotate the previous example by hand to understand how it works.
The algorithm is simple:
- Annotate the left subtree
- Annotate the right subtree
- Annotate the parent (the node itself)
Let's start with an empty tree with 7 nodes like in the example:
       _
   ----|----
   /       \
   _       _
  / \     / \
 _   _   _   _
Each underscore (`_`) represents a node. Of course, we start our algorithm at the root of the tree, with `>` showing our current position in the tree:
      >_
   ----|----
   /       \
   _       _
  / \     / \
 _   _   _   _
Step one is to recurse on the left subtree. So we go down the left subtree:
       _
   ----|----
   /       \
  >_       _
  / \     / \
 _   _   _   _
Again, we apply the rules in order, so first we go down the left subtree:
       _
   ----|----
   /       \
   _       _
  / \     / \
>_   _   _   _
Since there is no left subtree to go down to anymore (we've reached a leaf), we annotate:
       _
   ----|----
   /       \
   _       _
  / \     / \
>1   _   _   _
Step two is to annotate the right subtree:
       _
   ----|----
   /       \
   _       _
  / \     / \
 1  >2   _   _
And step three is to annotate the parent:
       _
   ----|----
   /       \
  >3       _
  / \     / \
 1   2   _   _
Since we're done with the left subtree of the root node, we move on to the right subtree in the exact same fashion. Left subtree:
       _
   ----|----
   /       \
   3       _
  / \     / \
 1   2  >4   _
Right subtree:
       _
   ----|----
   /       \
   3       _
  / \     / \
 1   2   4  >5
Parent:
       _
   ----|----
   /       \
   3      >6
  / \     / \
 1   2   4   5
And now that the right subtree of the root node is done, we annotate the root node itself:
      >7
   ----|----
   /       \
   3       6
  / \     / \
 1   2   4   5
And we're done! That's how the post-order traversal works.
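To see the same procedure in code, here's a minimal sketch (my own illustration, not challenge code) that builds a perfect tree of a given height as nested `(left, value, right)` tuples, annotating in post-order:

```python
def annotate(height):
    """Build a perfect binary tree of the given height, annotated in
    post-order, as nested (left, value, right) tuples."""
    counter = [0]  # next annotation to hand out, boxed so visit() can mutate it

    def visit(h):
        if h == 0:
            return None
        left = visit(h - 1)   # 1. annotate the left subtree
        right = visit(h - 1)  # 2. annotate the right subtree
        counter[0] += 1       # 3. annotate the node itself
        return (left, counter[0], right)

    return visit(height)

print(annotate(3))
# (((None, 1, None), 3, (None, 2, None)), 7, ((None, 4, None), 6, (None, 5, None)))
```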
@@@@@@@@@@
@@@@@@@@@@
SECOND WARNING: If you want to solve this yourself, STOP HERE! Anything past this point is a spoiler of some part of figuring out the solution, or of the solution itself. You've been warned!
@@@@@@@@@@
@@@@@@@@@@
First ideas
So, I was actually dumb enough to try to solve this by constructing the tree. If you've done enough challenges like these (or if you've read the previous posts), you know that well-designed ones give you test cases big enough that solving by construction (which most times is the same as solving by enumeration) is impossible. That's the case here too, as the maximum height was something like 30, which means a tree with a whopping 1,073,741,823 nodes!
But we're getting ahead of ourselves. First of all, let's remind ourselves how to calculate the number of nodes of a perfect binary tree given its height. Since we have exactly two children for every internal node, the number of nodes increases exponentially as the height increases linearly. We have 1 node at height `h` (the root), then 2 at height `h - 1`, then 4 at height `h - 2`, and so on. If we express that as a sum of powers of 2, then for our example tree of height 3 we have `2^0 + 2^1 + 2^2 = 7` nodes. Otherwise put:

$$n(h) = \sum_{i=0}^{h-1} 2^i$$

But there's a summation identity for finite geometric series (when `x != 1`):

$$\sum_{i=0}^{n-1} x^i = \frac{x^n - 1}{x - 1}$$

Applied here, that gives us:

$$n(h) = \frac{2^h - 1}{2 - 1} = 2^h - 1$$
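A quick sanity check of the formula, including the monster from before:
>>> [2 ** h - 1 for h in (1, 2, 3, 30)]
[1, 3, 7, 1073741823]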
Great. Why is that useful? Well, we'll want to know what the number of nodes is, because that's how we know what number the root node is annotated with. We can get some solutions for cheap, without constructing the tree:
solution.py
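(A reconstructed sketch of that early-exit logic; the guard is exactly what's described below, but identifiers besides `solution` are my own.)

```python
def solution(h, q):
    # A perfect binary tree of height h has 2**h - 1 nodes, and the root,
    # annotated last, carries exactly that number.
    num_nodes = 2 ** h - 1
    answers = []
    for p in q:
        if p >= num_nodes:
            # Either the root itself (p == num_nodes, no parent) or not in
            # the tree at all (p > num_nodes): -1 in both cases.
            answers.append(-1)
        else:
            answers.append(None)  # the numbers actually in the tree: see below
    return answers
```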
The reasoning here is pretty simple: since the root node is annotated last, it will have the biggest number out of all the nodes, and its number will always be the number of nodes (since we start annotating at 1). So, if `p` is equal to the number of nodes, then it is the root, which has no parent (so `-1`), and if it is greater than the number of nodes, then clearly it is not in the tree (so `-1` again).
Hm. How do we go further than this with no tree to traverse?
My first thought at this point was to virtually traverse the tree, just by tracking where we are at any point in time. In fact, I got super excited at this point, because I noticed a pattern. Starting at the root, you can go down the right subtree by subtracting 1 (so `7 -> 6 -> 5`), and down the left subtree by dividing by 2 (actually `floor(x / 2)`, so `7 -> 3 -> 1`). Eureka, let's code it up, that's it! Well, not so fast, cowboy.
What if you go right and then left, in the same tree? Well, you go right, so you subtract 1, so you get to `6`; everything's alright so far. And then you go left, so you divide by 2, so `floor(6 / 2) = 3`... whoops, we were supposed to land on `4`. We need something more nuanced than this.
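You can see the breakage with plain arithmetic:
>>> 7 - 1   # go right from the root: we land on 6, as expected
6
>>> 6 // 2  # now go left: we want 4...
3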
Finding ways to go left
We have a solid way of going right in every possible subtree: just subtract 1. This will be true no matter what, because of the very definition of the post-order traversal (the thing that's annotated immediately after the right subtree is the parent, so `parent = right_subtree + 1`, or `right_subtree = parent - 1`). What we need is a way to go left.
Well... there is one type of scenario where we do always know how to go left: the scenario where we're at the parent of two leaf nodes. Since the application of the post-order traversal algorithm is trivial in this case, we know that the left node will be annotated right before the right node. So, if we know the parent `m`, then to go right we subtract 1 (as always), and to go left we subtract 2. Now we're getting somewhere.
Now, if we go one level up from where we were (the parent of leaf nodes), we also know how to go left from there: since the right subtree was annotated right after the left one (again, by definition of the post-order traversal), the last element of the left subtree (the left child, in this case) must equal the first element of the right subtree, minus one. And the first element of the right subtree is already known to us: it's the left leaf node we found before.
I know this might be hard to picture, so let's give a concrete example with our good old 7-node tree. Let's start at the bottom right (I'll explain why later), and assume we know that the value of the parent of the leaves is `6`. Again, as we said, the two children are trivial to compute because they're leaves: `6 - 1 = 5` for the right one, and `6 - 2 = 4` for the left one. Okay, so now that we know these two, we can go up a level, to the root node (`7`). We know that its right child is `6` (as I said, assume that we know from the start), but how do we compute its left child? Well, again, it's the node that gets annotated right before the first node of the right subtree. But the first node of the right subtree to be annotated is the left leaf child, `4`! So the left child of the root (`7`) is `4 - 1 = 3`. Finding `3`'s children is trivial as they are leaves.
This is good, but we don't have a full solution yet: we assumed we knew the parent of the bottom right leaves. But why do we need to start there anyway?
Think about it: the only thing we know at the start is the value of the root, because it's the very last node to get annotated in the tree (technically we also know the values of a couple of other nodes from the very start, for example the leftmost bottom leaf will always be `1`, but that's not really useful). From the root, we know one way to get to the bottom (or at least to a parent of leaf nodes): go right, since that's always something we can do by subtracting 1! So we go right, keeping track of the height, and once we hit the parent of the leaf nodes, we use the algorithm I described before to slowly climb up the tree and uncover its left side.
If you've forgotten the goal of the challenge, I don't blame you. We need to find the parent of a given value `p`, so basically we just need to check if any of the children we come across are equal to `p`, in which case it's trivial to find the parent. We don't even have to worry about whether `p` is present in the tree or not, as we already covered those cases with our `return -1` "guard".
Solution
You can implement this algorithm either recursively or iteratively. I picked recursion because I find it more intuitive, so I ended up with something like this:
solution.py
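(A reconstructed sketch of that listing: the `solution`/`find_parent` split and the `new_left_child` hint match the walkthrough below, while the exact signature and return convention are my assumptions.)

```python
def solution(h, q):
    # A perfect binary tree of height h has 2**h - 1 nodes; the root is
    # annotated last, so its number is exactly the node count.
    num_nodes = 2 ** h - 1
    answers = []
    for p in q:
        if p >= num_nodes:
            # The root (no parent) or not in the tree at all.
            answers.append(-1)
        else:
            result, _ = find_parent(p, num_nodes, h, -1)
            answers.append(result)
    return answers


def find_parent(p, root, height, parent):
    # Returns a (result, new_left_child) pair: `result` is the parent of p
    # if p was found in this subtree (None otherwise), and `new_left_child`
    # is the "hint" that lets our caller's parent compute its left child.

    # The right child is always known: it's annotated just before us.
    right_child = root - 1
    # In the general case we know neither the left child...
    left_child = None
    # ...nor the hint for the level above.
    new_left_child = None

    if height == 2:
        # Parent of two leaves: both children are trivial, and the first
        # node annotated in this subtree is left_child, so the node
        # annotated just before it is our parent's left child.
        left_child = root - 2
        new_left_child = left_child - 1

    # Found p as one of our children (on the way down)?
    if p == right_child or p == left_child:
        return root, new_left_child
    # Found p as the current root itself (on the way "up")?
    if p == root:
        return parent, new_left_child

    if height > 2:
        # Recurse on the right subtree first: its hint is precisely the
        # value of our left child.
        result, left_child = find_parent(p, right_child, height - 1, root)
        if result is not None:
            return result, None  # found it, no need to look left
        if p == left_child:
            return root, None
        # Now that we know the left child, recurse on the left subtree;
        # its hint becomes our own hint for the level above.
        result, new_left_child = find_parent(p, left_child, height - 1, root)
        if result is not None:
            return result, None

    # Nothing found in this subtree; pass the hint up.
    return None, new_left_child
```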
Let's see how it works. First of all, the `solution` function simply calls another function to compute the non-trivial cases (i.e. the cases where the number is in the tree).
The function that does the actual computing is `find_parent`. Since it's recursive, it needs to know the current root node, the current height, and the current parent. Of course, it also needs to know `p` (the number we're looking for) on every level of recursion, to be able to tell when it has found its parent.
At the very beginning of the function, we compute some stuff we already know:
- We know the right child (no matter what the current root is, it will always be `root - 1`)
- We don't necessarily know the left child (in the general case)
- We don't necessarily have any "hints" for what the parent's left child is (the `new_left_child` variable basically implements what we described before, when we discussed figuring out how to go left in the nontrivial case)
Now, if we are at the level of the parents of leaf nodes (which we identify using the current height), then we can also trivially calculate the left child's value, and most importantly we can calculate the value of `new_left_child`, which will help "unlock" the left child for the parent of the current root, as we discussed before.
Next, we simply check if we've found the solution, in which case we're done. The reason we need two checks is that we might find the solution on the way down or on the way up... sort of. If the number we're looking for is one of the children nodes, then we're done: the answer is the current root. However, in many cases, as we go down the tree for the first time, we will not know the left child, so we might miss this check going down. This is why we also need to check "going up" (in reality, we're checking while going down on the nontrivial left branches) whether the number we're looking for is equal to the current root itself (if it is, we have the answer, as we also know the current parent).
For parents of leaf nodes, we're done going down; there's no point in exploring the leaf nodes themselves (we've already checked whether the number we're looking for is in either of them), so if we've gotten to this point without an answer, we simply return. Of course, besides the answer, we also return the `new_left_child` "hint" to help explore the left subtree of the parent.
For every other node, we're not done: we need to explore the right and left subtrees. To do that, we simply recurse: first on the right subtree, which will eventually give us the value of the left child, and then on the left subtree itself (now that we know its value). Of course, if we find the answer in the right subtree, there's no reason to waste time exploring the left one; we're done! This is why there's an early-return check right after the right-subtree recursion.
At the end, we return whatever result and `new_left_child` "hint" we found. Let's test it out:
$ python2
Python 2.7.18 (default, Jul 1 2022, 10:30:50)
[GCC 11.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import solution
>>> solution.solution(3, [1, 6, 4, 3, 7, 200])
[3, 7, 6, 7, -1, -1]
It seems to be working correctly! That's good, but let's make sure it can handle bigger trees. Let's see: since we basically go down the right side of the tree, then slowly explore the left side of the right subtree, then go down the right side of the left subtree and so on, the last trivial subtree to get explored will always be the first one to get annotated (node `3`, with children `1` and `2`). So, if we ask, say, what the parent of `1` is, it should always be `3` and it should always take the longest time to compute.
>>> solution.solution(3, [1])
[3]
>>> solution.solution(4, [1])
[3]
>>> solution.solution(10, [1])
[3]
>>> solution.solution(20, [1])
[3]
>>> solution.solution(25, [1])
[3]
Whew, that last one took something like 5-6 seconds on my computer. That's not great; I'm pretty sure there is a timeout (so that you can't submit a solution that takes ages), and also, even if it took like one second for a height of 30, an input of `(30, [1, 1, 1, 1, 1])` would take 5 seconds. We can of course memoize the result so that we don't have to compute it over and over again, but still, the current performance is not great.
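As an aside, a minimal sketch of such a cache (not part of the final solution) could be a decorator around `find_parent`:

```python
def memoized(f):
    cache = {}  # maps argument tuples to previously computed results

    def wrapper(*args):
        if args not in cache:
            cache[args] = f(*args)
        return cache[args]

    return wrapper

find_parent = memoized(find_parent)  # identical (sub)queries are now free
```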
What can we do to improve that? Well, I'm not sure if there are many ways to optimize the algorithm of `find_parent` itself, but we can use the initial observation we made (the fact that we can go left by doing `floor(root / 2)` from the root of the tree if we never go right) to avoid spending time in subtrees that cannot contain the solution. Let's take the tree of height `5` for example:
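Only the top of it matters for the argument; by our going-right and going-left rules, the root is `31`, its right child is `30`, and its left child is `15`:

      31
   ----|----
   /       \
  15       30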
Let's say we're trying to find `13`'s parent. We start at the root, `31`, but here's an interesting property: everything that's less than the left child (`15`) is not going to be in the right subtree (by definition, the right subtree starts at `15 + 1 = 16`), and conversely, everything that's bigger than the left child is not going to be in the left subtree (as the greatest value of the left subtree is its "root").
In the case of `13` itself, since `13 < 15`, we know it's in the left subtree, so we can ignore the entire right subtree. We only have to look in the left one, which reduces the problem to this:
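That left subtree is itself a perfect tree of height `4`, rooted at `15`, with children `7` and `14` (again, lower levels elided):

      15
   ----|----
   /       \
   7       14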
Now, you can notice again that `13 > 7`, so we don't actually need the left subtree here; that means we can ignore it and only focus on the right subtree. We could technically delete the left subtree again, but there's no point, because our algorithm explores the right subtree first, so it won't ever touch the left subtree anyway.
Let's put everything together: while the value we're looking for is less than or equal to the value of the left child of the current root, we can shrink the problem, replacing the tree with its left subtree of height `h - 1` (whose root is `floor(root / 2)`), and we can keep going until the left child of the root is no longer greater than or equal to the value we're given. The one thing to be careful about is to remember the previous root as we shrink: if `p` turns out to be the root of the reduced tree, its parent is that previous root, not `-1`. Putting that in code, we have:
solution.py
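(Again a reconstructed sketch: only `solution` needs to change, shrinking the tree first and then calling the same `find_parent` as before.)

```python
def solution(h, q):
    num_nodes = 2 ** h - 1
    answers = []
    for p in q:
        if p >= num_nodes:
            answers.append(-1)
            continue
        root, height, parent = num_nodes, h, -1
        # While p is at most the left child of the current root, it can only
        # live in the left subtree, which is itself a post-order-annotated
        # perfect tree of height - 1 rooted at floor(root / 2). Remember the
        # old root: it's the parent in case p ends up being the new root.
        while p <= root // 2:
            parent = root
            root //= 2
            height -= 1
        result, _ = find_parent(p, root, height, parent)
        answers.append(result)
    return answers
```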
Let's test it out:
$ python2
Python 2.7.18 (default, Jul 1 2022, 10:30:50)
[GCC 11.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import solution
>>> solution.solution(3, [1, 6, 4, 3, 7, 200])
[3, 7, 6, 7, -1, -1]
>>> solution.solution(30, [1, 6, 4, 3, 7, 200])
[3, 7, 6, 7, 15, 204]
And that's it! Now we compute fast for any height, since we don't necessarily have to deal with the full height of the tree anymore.