Google foobar, part vi: growing cells
written on March 10, 2023
categories: challenges
This post is the sixth installment of the foobar challenge series. You can read the first one to learn more about foobar and see what sort of intro problems the challenge presents if you don't want any real spoilers.
WARNING: Obviously, MAJOR spoilers ahead, for the problems themselves and possible solutions or hints. Even if you're attempting one of these and feel completely stuck, give it another go before reading the rest. Keep reading only if you don't mind getting spoiled!
Problem statement
This one was by far the most confusing/hard to understand for me. I think it's because they're trying to incorporate the whole bunny story into it, and sometimes it's not clear whether something is there to support the narrative or whether it's a hard constraint on the problem itself. Anyway, I'll try to explain it more clearly here; the original problem talks about bombs, but to me it makes more intuitive sense with cells:
Consider two types of cells:
M
andF
. On each generation, the cells of one type can produce one cell of the other type each; however, only one type of cell can reproduce on a given generation. For example, if you start with 3M
cells and 2F
cells, either the 3M
cells can produce oneF
cell each (resulting in 3M
cells and 5F
cells), or the 2F
cells can produce oneM
cell each (resulting in 5M
cells and 2F
cells).Given a number of
M
cells and a number ofF
cells (as strings), write a function that returns the number (as a string) of generations needed to arrive at that configuration starting from 1M
cell and 1F
cell, or the string"impossible"
if it is not possible to arrive at that configuration.
Let's give some concrete examples: for M = "2"
and F = "7"
, we would need 4
generations:
M cells | F cells | Generation |
---|---|---|
1 | 1 | 0 |
2 | 1 | 1 |
2 | 3 | 2 |
2 | 5 | 3 |
2 | 7 | 4 |
But for M = "2"
and F = "4"
, it's impossible (you can convince yourself by trying
all the possible evolution options).
@@@@@@@@@@
@@@@@@@@@@
SECOND WARNING: If you want to solve this yourself, STOP HERE! Anything past this point is a spoiler of some part of figuring out the solution, or of the solution itself. You've been warned!
@@@@@@@@@@
@@@@@@@@@@
First ideas
As always, let's get a solid understanding of the problem first by figuring out all
the possible outcomes for, say, 4 generations. To simplify the notation, let's write
the amount of M
cells and F
cells as a tuple (m, f)
(where obviously m
is the
number of M
cells and f
is the number of F
cells):
Let's now make some observations:
- The left and right subtrees (from the absolute root,
(1, 1)
) are symmetric: this is good to keep in mind because it means we do not need to produce both of them to get the entire tree. I guess that a corollary of this is that, if(x, y)
is a valid configuration,(y, x)
is valid, too (for anyx
,y
). (x, 1)
/(1, x)
is always a valid configuration, as is(x, x + 1)
/(x + 1, x)
(forx >= 1
), and there are probably more patterns like this.
Okay... so where do we go from here? We could try going through the generations and
check if the input matches any of the generated configurations. It's not great in terms
of complexity: for example, for the biggest input (10^50
) we would potentially need to
go up to the 10^50 - 1
-th generation (see the leftmost children of the tree in the
previous figure), and don't forget that that's not simply O(n)
as the number of
generated configurations increases exponentially with each generation (even ignoring
the high-level symmetry of the tree I mentioned before).
Plus, how would we cover the case of the impossible configurations? It would be
extremely inefficient to have to go all the way up to (10^50, 1)
to return
"impossible"
for something like (4, 2)
.
Let's give this a try anyway, though, to get our feet wet:
solution.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
|
Remember, our inputs are strings, and our output needs to be a string too. This is why
the solution()
function is really a wrapper, that makes sure to feed (long) integers
to find_generation()
, that does the actual work.
The load-bearing function (find_generation()
) is pretty simple: it starts at (2, 1)
to get rid of the extra symmetric results we talked about. This is why we need to handle
the (1, 1)
case separately on lines 2-3. From that point on, we continuously generate
new configurations until we hit one of two conditions:
(M, F)
or(F, M)
is in the configurations of the current generation, which means that we found the configuration (lines 10-11)- One of
(M + 1, F)
,(M, F + 1)
,(F + 1, M)
,(F, M + 1)
is in the configurations of the current generation, which means that the configuration is impossible (lines 13-19)
I admit that I'm not sure if (2) completely covers the "impossible"
condition, but
hey, this is just a first try.
Let's try it out:
$ python2 solution.py 1 1
0
$ python2 solution.py 2 3
2
$ python2 solution.py 5 8
4
$ python2 solution.py 2 8
impossible
It seems to work! What about performance?
$ time python2 solution.py 1 10
9
python2 solution.py 1 10 0,00s user 0,00s system 97% cpu 0,008 total
$ time python2 solution.py 1 20
19
python2 solution.py 1 20 0,26s user 0,02s system 99% cpu 0,278 total
$ time python2 solution.py 1 24
23
python2 solution.py 1 24 5,49s user 0,40s system 99% cpu 5,885 total
That's really not great. We're at barely > 0% of the maximum value of the input and the execution time is already almost 6 seconds! We need to change strategy.
I spent some time trying to figure out a formula that links a given (m, f)
pair to a
generation number (kind of like in the second problem),
but I couldn't find something reliable; for instance, I looked at the minimum and
maximum sums (m + f
) and differences (|m - f|
) for each generation, but that doesn't
help because it doesn't generate unique "characteristics" for each generation. For
example, the minimum-maximum difference for generation 4 is 1-5, but the one for
generation 2 for example is 1-2. So if you get a pair (m, f)
where |m - f| = 2
, you
can't know if it belongs to generation 2 or 4 (e.g. (3, 1)
belongs to generation 2,
but (7, 5)
belongs to generation 4).
Solution
The biggest problem with this approach is something that had been bugging me since the
start: it's so... awkward to solve this problem going "upwards" in generations,
because there's no clear end condition. What if we did the opposite? If we started to
go "downwards" in generations, counting them until we reach (1, 1)
, we have a clear
end condition: either we hit it and we return the generation count, or we don't, which
means that the configuration is impossible starting from (1, 1)
.
How do we go backwards in generations though? Well, let's look at the reproduction rule:
for a given pair (m, f)
, we can generate (m + f, f)
and (m, f + m)
. This means
that, in order to go backwards, we only have to do the opposite, meaning we need to
subtract either m
from f
or f
from m
for the bigger term, and keep the smaller
term the same. Simply put, for an (m, f)
pair, we go backwards one generation by
computing (|m - f|, min(m, f))
.
There's another thing: we don't have to check that we arrived at (1, 1)
precisely;
when we're at any (1, x)
/(x, 1)
configuration, we can just return x - 1
as the
generation. This gives us the following algorithm:
solution.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
Again, the solution()
function is just a wrapper; the interesting stuff happens in
rewinder()
.
As we said, if either of the terms of the tuple is 1
, we can return the other one
minus one to get the generation. If either of them is negative or zero, that means that
we've come here from an invalid configuration, so we need to return "impossible"
.
Finally, we recurse by going down a generation (and adding one to the generation count).
Let's test it out:
$ python2 solution.py 1 1
0
$ python2 solution.py 4 2
impossible
$ python2 solution.py 3 32
12
$ python2 solution.py 3 1000
335
$ python2 solution.py 3 5000
...
RuntimeError: maximum recursion depth exceeded in cmp
Wow, interesting! This is bad news though; if we're stuck at 5000
, how are we supposed
to get to 10^50
?
Let's examine what rewinder()
gets called with:
(3, 5000)
(4997, 3)
(4994, 3)
(4991, 3)
(4988, 3)
(4985, 3)
(4982, 3)
...
See the problem? If one term is big and the other one is small, we spend ages subtracting the small term from the big term, no kidding this hits the maximum recursion depth!
The solution to this is actually trivial: what's a way to speed up a subtraction? Well... a division! Instead of spending ages subtracting a small number from a big one, we can just divide the big one by the small one and skip over a big number of generations!
solution.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
|
Of course, we need to not forget to handle the case where we can't skip generations;
for example, if we have (2, 3)
, we simply need to subtract like before. This is why
the if-else
block is needed on lines 12-15.
Let's test it out:
$ python2 solution.py 3 5000
1668
$ python2 solution.py 38219 109382738937
2862063
$ time python2 solution.py 3 100000000000000000000000000000000000000000000000000 # 10^50
33333333333333333333333333333333333333333333333335
python2 solution.py 3 100000000000000000000000000000000000000000000000000 0,00s user 0,00s system 96% cpu 0,008 total
And that's it!