A photo of some notes I took during foobar challenges.

Google foobar, part vi: growing cells

written on March 10, 2023

categories: challenges

tags: google-foobar, Python

This post is the sixth installment of the foobar challenge series. You can read the first one to learn more about foobar and see what sort of intro problems the challenge presents if you don't want any real spoilers.

WARNING: Obviously, MAJOR spoilers ahead, for the problems themselves and possible solutions or hints. Even if you're attempting one of these and feel completely stuck, give it another go before reading the rest. Keep reading only if you don't mind getting spoiled!

Problem statement

This one was by far the most confusing/hard to understand for me. I think it's because they're trying to incorporate the whole bunny story into it, and sometimes it's not clear whether something is there to support the narrative or whether it's a hard constraint on the problem itself. Anyway, I'll try to explain it more clearly here; the original problem talks about bombs, but to me it makes more intuitive sense with cells:

Consider two types of cells: M and F. On each generation, the cells of one type can produce one cell of the other type each; however, only one type of cell can reproduce on a given generation. For example, if you start with 3 M cells and 2 F cells, either the 3 M cells can produce one F cell each (resulting in 3 M cells and 5 F cells), or the 2 F cells can produce one M cell each (resulting in 5 M cells and 2 F cells).

Given a number of M cells and a number of F cells (as strings), write a function that returns the number (as a string) of generations needed to arrive at that configuration starting from 1 M cell and 1 F cell, or the string "impossible" if it is not possible to arrive at that configuration.

Let's give some concrete examples: for M = "2" and F = "7", we would need 4 generations:

M cells F cells Generation
1 1 0
2 1 1
2 3 2
2 5 3
2 7 4

But for M = "2" and F = "4", it's impossible (you can convince yourself by trying all the possible evolution options).

@@@@@@@@@@

@@@@@@@@@@

SECOND WARNING: If you want to solve this yourself, STOP HERE! Anything past this point is a spoiler of some part of figuring out the solution, or of the solution itself. You've been warned!

@@@@@@@@@@

@@@@@@@@@@

First ideas

As always, let's get a solid understanding of the problem first by figuring out all the possible outcomes for, say, 4 generations. To simplify the notation, let's write the amount of M cells and F cells as a tuple (m, f) (where obviously m is the number of M cells and f is the number of F cells):

Number of cells for each generation, from 0 to 4

The different configurations attainable in 0 to 4 generations.

Let's now make some observations:

Okay... so where do we go from here? We could try going through the generations and check if the input matches any of the generated configurations. It's not great in terms of complexity: for example, for the biggest input (10^50) we would potentially need to go up to the 10^50 - 1-th generation (see the leftmost children of the tree in the previous figure), and don't forget that that's not simply O(n) as the number of generated configurations increases exponentially with each generation (even ignoring the high-level symmetry of the tree I mentioned before).

Plus, how would we cover the case of the impossible configurations? It would be extremely inefficient to have to go all the way up to (10^50, 1) to return "impossible" for something like (4, 2).

Let's give this a try anyway, though, to get our feet wet:

solution.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
def find_generation(M, F):
    if (M, F) == (1, 1):
        return 0

    starting_point = [2, 1]  # (M, F)
    counter = 1
    current_points = [list(starting_point)]

    while True:
        if [M, F] in current_points or [F, M] in current_points:
            return str(counter)

        if (
            [M + 1, F] in current_points
            or [M, F + 1] in current_points
            or [F + 1, M] in current_points
            or [F, M + 1] in current_points
        ):
            return "impossible"

        new_current_points = []
        for point_pair in current_points:
            new_current_points += [
                [point_pair[0] + point_pair[1], point_pair[1]],
                [point_pair[0], point_pair[0] + point_pair[1]],
            ]
        counter += 1
        current_points = new_current_points


def solution(M, F):
    res = find_generation(long(M), long(F))

    return res

Remember, our inputs are strings, and our output needs to be a string too. This is why the solution() function is really a wrapper, that makes sure to feed (long) integers to find_generation(), that does the actual work.

The load-bearing function (find_generation()) is pretty simple: it starts at (2, 1) to get rid of the extra symmetric results we talked about. This is why we need to handle the (1, 1) case separately on lines 2-3. From that point on, we continuously generate new configurations until we hit one of two conditions:

  1. (M, F) or (F, M) is in the configurations of the current generation, which means that we found the configuration (lines 10-11)
  2. One of (M + 1, F), (M, F + 1), (F + 1, M), (F, M + 1) is in the configurations of the current generation, which means that the configuration is impossible (lines 13-19)

I admit that I'm not sure if (2) completely covers the "impossible" condition, but hey, this is just a first try.

Let's try it out:

$ python2 solution.py 1 1
0
$ python2 solution.py 2 3
2
$ python2 solution.py 5 8
4
$ python2 solution.py 2 8
impossible

It seems to work! What about performance?

$ time python2 solution.py 1 10
9
python2 solution.py 1 10  0,00s user 0,00s system 97% cpu 0,008 total
$ time python2 solution.py 1 20
19
python2 solution.py 1 20  0,26s user 0,02s system 99% cpu 0,278 total
$ time python2 solution.py 1 24
23
python2 solution.py 1 24  5,49s user 0,40s system 99% cpu 5,885 total

That's really not great. We're at barely > 0% of the maximum value of the input and the execution time is already almost 6 seconds! We need to change strategy.

I spent some time trying to figure out a formula that links a given (m, f) pair to a generation number (kind of like in the second problem), but I couldn't find something reliable; for instance, I looked at the minimum and maximum sums (m + f) and differences (|m - f|) for each generation, but that doesn't help because it doesn't generate unique "characteristics" for each generation. For example, the minimum-maximum difference for generation 4 is 1-5, but the one for generation 2 for example is 1-2. So if you get a pair (m, f) where |m - f| = 2, you can't know if it belongs to generation 2 or 4 (e.g. (3, 1) belongs to generation 2, but (7, 5) belongs to generation 4).

Solution

The biggest problem with this approach is something that had been bugging me since the start: it's so... awkward to solve this problem going "upwards" in generations, because there's no clear end condition. What if we did the opposite? If we started to go "downwards" in generations, counting them until we reach (1, 1), we have a clear end condition: either we hit it and we return the generation count, or we don't, which means that the configuration is impossible starting from (1, 1).

How do we go backwards in generations though? Well, let's look at the reproduction rule: for a given pair (m, f), we can generate (m + f, f) and (m, f + m). This means that, in order to go backwards, we only have to do the opposite, meaning we need to subtract either m from f or f from m for the bigger term, and keep the smaller term the same. Simply put, for an (m, f) pair, we go backwards one generation by computing (|m - f|, min(m, f)).

There's another thing: we don't have to check that we arrived at (1, 1) precisely; when we're at any (1, x)/(x, 1) configuration, we can just return x - 1 as the generation. This gives us the following algorithm:

solution.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
def rewinder(M, F):
    if M == 1:
        return F - 1
    elif F == 1:
        return M - 1

    if M <= 0 or F <= 0:
        return -float("inf")

    return 1 + rewinder(abs(M - F), min(M, F))


def solution(M, F):
    res = rewinder(long(M), long(F))

    if res == -float("inf"):
        return "impossible"
    else:
        return str(int(res))

Again, the solution() function is just a wrapper; the interesting stuff happens in rewinder().

As we said, if either of the terms of the tuple is 1, we can return the other one minus one to get the generation. If either of them is negative or zero, that means that we've come here from an invalid configuration, so we need to return "impossible". Finally, we recurse by going down a generation (and adding one to the generation count).

Let's test it out:

$ python2 solution.py 1 1
0
$ python2 solution.py 4 2
impossible
$ python2 solution.py 3 32
12
$ python2 solution.py 3 1000
335
$ python2 solution.py 3 5000
...
RuntimeError: maximum recursion depth exceeded in cmp

Wow, interesting! This is bad news though; if we're stuck at 5000, how are we supposed to get to 10^50?

Let's examine what rewinder() gets called with:

(3, 5000)
(4997, 3)
(4994, 3)
(4991, 3)
(4988, 3)
(4985, 3)
(4982, 3)
   ...

See the problem? If one term is big and the other one is small, we spend ages subtracting the small term from the big term, no kidding this hits the maximum recursion depth!

The solution to this is actually trivial: what's a way to speed up a subtraction? Well... a division! Instead of spending ages subtracting a small number from a big one, we can just divide the big one by the small one and skip over a big number of generations!

solution.py

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
def rewinder(M, F):
    if M == 1:
        return F - 1
    elif F == 1:
        return M - 1

    if M <= 0 or F <= 0:
        return -float("inf")

    divs = max(M, F) // min(M, F)

    if divs == 0:
        return 1 + rewinder(abs(M - F), min(M, F))
    else:
        return divs + rewinder(max(M, F) - divs * min(M, F), min(M, F))


def solution(M, F):
    res = rewinder(long(M), long(F))

    if res == -float("inf"):
        return "impossible"
    else:
        return str(int(res))

Of course, we need to not forget to handle the case where we can't skip generations; for example, if we have (2, 3), we simply need to subtract like before. This is why the if-else block is needed on lines 12-15.

Let's test it out:

$ python2 solution.py 3 5000
1668
$ python2 solution.py 38219 109382738937
2862063
$ time python2 solution.py 3 100000000000000000000000000000000000000000000000000  # 10^50
33333333333333333333333333333333333333333333333335
python2 solution.py 3 100000000000000000000000000000000000000000000000000  0,00s user 0,00s system 96% cpu 0,008 total

And that's it!


< Google foobar, part v: breaking the n-th wall Who's afraid of the big bad Math >