A photo of some notes I took during foobar challenges.

Google foobar, part ii: stashing bunnies

written on February 4, 2023

This post is the second installment of the foobar challenge series. You can read the first one to learn more about foobar and see what sort of intro problems the challenge presents if you don't want any real spoilers.

WARNING: Obviously, MAJOR spoilers ahead, for the problems themselves and possible solutions or hints. Even if you're attempting one of these and feel completely stuck, give it another go before reading the rest. Keep reading only if you don't mind getting spoiled!

Problem statement

Okay, so this one is about stashing bunnies. Bunnies? Well, anything that's fuzzy and warm and you can assign an ID to.

Assume you have a bunch of bunnies stashed in a corner, represented by their ID numbers, revealing the order you stacked them in:

Bunny heap

Uh... sure... bunnies.

We want to figure out what the ID of the bunny is given a pair of coordinates (x, y), where x is the distance from the leftmost "wall" and y the distance from the "floor". For example, the bunny with ID 1 is at (1, 1), the one with ID 8 is at (2, 3) and so on.

@@@@@@@@@@

@@@@@@@@@@

SECOND WARNING: If you want to solve this yourself, STOP HERE! Anything past this point is a spoiler of some part of figuring out the solution, or of the solution itself. You've been warned!

@@@@@@@@@@

@@@@@@@@@@

First ideas

A naive solution to this could be to simply construct this thing as a 2D array, and index into it to get the answer immediately. Of course that wouldn't be very interesting, so the challenge prevents us from doing so by setting the max values of x and y to a considerably large number (something like 100,000 if I remember correctly). While you might be able to get away with "constructing" the solution like this here, I haven't tried it and I think it's more fun to find a solution that doesn't need that anyway.

So... where do we start? Well, as always it's a good idea to start looking for patterns in the numbers, so let's start by writing down the first few coordinates, ordered by the ID they point to:

(1, 1) -> 1
(1, 2) -> 2
(2, 1) -> 3
(1, 3) -> 4
(2, 2) -> 5
(3, 1) -> 6
(1, 4) -> 7
(2, 3) -> 8
(3, 2) -> 9
(4, 1) -> 10
    ...

See a pattern? I'll try to describe what came to me first, and it's okay if it's kind of wishy-washy and not clearly formulated yet, we will get to a more "mathematical" version later.

What I first observed when writing those down is that you always seem to come back to a (1, M) pair, then some other pairs, then (M, 1), and then (1, M + 1). For example, you have the pair (1, 4), then some other pairs, then (4, 1), and then (1, 5) immediately after.

The second thing I observed was that those "intermediate" pairs between (1, M) and (M, 1) were kind of "stepping through" the values between 1 and M. More specifically, the value of the x coordinate increments until it reaches M, and the value of the y coordinate decrements until it reaches 1. So, finally, the sequence is:

(1    , M    )
(2    , M - 1)
(3    , M - 2)
     ...
(M - 2, 3    )
(M - 1, 2    )
(M    , 1    )
(1    , M + 1)
     ...

Great, but.. what ID is associated to (1, M), how does this get us anywhere? Well, at this point we could pretty much solve this by iterating on this sequence until we reach the given coordinate pair.

I don't love this solution; sure, maybe there's some clever dynamic programming trick we could use to optimize this, but basically if we're given (100000, 100000) we're screwed: we'd have to go through literally every other ID before we find that one. There's also another problem: our "sequence algorithm" gets us IDs in the same order as the one they're put in the stash in, so diagonally! This means that we will only get the pair (M, M) if it's in the diagonal of (1, N) ... (N, 1), but of course M <= N. If the max value of x and y was 4, we would never be able to get (M, M) for any M greater than 2!

NOTE: if you don't get why that is, think that the diagonal containing (2, 2) ends at (3, 1). This means that the next diagonal starts at (1, 4), but it doesn't contain an (M, M) pair (you can convince yourself by writing down the entire diagonal). The diagonal after that one, (1, 5), does contain (3, 3), but as you can clearly tell (1, 5) is "out-of-bounds", because for this example we said that the max value of x and y is 4.

Looking for more patterns

We can do much better than this. The next idea I had was to look at what IDs were associated with coordinates of the form (1, M), for different values of M:

(1, 1) -> 1
(1, 2) -> 2
(1, 3) -> 4
(1, 4) -> 7
(1, 5) -> 11
    ...

Let's forget about the pairs of any x and y and focus on the (1, M) pairs for a moment: reducing a problem by adding a constraint like this often makes it easier to find a partial solution to the original problem, and then by removing the constraint we can start to build a complete solution.

When I got to this point while solving this, I was trying to find a sort of "law" for the sequence of IDs (1, 2, 4, 7, 11, ...), or for the relationship of the ID and M, the y coordinate. Maybe there is one, but I didn't find it, and I came up with another idea instead: what if we thought of every (1, M) pair as the beginning of a group? In that case, the ID of the beginning of a group would basically be the number of IDs below it, plus one (since IDs start at 1). The first group is the (1, 1) ... (1, 1) group, which contains a single element ((1, 1)), so the ID of the first element of the second group is <number of elements in group 1> + 1 = 1 + 1 = 2.

Let's do another example to convince ourselves: group 2 has 2 elements ((1, 2), (2, 1)), so we can say that (1, 3) will have an ID of <number of elements in group 1> + <number of elements in group 2> + 1 = 1 + 2 + 1 = 4, which is indeed the case.

So, for a given "group leader" pair (1, M), we have two things to figure out in order to be able to find its ID:

Which group it is in
How many elements are "below" it

The first one is easy: (1, M) is the first element of the M-th group! The second one is more complicated. I mean, we can sort of answer the question by saying "the number of elements below M is the sum of elements of the groups 1 through M - 1", but of course that's not the complete answer. It does simplify things though: now we only need to find how many elements are in a given group M. Well... if you think of it, the "groups" we've been talking about are simply the diagonals!

Bunny heap, with colored groups

Red is group 1, green is group 2, yellow is group 3, orange is group 4. Ugly colors!

So, this question is actually trivial: group M has M elements! Now we can start assembling our answer.

Every pair (1, M) has an ID of <number of elements of previous groups> + 1, which is equal to (1 + 2 + ... + (M - 1)) + 1.

That's great! We now know the ID of every "group leader" pair. How can we solve the general problem of finding the ID of any (x, y) pair? Well, all we have to do at this point is figure out two things, again:

Which group (x, y) belongs to
What is its offset from the "group leader"

The first one is easy to figure out if we remember how we obtained the sequence of each group: increment x until M, decrement y until 1. Since we're incrementing x and decrementing y at the same time, that means that their sum is constant, so no matter which (x, y) we land on in the sequence, x + y will give us the same value (for group 3 for instance, 1 + 3 = 2 + 2 = 3 + 1). But what's that sum? Well, since the "group leader" of group M is (1, M), the sum is M + 1! That's the answer to the first question: (x, y) belongs to group x + y - 1.

The second question is also easy to answer if again we look at the patterns: since we are looking for an offset from (1, M), and the sequence pattern is formed by incrementing x, all we have to do is look at x! For the first element of the group x will be 1, for the second 2 and so on. So, to get the offset from the "group leader" for any given (x, y) pair, we simply calculate x - 1.

Solution

Let's put everything together. For a given pair (x, y), belonging to group x + y - 1, the ID is:

<ID of "group leader" of (x + y - 1)> + <offset from "group leader" of (x + y - 1)>

Expanding on that, we have:

(<number of elements before (x + y - 1)> + 1) + (x - 1)

And by expanding further we get:

(<sum from 1 to (x + y - 2)> + 1) + (x - 1)

With a bit of simplifying, we finally get:

<sum from 1 to (x + y - 2)> + x

If we wanna be even cooler and pretend we actually know some basic math, we can simplify it further since <sum from 1 to N> is equal to N * (N + 1) / 2. So the final formula actually is:

ID(x, y) = (x + y - 2) * (x + y - 1) / 2 + x

Okay, that's good and all, but does it work? Let's test our formula on a couple of pairs:

ID(2, 3) = (2 + 3 - 2) * (2 + 3 - 1) / 2 + 2 = 8

Nice! And another one:

ID(4, 2) = (4 + 2 - 2) * (4 + 2 - 1) / 2 + 4 = 14

Woo! The formula seems to be correct!

This is great because, since it's a closed-form solution, the time complexity will be O(1) (or, in layman's terms, really frickin fast). Let's implement it in code:

solution.py

def solution(x, y):
    return (x + y - 2) * (x + y - 1) // 2 + x

And that's it!