Can You Solve The Three Erupting Geysers Riddle? (Amazon Interview Question)

Thanks to Brian Galebach who created and sent me this problem! You arrive at a park where geysers A, B, and C erupt at intervals of precisely 2, 4, and 6 hours, respectively. Each started erupting independently at a random time in history, unknown to you. What are the probabilities that each geyser (A, B, and C) will be the first to erupt after your arrival?

I also received an email saying this problem was asked during an interview at Amazon.

David Weaver created a simulation on Khan Academy where you can see the probabilities after many trials

My blog post for this video

Source: Brian Galebach created the problem and gave permission for me to post it in a video. It was previously shared on Oliver Roeder's Riddler column on 538:

This is a creative approach, but I personally felt it was hard to visualize and calculate the volumes. But it is neat:

If you like my videos, you can support me at Patreon:

Connect on social media. I update each site when I have a new video or blog post, so you can follow me on whichever method is most convenient for you.

If you buy from the links below I may receive a commission for sales. This has no effect on the price for you.

My Books
"The Joy of Game Theory" shows how you can use math to out-think your competition. (rated 3.9/5 stars on 32 reviews)

"The Irrationality Illusion: How To Make Smart Decisions And Overcome Bias" is a handbook that explains the many ways we are biased about decision-making and offers techniques to make smart decisions. (rated 4.6/5 stars on 3 reviews)

"Math Puzzles Volume 1" features classic brain teasers and riddles with complete solutions for problems in counting, geometry, probability, and game theory. Volume 1 is rated 4.4/5 stars on 13 reviews.

"Math Puzzles Volume 2" is a sequel book with more great problems. (rated 4.3/5 stars on 4 reviews)

"Math Puzzles Volume 3" is the third in the series. (rated 3.8/5 stars on 5 reviews)

"40 Paradoxes in Logic, Probability, and Game Theory" contains thought-provoking and counter-intuitive results. (rated 4.3/5 stars on 12 reviews)

"The Best Mental Math Tricks" teaches how you can look like a math genius by solving problems in your head (rated 4.7/5 stars on 4 reviews)

"Multiply Numbers By Drawing Lines" This book is a reference guide for my video that has over 1 million views on a geometric method to multiply numbers. (rated 5/5 stars on 3 reviews)
Comments

Brian Galebach, the creator of this problem, explains one counter-intuitive aspect of the problem as follows: "One thing that I find fascinating about the contrast between the on-average and on-interval problems is the following: At the moment you arrive, the ratio of the probabilities that A, B, and C will erupt within the next instant (say the next second) is in fact 6:3:2. However, because you know that A must erupt within 2 hours, and B must erupt within 4 hours, as soon as you start waiting, the instantaneous probabilities start changing. If you've been waiting for 1 hour and 59 minutes, then the instantaneous probability for A will be much higher than those for B and C!"

(Edit May 2023). David Weaver created a simulation on Khan Academy where you can see the probabilities after many trials

MindYourDecisions
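Galebach's point can be made concrete with hazard rates. This is a minimal sketch, and it assumes each geyser's next eruption is uniform over its interval and independent of the others. Under that assumption, if nothing has erupted t hours after you arrive, a geyser with interval T erupts at an instantaneous rate of 1/(T - t). The helper names `hazard_rates` and `normalized` are illustrative, not from the source.

```python
from fractions import Fraction

def hazard_rates(t, intervals=(2, 4, 6)):
    """Instantaneous eruption rates at time t (hours after arrival), given no eruption yet.

    Assumes each geyser's next eruption is uniform on [0, T) and independent,
    so the conditional rate for a geyser with interval T is 1 / (T - t); valid for t < min(T).
    """
    return [Fraction(1) / (T - t) for T in intervals]

def normalized(rates):
    """Scale rates so they sum to 1, making their relative sizes easy to compare."""
    total = sum(rates)
    return [r / total for r in rates]

# At arrival the rates are 1/2, 1/4, 1/6, i.e. the 6:3:2 ratio.
print(normalized(hazard_rates(Fraction(0))))  # → [Fraction(6, 11), Fraction(3, 11), Fraction(2, 11)]
# After waiting 1 hour 59 minutes with no eruption, A's share dominates.
print([float(x) for x in normalized(hazard_rates(Fraction(119, 60)))])
```

At t = 0 the normalized rates reproduce the 6:3:2 ratio. At 1 hour 59 minutes, A's share is above 98%.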

Great puzzle with a surprising solution! I tested the solution using a program that created millions of random values, and got identical results.

JohnSmithEx

Would love to see a computer simulation of this situation, to show those final percentages slowly appearing as the number of trials tends towards infinity. Interesting problem!

stephenmessano
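The two comments above describe a simulation of the problem. Below is a minimal Monte Carlo sketch of one in Python. It assumes, as the video does, that the wait until each geyser's next eruption is uniform on [0, T] and independent across geysers. The function `simulate` and the trial counts are illustrative choices, and the printed estimates will vary with the seed.

```python
import random

def simulate(trials, intervals=(2.0, 4.0, 6.0), seed=None):
    """Monte Carlo estimate of P(each geyser erupts first).

    Assumes the wait until each geyser's next eruption is uniform on [0, T]
    and independent of the others (each started at an unknown random time).
    """
    rng = random.Random(seed)
    wins = [0] * len(intervals)
    for _ in range(trials):
        waits = [rng.uniform(0.0, T) for T in intervals]
        wins[waits.index(min(waits))] += 1  # the shortest wait erupts first
    return [w / trials for w in wins]

if __name__ == "__main__":
    # Watch the estimates settle toward 23/36, 8/36, 5/36 as trials grow.
    for n in (100, 1_000, 10_000, 100_000):
        print(n, [round(p, 4) for p in simulate(n, seed=1)])
    print("exact", [round(p, 4) for p in (23 / 36, 8 / 36, 5 / 36)])
```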

Wow, I did this very differently and with a lot more effort too! For geyser A, I knew its probability density was 1/2 (per hour), and I had to multiply this by the chances of the other two not having erupted yet. The chances that each geyser has already erupted by time t are t/2, t/4 and t/6, respectively. So the probability density of geyser A being the first one to erupt is (1/2)(1 - t/4)(1 - t/6). This curve shows how likely it is that the first eruption you see comes at time t after arrival and is from geyser A.
Now if we take the integral from 0 to 2, we find the probability of A being the first one to erupt.
You can do the same thing for B and C, of course.
I did it before watching the video and ended up with the same result; funny how different means come to the same end.

sebastiaanhoek
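The density approach in the comment above can be checked with exact arithmetic. This minimal sketch assumes independent uniform waits. Under that assumption, geyser i is first at time t with density (1/T_i) times the product of (1 - t/T_j) over the other geysers, integrated over [0, min T]. The polynomial helpers are hand-rolled so the block needs only the standard library, and all names are illustrative.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of t)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_integral(p, lo, hi):
    """Exact definite integral of polynomial p over [lo, hi]."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, c in enumerate(p))

def first_eruption_probs(intervals=(2, 4, 6)):
    """P(geyser i erupts first) = integral over [0, min T] of
    (1/T_i) * prod_{j != i} (1 - t/T_j) dt, assuming independent uniform waits."""
    horizon = min(intervals)
    probs = []
    for i, t_i in enumerate(intervals):
        density = [Fraction(1, t_i)]  # geyser i erupts in [t, t+dt) with probability dt/T_i
        for j, t_j in enumerate(intervals):
            if j != i:
                # factor (1 - t/T_j): geyser j has not erupted by time t
                density = poly_mul(density, [Fraction(1), Fraction(-1, t_j)])
        probs.append(poly_integral(density, 0, horizon))
    return probs

print(first_eruption_probs())  # → [Fraction(23, 36), Fraction(2, 9), Fraction(5, 36)]
```

Passing other intervals, such as `first_eruption_probs((3, 5, 7, 11))`, gives the analogous answer for four geysers. The results always sum to 1, because the geyser with the shortest interval must erupt by time min T.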

TL;DR: if each letter represents 40 minutes, the following sequences show which geyser erupts next on a cyclical timeline:
ABCAAAABAACCABAAAA
ACBAAAABBACAABBAAA
There are 2 sequences because there are 2 different, equally likely schedules for the geysers.

by counting how many letters there are representing each geyser, we get our answer:
A: 23/36, B: 8/36=2/9, C: 5/36


I solved it in a kind of graphical way by writing 6 different timelines indicating in what order geysers would erupt, given that the timeline cycles and it starts at a point when all 3 geysers have already started erupting.
I made the timeline by making 6 copies of a permutation of the letters ABC:
ABCABCABCABCABCABC
then I removed every second B
ABCA0CABCA0CABCA0C
then after each C, I removed the next 2 Cs.
ABCA00AB0A0CAB0A00
So now I have a timeline of eruptions. If I were to walk in at a point when a geyser erupts, that is the one I would see; otherwise (if I walk in at a point marked 0, when no geyser erupts) I would have to wait until the next eruption. So let's replace every 0 with whatever letter comes after it, wrapping around to the start for the trailing 0s.
ABCAAAABAACCABAAAA
So if the eruptions follow this pattern, we can just count how many letters there are of each kind to get the odds of seeing that geyser erupt first:
A = 12, B = 3, C = 3,
giving odds of seeing A = 12/18 = 2/3, B = 3/18 = 1/6, C = 3/18 = 1/6.
But that's the wrong answer, and there are 2 objections you might rightfully have.
A. Doesn't this assume that each letter in the sequence represents the same amount of time (40 minutes, since 3 letters represent 2 hours)? Yes, it does. In the stated problem the eruptions need not be synced this way, so that might seem like something we need to account for. Luckily, the chance that a given window of time (marked by a letter) is shorter than 40 minutes is exactly matched by the chance that the same window is longer than 40 minutes, and these chances cancel each other out in the total probability calculation. For this reason it is fine to assume that each letter, when generated this way, represents exactly the same amount of time. Furthermore, I haven't done the math on it, but I suspect the chances would cancel out even without that symmetry. A shorter window in one place necessarily means a longer window somewhere else, which could very well make the probabilities of eruption sightings cancel out as well.
B. How do we know the geysers erupt with this timing? Answer: we don't, and that's why we got the wrong answer. I made 5 more timelines so that I had one for each of the 6 different permutations. I counted all As, Bs, and Cs across all of them and used those odds, and that produces a different result. Interestingly, as I found out after writing out all the permutations, there are not actually 6 important permutations (ABC, ACB, BAC, BCA, CAB, CBA). Instead there are 2 families of 3 permutations each ({ABC, BCA, CAB} and {ACB, BAC, CBA}), and each family produces the same pattern of eruptions. This is because, in the first method of generating sequences, the permutations in the same family reduce to each other, just shifted. However, there is an easier way to understand this by generating the sequence in a different way:
1. first write six As (there is obviously only 1 way to do this):
AAAAAA
2. then insert 3 Bs evenly spaced out; given the cyclical nature of the sequence, there is only 1 way to do this:
ABAABAABA
3. then insert 2 Cs evenly spaced out (in relation to the As), which can be done in 2 ways, since the C that ends up next to a B can be either to the left or to the right of the B:
ABCAABACABA
ACBAABACABA
4. then fill in with 0s so that the As are 3 letters apart, the Bs 6 letters apart and the Cs 9 letters apart, which can be done in exactly 1 way for each sequence:
ABCA00AB0A0CAB0A00
ACBA00A0BAC0A0BA00
5. and finally replace each 0 with whatever letter comes next; there is obviously only 1 way to do this
(the sequences in parentheses afterwards show how the permutations within each family are shifts of one another; > marks the start of the sequence for one of the permutations):
ABCAAAABAACCABAAAA (>A>BCAAAABAAC>CABAAAA)
ACBAAAABBACAABBAAA (>A>CBAAAAB>BACAABBAAA)
Since there is only 1 way to perform steps 1, 2, 4 and 5 and 2 ways to perform step 3, there are 1*1*2*1*1 = 2 different sequences we can get. Those are all the possible sequences, and each one is equally likely.
The first sequence is the one we got from ABC; it has 12 As, 3 Bs and 3 Cs. The second one has 11 As, 5 Bs and 2 Cs.
In total we get 23 As, 8 Bs and 5 Cs, which gives us our answer.
The chances of seeing the respective geyser first are:
A: 23/36, B: 8/36=2/9, C: 5/36

robinlindgren
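The counting in the comment above can be reproduced with a short Python sketch. It follows the commenter's assumption that each letter stands for an equal 40-minute slot, giving 18 slots per 12-hour cycle. It encodes the two families as B/C starting offsets of (1, 2) and (2, 1) slots after an A eruption. `timeline`, `fill_next` and the offsets are illustrative names, not part of the original.

```python
from collections import Counter
from fractions import Fraction

SLOTS = 18  # 12-hour cycle in 40-minute slots: A every 3, B every 6, C every 9

def timeline(b_offset, c_offset):
    """Raw cyclic timeline: A from slot 0, B from b_offset, C from c_offset; '0' = no eruption."""
    slots = ["0"] * SLOTS
    for start, step, name in ((0, 3, "A"), (b_offset, 6, "B"), (c_offset, 9, "C")):
        for k in range(start, SLOTS, step):
            slots[k] = name
    return "".join(slots)

def fill_next(seq):
    """Replace each '0' with the next eruption letter, wrapping around the cycle."""
    n = len(seq)
    out = []
    for i in range(n):
        j = i
        while seq[j % n] == "0":
            j += 1
        out.append(seq[j % n])
    return "".join(out)

# The two equally likely families: C just after B, or C just before B.
sequences = [fill_next(timeline(1, 2)), fill_next(timeline(2, 1))]
totals = Counter("".join(sequences))
n = sum(totals.values())
for geyser in "ABC":
    print(geyser, totals[geyser], Fraction(totals[geyser], n))
# prints: A 23 23/36, then B 8 2/9, then C 5 5/36
```

The filled sequences come out as ABCAAAABAACCABAAAA (12 As, 3 Bs, 3 Cs) and ACBAAAABBACAABBAAA (11 As, 5 Bs, 2 Cs).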

Consider a tree structure. Each geyser counts out its own 2-hour intervals. Geyser A erupts every 2-hour interval. After geysers B and C each make a random start, geyser B erupts every other 2-hour interval and geyser C every third. We have 3 branches from the starting point: A's interval expires first, B's expires first, or C's expires first, each with 1/3 probability. We really only have to find the probabilities that B or C erupts first; we add them together and subtract from 1 to find the probability that A erupts first.

There are 2 cases where B erupts first. Its interval may expire first (1/3 probability), and it may then erupt with 1/2 probability, for 1/6 overall probability. The other case is where C's interval expires first (1/3) but C does not erupt (2/3), B's interval expires next (1/2), and B erupts (1/2). Multiply all 4 together to get 1/18. Pr(B) = 1/6 + 1/18 = 2/9.

There are 2 cases where C erupts first. Its interval may expire first (1/3 probability), and it may then erupt with 1/3 probability, for 1/9 overall probability. The other case is where B's interval expires first (1/3) but B does not erupt (1/2), C's interval expires next (1/2), and C erupts (1/3). Multiply all 4 together to get 1/36. Pr(C) = 1/9 + 1/36 = 5/36.

P(A) = 1 - P(B) - P(C) = 1 - 2/9 - 5/36 = 1 - 13/36 = 23/36.

jimlocke
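The tree in the comment above can be enumerated exhaustively with exact fractions. This sketch encodes the commenter's model as stated. The three 2-hour stretches expire in a uniformly random order. When a stretch expires, A always erupts, B erupts with probability 1/2 and C with probability 1/3, independently, and the first geyser that actually erupts wins. `ERUPT_CHANCE` and `tree_probabilities` are hypothetical names.

```python
from fractions import Fraction
from itertools import permutations, product

# Chance that a geyser's 2-hour stretch ends with a real eruption (the commenter's model).
ERUPT_CHANCE = {"A": Fraction(1), "B": Fraction(1, 2), "C": Fraction(1, 3)}

def tree_probabilities(chance=ERUPT_CHANCE):
    """Walk every branch: expiry order x which stretches actually erupt."""
    result = {g: Fraction(0) for g in chance}
    orders = list(permutations(chance))  # all 3! expiry orders, equally likely
    for order in orders:
        for erupts in product((True, False), repeat=len(order)):
            weight = Fraction(1, len(orders))
            for g, e in zip(order, erupts):
                weight *= chance[g] if e else 1 - chance[g]
            if weight == 0:
                continue  # impossible branch (A always erupts)
            winner = next(g for g, e in zip(order, erupts) if e)  # first real eruption
            result[winner] += weight
    return result

print(tree_probabilities())  # → {'A': Fraction(23, 36), 'B': Fraction(2, 9), 'C': Fraction(5, 36)}
```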

I did this geometrically and got the same answer. It might be worth doing the graphics on it. It's a neat method (although less generalizable than yours).

Basically, imagine a 2x4x6 region of space, representing when the geysers erupt relative to your position at the origin.

Then you can color the space according to which one erupts next.

Then you can calculate the volume of the three regions.

GlentonJelbert

I first thought about it in terms of probabilities, but figured that it would be cumbersome to treat each geyser’s eruption as a probability with respect to time.

My approach was essentially like calculating equity in poker, because this kind of calculation is done all the time while playing a game. Suppose you’re spectating a three-handed game of poker. You know that player A will only play the top third of all hands, player B will play the top two thirds, and player C will play every hand they’re dealt. The equity of each player is precisely the probability of them winning the hand. Consider first the case where all three players are dealt a hand in the top tercile of strength (1/6 probability). If this happens, each player is equally likely to win, so the equity is split evenly among them (1/18 each). If player B hits the top tercile but player C misses (1/3 probability), players A and B split the equity (1/6 each). If player B misses but player C hits (1/6 probability), players A and C get 1/12 equity each. Lastly, if B and C both miss (1/3 probability), player A wins the whole share of equity.

Summing up the equity for each person in every scenario gives player A winning 23/36 of hands, player B winning 8/36 of hands, and player C winning 5/36 of hands. What surprised me most was that player B does not have as much of an edge over player C as one might expect, while player A maintains quite a substantial lead.

mossy

Or we just integrate:
P(A)*2*4*6 = int_0^2 (4-x)(6-x) dx = 92/3, so P(A) = 23/36.
P(B)*2*4*6 = int_0^2 (2-x)(6-x) dx = 32/3, so P(B) = 2/9.
P(C)*2*4*6 = int_0^2 (2-x)(4-x) dx = 20/3, so P(C) = 5/36.

SmileyMPV
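Expanding each integrand makes the integrals in the comment above easy to check by hand. The three results add up to 144/3 = 48 = 2*4*6, as they must, since exactly one geyser erupts first.

```latex
\begin{aligned}
\int_0^2 (4-x)(6-x)\,dx &= \int_0^2 \left(24 - 10x + x^2\right)dx = 48 - 20 + \tfrac{8}{3} = \tfrac{92}{3},\\
\int_0^2 (2-x)(6-x)\,dx &= \int_0^2 \left(12 - 8x + x^2\right)dx = 24 - 16 + \tfrac{8}{3} = \tfrac{32}{3},\\
\int_0^2 (2-x)(4-x)\,dx &= \int_0^2 \left(8 - 6x + x^2\right)dx = 16 - 12 + \tfrac{8}{3} = \tfrac{20}{3}.
\end{aligned}
```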

I tried it graphically by drawing a 2x4x6 box and dividing it into 3 chunks, each representing the probability of that geyser erupting first. For a box of volume 48, 30 2/3 was A, 10 2/3 was B, and 6 2/3 was C. Normalizing gives 23/36 A, 8/36 B, 5/36 C, or about 64%, 22%, 14%.

Let's see if I was anywhere close.

MichaelOnines

I did it through integrals and got the same results, but the calculations were insane!

NestorAbad

I also found the general case. Order the eruption intervals from least to greatest, a <= b <= c, and define Geysers A, B, and C accordingly.
The probability of A erupting first is (2a^2 - 3ab - 3ac +6bc)/6bc
The probability of B erupting first is (3ac - a^2)/6bc
The probability of C erupting first is (3ab - a^2)/6bc

P(Case I) = a^2/bc
P(Case II) = (ac - a^2)/bc
P(Case III) = (ab - a^2)/bc
P(Case IV) = (a^2 - ab - ac + bc)/bc
The conditional probabilities for each geyser erupting under each case are the same as in the video.

charlescastleman

You can also do this using calculus as follows. Consider the interval [0, 2], where 0 is the time of arrival. At any time x in [0, 2], the probability that none of them has erupted yet and then A erupts in the next dx interval is (1/2)(1 - x/4)(1 - x/6) dx. Integrating this from 0 to 2 shows that the probability that A erupts first is 23/36. Likewise for B and C.

aroundandround

I'd solve it like this. Assume that the next eruption of A is in A hours, with variable A uniformly distributed on 0..2, and likewise B on 0..4 and C on 0..6. I also assume these are all independent. The set of possible combinations can be considered a cuboid of size 2•4•6. One part is when B, C > 2; then A is always first (let's say it "wins" the race). This region has size 2•(4-2)•(6-2), for a probability of 1/3. The next part is B ≤ 2 but C > 2. This part is split equally between A and B winning, and it has a size of 2•2•(6-2), for a probability of 1/6 each for A and B. Similarly, A, C ≤ 2 < B gives us 1/12 each for A and C winning. Finally, A, B, C ≤ 2 gives us 1/18 for each winning. Summing up:
A: 1/3 + 1/6 + 1/12 + 1/18 = 23/36
B: 1/6 + 1/18 = 8/36
C: 1/12 + 1/18 = 5/36

cmilkau

Just ran across this one. It's a great problem. I did it in my head without paper and pencil. I had all the same numbers as you, but I came to them in a slightly different way. Think about geyser A as laying down the two-hour interval pattern. Now geyser B will erupt within half of those 2-hour stretches. Let's start that in the second interval. Then geyser C will erupt in every third interval, so let's start that with the third interval. After 6 intervals, the pattern repeats, and it looks like this: Nothing, B, C, B, Nothing, B&C, repeat. So when there is nothing in the interval (no B or C), then A will erupt first (at the end of the interval because A eruptions define the 2-hour intervals). When B or C erupts within an interval, there is a 50% chance that it erupts before A because there is a 50% chance that you arrived early enough in the interval that it happens. When both B and C erupt in the interval, there is a 1/3 chance for each of A, B and C to go first because you arrived at a random point in the interval. Anyway, just add up the probabilities for the 6 intervals and divide the result by 6 to get the answer: for A we have (1+1+1/2+1/2+1/2+1/3)/6 = 23/36, for B we have (1/2+1/2+1/3)/6 = 8/36 or 2/9, and for C we have (1/2+1/3)/6 = 5/36.

mbmillermo
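The six-stretch averaging in the comment above translates almost line for line into code. This is a minimal sketch. It assumes, as the comment argues, that within a stretch each eruption it contains, plus A's eruption at its end, is equally likely to be the first one after a uniformly random arrival. `PATTERN` and `cycle_average` are illustrative names.

```python
from fractions import Fraction

# Which of B and C erupt inside each 2-hour stretch between consecutive A eruptions,
# using the repeating pattern from the comment: Nothing, B, C, B, Nothing, B&C.
PATTERN = [set(), {"B"}, {"C"}, {"B"}, set(), {"B", "C"}]

def cycle_average(pattern=PATTERN):
    """Average, over the stretches of one cycle, each geyser's chance of erupting first."""
    probs = {g: Fraction(0) for g in "ABC"}
    for stretch in pattern:
        # A's eruption closes every stretch; any B/C eruption inside it is also a candidate.
        candidates = {"A"} | stretch
        for g in candidates:
            # Each candidate is equally likely to be first after a uniform arrival.
            probs[g] += Fraction(1, len(candidates)) / len(pattern)
    return probs

print(cycle_average())  # → {'A': Fraction(23, 36), 'B': Fraction(2, 9), 'C': Fraction(5, 36)}
```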

This one made me think a bit, I liked it.

Consider the probability space as a three dimensional box.
The x-dimension runs from 0 to a, the y dimension from 0 to b, the z-dimension from 0 to c.
We are assuming a <= b <= c (in our case we will put a=2, b=4, c=6).
The x value signifies the time it will take before A erupts, and likewise y for B and z for C. Every point in this space is equally likely, so we will compare volumes. We are interested in the lowest coordinate value (among x, y, z) because the corresponding geyser (A, B, C) will erupt first.
Total volume is abc, divided over 4 subregions:
Subregion where x, y, z in [0, a]. Volume is a^3. Each coordinate is the lowest in 1/3 of this subregion.
Subregion where y and z > a. Volume is a(b-a)(c-a). In this region x has the lowest coordinate.
Subregion where y in [0, a] but z in (a, c]. Volume is a^2(c-a). In half of this region y is the smallest coordinate; in the other half, x is.
Subregion where z in [0, a] but y in (a, b]. Volume is a^2(b-a). In half of this region z is the smallest; in the other half, x is.
Adding up the volumes and dividing by total volume we get:
P(A first) = a^2/3bc - a/2c - a/2b + 1
P(B first) = a/2b - a^2/6bc
P(C first) = a/2c - a^2/6bc
Plugging in a=2, b=4, c=6 we get P(A first)=23/36, P(B first)=8/36, P(C first)=5/36.

koenth
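The general formulas are easy to sanity-check with exact arithmetic. This sketch evaluates the closed form from the comment above and, separately, sums the four subregion volumes it describes. Both functions assume 0 < a <= b <= c and independent uniform waits, and their names are hypothetical.

```python
from fractions import Fraction

def general_first_probs(a, b, c):
    """Closed-form P(A first), P(B first), P(C first) for intervals a <= b <= c."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if not (0 < a <= b <= c):
        raise ValueError("expects 0 < a <= b <= c")
    p_a = a**2 / (3 * b * c) - a / (2 * c) - a / (2 * b) + 1
    p_b = a / (2 * b) - a**2 / (6 * b * c)
    p_c = a / (2 * c) - a**2 / (6 * b * c)
    return p_a, p_b, p_c

def by_volumes(a, b, c):
    """Same probabilities by summing the four subregion volumes directly."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    total = a * b * c
    all_small = a**3                # x, y, z all in [0, a]: split three ways
    only_x = a * (b - a) * (c - a)  # y, z > a: A always first
    y_small = a**2 * (c - a)        # y <= a < z: split between A and B
    z_small = a**2 * (b - a)        # z <= a < y: split between A and C
    p_a = (all_small / 3 + only_x + y_small / 2 + z_small / 2) / total
    p_b = (all_small / 3 + y_small / 2) / total
    p_c = (all_small / 3 + z_small / 2) / total
    return p_a, p_b, p_c

print(general_first_probs(2, 4, 6))  # → (Fraction(23, 36), Fraction(2, 9), Fraction(5, 36))
```

The two functions agree for any valid a, b, c, and the three probabilities always sum to 1.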

My thought was to calculate the fraction of all possible geyser configurations where you are closest to A, closest to B and closest to C at a given time t, then integrate out t. Three separate triple integrals of the product of step functions later and I'm at the same answer. Definitely made it more complicated than it needed to be.

josh-edri

I think it's worth mentioning that you can get to this same answer without the mention of conditional probability or dot products.


The time unit is irrelevant, so 2 hours will be 1 time unit, 4 hours = 2 time units and 6 hours = 3 time units.
Since the least common multiple of 1, 2 and 3 is 6, we know that the 3 geysers together have a 6-unit eruption cycle. Now it's enough to calculate the probabilities for one cycle, since every cycle is the same and every cycle is equally likely to contain the moment we arrive.


Every cycle will have a time unit which contains 3 eruptions.
This can easily be seen if we use the 1 unit long eruption as a scale so I won't go into detail.
It does not matter where a cycle begins, so let's say it begins with the unit where all 3 eruptions occur, and that this unit starts right after an eruption of the 1-unit geyser.
Now we can see 6 time units. The 2nd and the 6th are empty and end with the 1-unit geyser. This gives 2/6 probability for the 1-unit geyser.
The 3rd, 4th and 5th time units contain 1 eruption each: 2 of them are from the 2-unit geyser and the 3rd is from the 3-unit geyser. In each of these time units the contained eruption has a 1/12 chance, and the 1-unit geyser also has a 1/12 chance in every one of them, since every time unit ends with that geyser.
The first time unit contains 2 eruptions and ends with the third, which means each of them has a 1/18 chance.


Results:
1 unit long: 2/6 + 3/12 + 1/18 = 23/36
2 unit long: 1/6 + 1/18 = 2/9
3 unit long: 1/12 + 1/18 = 5/36

heck-r

Nice problem! I must admit I also thought Pa = 2*Pb = 3*Pc => 6/11, 3/11, 2/11, done! :)
But after learning that this is wrong :\ I found the correct answers by integration,
and I think it is a bit more straightforward to calculate in the following way :

Set the time axis origin so that the observer arrives at t=0.
Geysers A, B, C have their next eruptions uniformly distributed in the intervals
[0, 2[ and [0, 4[ and [0, 6[, respectively.
The first eruption happens within the first two hours, i.e. in [0, 2[.

The probability that the first eruption is from geyser A in the interval [t, t+dt[ is
dP_A = (dt/2) times (4-t)/4 times (6-t)/6
The three independent probabilities are
(dt/2) : A erupts on that interval
(4-t)/4 : B erupts after t
(6-t)/6 : C erupts after t

Then the probability that the first eruption is from geyser A is the sum
P(A first) = int dP_A = int{t=0}^{2} [ (dt/2)((4-t)/4)(6-t)/6 ]
= (1/48) int{t=0}^{2} [(4-t)(6-t)dt] = 23/36

Similarly,
the probability that the first eruption is from geyser B in the interval [t, t+dt[ is
dP_B = (dt/4) times (2-t)/2 times (6-t)/6
and then
P(B first) = int dP_B = int{t=0}^{2} [ (dt/4)((2-t)/2)(6-t)/6 ]
= (1/48) int{t=0}^{2} [(2-t)(6-t)dt] = 8/36 = 2/9

P(C first) = 1 - 23/36 - 8/36 = 5/36

arnok.

I just got to 2:44 and have this to say: the fact that the geysers erupt at fixed time intervals does not mean they don't "average". In fact they do! The fixedness of the time interval is immaterial to the "average". That's what makes an average an average.

nerdyengineer