Stand-up Maths' max-of-dice conjecture finally proven!



**Known errors/errata in video:**
- At 28:31, I say "k=1, m=3" but I meant to say "k=1, m=2"
- At 38:15, I have the factor (1-2/n) but it should be (1-1/n). And then the product should go to (1-(m-2)/n) as the last term.
- When I say "k-th largest" I mean the k-th from the left (so the maximum is when k=m and the minimum is when k=1)
- Note that the 3rd correction term does not apply when m=1. This is because P(N_maxers=2)=0 in that case (not P(N_maxers=2)=m/2n+O(1/n^2) like we got in the video)! You can also see this because we had a factor of (m-1)/(m-1) in our derivation.

**Related Videos:**

**Chapters:**
0:00 Rolling with Advantage and Stand-up Maths Conjecture
1:48 Graphs of Approximation Error Sizes
3:54 Probability Ninja Proof and Intuition for Each Term
6:15 Expected Minimum of Dice Rolls
9:08 One Term Approximation
13:10 Expected Maximum of Uniform Random Variables
18:16 Expected Value of a Beta Random Variable
19:40 Two Term Approximation
24:39 Example of Dealing with Max X minus Delta
26:30 Three Term Approximation
33:00 Probability of Two Dice Tied for Max via Birthday Paradox
40:28 Bonus Fourth Term

**Music Credits:**
Creative Commons / Attribution 3.0 Unported License (CC BY 3.0)
**Comments:**

Ooooh this is very nice - the perfect math Christmas present :D

DrTrefor

(m n)/(m+1) + 1/2 - m/(12 n) + (m(m-1)(m-2))/(720 n^3) - (m(m-1)(m-2)(m-3)(m-4))/(30240 n^5) + (m(m-1)(m-2)(m-3)(m-4)(m-5)(m-6))/(1209600 n^7) - ... Denominators follow OEIS sequence A060055. Numerators initially look like only 0 or 1 (with sign), but follow A060054. For example, the (1/n^11) term has 691 in the numerator and 1307674368000 in the denominator. Include all terms in (1/n^k) such that m > k. For example, if m = 10, stop at the (1/n^9) term.
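The truncation rule above is easy to check with exact rational arithmetic: the correction terms are Bernoulli-number coefficients times falling factorials of m, and keeping only the 1/n^k terms with k < m reproduces the exact expected max. A small sketch (function names are mine, not from the comment):

```python
from fractions import Fraction
from math import factorial

# Bernoulli numbers B_2, B_4, B_6 as exact fractions (enough for m <= 7 here)
BERNOULLI = {2: Fraction(1, 6), 4: Fraction(-1, 30), 6: Fraction(1, 42)}

def exact_expected_max(n, m):
    # E[max of m rolls of a fair n-sided die] = n - sum_{j=1}^{n-1} (j/n)^m
    return n - sum(Fraction(j, n) ** m for j in range(1, n))

def series_expected_max(n, m):
    # mn/(m+1) + 1/2 - m/(12n) + m(m-1)(m-2)/(720 n^3) - ...,
    # keeping only the 1/n^k correction terms with k < m, as described above.
    total = Fraction(m * n, m + 1) + Fraction(1, 2)
    for k in range(1, m, 2):                  # odd powers: 1/n, 1/n^3, ...
        if k + 1 not in BERNOULLI:
            break
        falling = Fraction(1)
        for i in range(k):                    # falling factorial m(m-1)...(m-k+1)
            falling *= m - i
        total -= BERNOULLI[k + 1] / factorial(k + 1) * falling / Fraction(n) ** k
    return total
```

For small m the truncated series is not just a good approximation but exactly equal to the true expected value, which is what the m > k rule is saying.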

sheppa

Amazing video! I use the whole "stretching out" of a uniform distribution all the time in programming! I loved the trick of inverting a rounded-up uniform distribution over 0-n into rolling a die and subtracting a uniform distribution over 0-1. That was so cool! Lots of neat ways to think about probability that I'd never seen before!

lunafoxfire

This is one of the best descriptions of series approximations that I've seen on YT. It's great that you've given each term a qualitatively different explanation, which definitely has parallels with perturbation theory for me (while being simple enough for a general audience!). I particularly liked the graphs at the start; they really show what you gain at each step.

mikeflowerdew

Really lovely video, lots of good intuition, and yet just enough details that you can see how a rigorous argument proving the same thing would shake out, without bogging people down in tedium. Thank you for taking the time and effort to explain this, and the "reveal" at the end that the 1/n² term vanishes was quite the surprise!
Just as a heads-up, whatever greenscreen technology you are using seems to not quite be able to mask the very top right corner of your camera feed correctly, and so there is a small artifact more or less in the middle of the screen for most of the video. Not a big deal at all, but probably easy to correct.

droid-droidsson

really cool video man!
Funny as it is, the bit that was most confusing to me was the part where you subtracted and added the 1/2. Took me a few good seconds to realize what had happened there.

yedidiapery

So, the P(Nmax=2) perspective on the third term explains why the third correction term doesn't make sense when m=1 - intuitively you should anticipate that m=1 should still be a valid input, but Nmax>=2 being the cause of the third term means m needs to be >1 (you can't have two dice being maximum out of a set of one 😂)

djsmeguk

I wrote a comment to Matt Parker's video with my own attempt at proving this 2 years ago after watching it. I'll copy it here, since I think it works but want to know where I've erred if it doesn't.

For n, k >= 1 and m <= n, let D(n, k; m) := # ways that the max value of k rolls of n-sided dice is m. (m > n obviously gives 0, so that's not very interesting)
Equivalently, D(n, k; m) = #{ s in {1, . . ., n}^k | max(s) = m}. An interesting thing happens here, though: Since we're taking m <= n, we can actually restrict the domain to {1, . . ., m}^k without losing any elements. This is because any sequence of k rolls in which some roll > m occurs is never counted (its max exceeds m). Therefore,
D(n, k; m) = #{s in {1, . . ., m}^k | max(s) = m}

Now, let's enumerate all the possibilities. By definition, we have to roll at least one m, and nothing larger than m.
Suppose m shows up as the first roll. Then, the other (k-1) rolls can be anything between 1 and m, so this accounts for m^(k-1) elements.
If m doesn't show up as the first roll, but is the second roll, that means we had (m-1) possible values for the first roll, 1 value for the second (this is where m is first rolled), and then m^(k-2) possibilities for the last k-2 rolls. This accounts for m^(k-2) * (m-1) elements.
This process continues until the only case is where the final roll is m, leaving (m-1)^(k-1) possibilities for the first (k-1) rolls.

This gives us the sum: Sum( m^(k-j) * (m-1)^(j-1), j = 1 to k ). I'll leave it as an exercise to the reader to show that this sum comes out to m^k - (m-1)^k.

Therefore, D(n, k; m) = m^k - (m-1)^k. Since there are n^k possible sets of k rolls, the probability that the max value of k rolls of n-sided dice is m = (m^k - (m-1)^k)/n^k.

Now, we can compute the expected value. Let E(n, k) denote the expected value for the k-roll experiment with n-sided dice.

E(n, k) = Sum( m * (m^k - (m-1)^k)/n^k, m = 1 to n ) = (1 / n^k) * Sum( m * (m^k - (m-1)^k), m = 1 to n ) = (1 / n^k) * (n^(k+1) - Sum( m^k, m = 1 to n-1 )). As before, I'll leave the final equality as an exercise to the reader.

For k = 1, 2, 3, it's easy to simplify this down further using the well-known formulas for the sum of the first (n-1) integers, squares, and cubes (resp.):
E(n, 1) = (1/n) * (n^2 - [n * (n - 1)] / 2) = (n + 1) / (2) -> For a single roll of a D6, we get (6 + 1) / (2) = 3.5, as expected.
E(n, 2) = (1/n^2) * (n^3 - [n * (n - 1) * (2n - 1)] / 6) = (4n^2 + 3n - 1) / (6n) -> For 2 rolls of a D20, we get (1600 + 60 - 1)/(120) = 13.825, which agrees with the 2 rolls of D20 simulations in the video.
E(n, 3) = (1/n^3) * (n^4 - [n^2 * (n-1)^2] / 4) = (3n^2 + 2n - 1) / (4n) -> Matches the value computed in the video.

Taking the limit as n->infinity also works (using the ~ asymptotic notation):
E(n, k) ~ kn/(k+1) and so E(n, k) / n -> k/(k+1) as n->infinity.

For the full expression of E(n, k) / n, we'd need to make use of Faulhaber's formula, which is a big complicated expression involving Bernoulli numbers.
This means that your conjecture isn't quite true, but it is "true" in an asymptotic sense. Using Faulhaber's formula, E(n, k) = kn/(k+1) + 1/2 + o(1) as n->infinity.
[Note: Little-o notation - f(n) = o(g(n)) means f(n)/g(n) -> 0 as n -> infinity. In this case, f(n) = o(1) means f(n) -> 0]

For rolling with disadvantage (formulas provided without justification; it's essentially the same as above):
d(n, k; m) := # ways that the min value of k rolls of n-sided dice is m = (n-m+1)^k - (n-m)^k
probability of min value after k rolls of n-sided dice is m = ( (n-m+1)^k - (n-m)^k ) / n^k
e(n, k) := expected value of taking the min of k rolls of n-sided dice = (1/n^k) * Sum( m^k, m = 1 to n )

Using Faulhaber's formula for the sum of the first n kth powers, we find:
e(n, k) = n/(k+1) + 1/2 + o(1) as n->infinity
where that o(1) packs in a bunch of complicated expressions involving Bernoulli numbers.
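The closed form derived above is easy to sanity-check by brute-force enumeration over all n^k equally likely outcomes, using exact rational arithmetic (function names are mine, not the commenter's):

```python
from fractions import Fraction
from itertools import product

def brute_force_expected_max(n, k):
    # Average max(rolls) over all n^k equally likely outcomes.
    total = sum(max(rolls) for rolls in product(range(1, n + 1), repeat=k))
    return Fraction(total, n ** k)

def closed_form_expected_max(n, k):
    # E(n, k) = (1/n^k) * (n^(k+1) - sum_{m=1}^{n-1} m^k), as derived above.
    return Fraction(n ** (k + 1) - sum(m ** k for m in range(1, n)), n ** k)
```

For example, closed_form_expected_max(6, 1) gives 7/2 and closed_form_expected_max(20, 2) gives 13.825, matching the worked values in the comment.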

MathNerdGamer

a great video, such an underrated channel

thatisthewaz

For this problem I'd probably have gone with descriptive shorthands, like s for sides and n for number, or d for dice

shadeblackwolf

I'm not gonna lie, when I saw this thumbnail, I thought it was going to be a pointless video. After all, Matt's asymptotic result is not just simple but very easy to prove. But by 5 minutes in, I realized it was a way better video than I expected. It's a really nice explanation of where the formula comes from and how to interpret it in at least two different ways. I do wish it would give the complete series, but I think there is enough in this video for me to work it out. And I loved the connection to Faulhaber's formula.

EDIT: You even mentioned Matt's video that showed max(U₁, U₂) ~ √U (a uniform raised to the 1/2 power). I feel like you two could do a whole video on dice and order statistics (e.g. dice of different colors, or dice beating other dice, can be directly related to order statistics, especially when you generalize to more than 2 players).

The only thing that makes me sad is the singular "dice." We are being conquered by the Brits once more! I'm cool with our English language, our English common law, etc. But in the singular of dice, I'd rather die.

EebstertheGreat

Very cool math tricks, and I appreciate that you show us exactly where the approximation differs from the real value and by what terms

Patashu

In the calculation of P(All Different) around 37:40 shouldn't the error term be of order O(m^3/n^2), which is not negligible when m^3 is comparable to n^2?

ComputerNerd

As a mathy person who struggles with probability beyond just crunching examples, I'm loving this! I'm at 28:07, and almost shivering with antici...pation to discover where that 12th is gonna emerge from.

Juttutin

Cool video, but the thing that catches the eye about the 2-term vs 3-term approximations is: if we fix N and let M get arbitrarily large, then with only 2 terms E[max] approaches N, which sounds correct, since no matter the size of N, if we throw the dice many times (M >> N) then surely we will get the biggest possible value sooner or later. But with the 3rd term, E[max] goes to negative infinity; maybe the positive O(1/N^2) term could fix this? idk, but the concept of a better approximation which only works in the N ~ M regime seems strange...
Surely the simple (M * N)/(M+1) + 0.5 formula can't be the best for every pair (N, M)?
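The behaviour described here is easy to see numerically: the series is asymptotic in n for fixed m, so the 3-term truncation goes badly wrong when m >> n while the 2-term one stays sensible. A sketch (my own function names, not from the video):

```python
def exact_emax(n, m):
    # Exact expected max of m rolls of a fair n-sided die:
    # n - sum_{j=1}^{n-1} (j/n)^m
    return n - sum((j / n) ** m for j in range(1, n))

def truncated_emax(n, m, terms):
    # 2-term truncation: mn/(m+1) + 1/2; the 3rd term subtracts m/(12n).
    val = m * n / (m + 1) + 0.5
    if terms >= 3:
        val -= m / (12 * n)
    return val

n, m = 20, 10_000
two_term = truncated_emax(n, m, 2)     # slightly above n, as expected
three_term = truncated_emax(n, m, 3)   # wildly negative: m/(12n) dominates
```

Here the exact value is essentially 20, the 2-term value overshoots slightly, and the 3-term value is large and negative, confirming the commenter's observation.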

Lost_Evanes

At 38:15, why is it multiplied by (1-2/n)? Shouldn't it be (1-1/n)? Because the first two dice have the same value, so there is only one unique value that the next die should be different from?

mars_titan

Great presentation, that was surprisingly easy to follow.

FireStormOOO_

Isn't P(N_maxers) dependent on the value of the maximum? E.g. if the max of 3 dice is 1, then the probability that all three dice have the maximum value of 1 is 100%. It's probably a higher-order term, but I think this isn't considered in your exact formula

Kaepsele

A formula that is exact is 1 + n - sum_{k=1}^{n} (k/n)^m. I found this by defining a matrix A such that pi' = A*pi takes the probability distribution of the max value of rolling m n-sided dice to the probability distribution of the max value of rolling m+1 n-sided dice. The eigendecomposition of A is easy to compute by hand, as it has a nice form. Using the eigendecomposition you can compute pi_m = Q*(diag(v)^(m-1))*(Q^-1)*(1/n, ..., 1/n) efficiently by exponentiating the eigenvalues. If you just want the expected value of the max, you can simplify further to the formula provided.
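A sketch verifying the closed form E = 1 + n - Σ_{k=1}^{n} (k/n)^m (the k = n term equals 1, so this is the same as n - Σ_{k=1}^{n-1} (k/n)^m) against brute-force enumeration, with exact rational arithmetic; function names are mine:

```python
from fractions import Fraction
from itertools import product

def closed_form(n, m):
    # 1 + n - sum_{k=1}^{n} (k/n)^m; the k = n term equals 1, so this is
    # identical to n - sum_{k=1}^{n-1} (k/n)^m.
    return 1 + n - sum(Fraction(k, n) ** m for k in range(1, n + 1))

def brute_force(n, m):
    # Average the max over all n^m equally likely roll sequences.
    return Fraction(sum(max(r) for r in product(range(1, n + 1), repeat=m)), n ** m)
```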

kevinwaugh

18:56 correction: it's the "kth smallest", not the "kth largest", but we can just use the min-max trick from earlier. The kth largest would be (m-k+1)/(m+1).
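These order-statistic means are easy to check by simulation: the k-th smallest of m iid Uniform(0,1) draws is Beta(k, m-k+1) with mean k/(m+1), so the k-th largest has mean (m-k+1)/(m+1). A Monte Carlo sketch (function name is mine):

```python
import random

def kth_smallest_mean(m, k, trials=20_000, seed=0):
    # Monte Carlo estimate of E[k-th smallest of m iid Uniform(0,1) draws];
    # the Beta(k, m-k+1) mean says this should be close to k/(m+1).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        draws = sorted(rng.random() for _ in range(m))
        total += draws[k - 1]
    return total / trials
```

For m = 5, the estimates land near 1/6, 2/6, ..., 5/6 as k runs from 1 to 5.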

simonwillover