R Tutorial: The Birthday Problem
---
In this lesson, we will solve our first puzzle, a well-known problem called the birthday problem.
The setup is as follows: there are n people in a room, and we want to know the probability that at least two of them share a birthday.
To make this more manageable, we need to make the following assumptions. First, we exclude leap years, meaning no one has a birthday on February 29th. Next, each person's birthday is equally likely to fall on any of the 365 days of the year. Finally, the birthdays of the individuals in the room are independent of each other.
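Under these three assumptions the exact probability has a well-known closed form, which gives a reference point for any estimate. As a sketch, writing n for the number of people and assuming n is at most 365:

```latex
P(\text{at least one match})
  = 1 - P(\text{no match})
  = 1 - \prod_{k=0}^{n-1} \frac{365 - k}{365}
```

Each factor is the chance that the next person avoids all birthdays already taken, and the factors multiply because the birthdays are independent.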
For this puzzle, we will write a simulation-based solution to estimate the true theoretical probability.
To illustrate this concept, consider the following example. Suppose we want to estimate the probability of rolling a 12 with two ordinary dice. We start by defining a variable called counter, to keep track of the number of times that a 12 is rolled. It starts at 0 since no twelves have been rolled yet.
Then, we simulate a single roll, using the roll_dice function created previously. Here we see that the roll is indeed a 12. In practice, our code will not need to print any result. We simply check whether roll is equal to 12 and, if so, increment the counter by adding 1.
After a 12 was rolled, the counter is now equal to 1. If the roll had not been a 12, the counter would still be 0.
To do this many times, we will use a for loop. The counter is set to 0 before the loop begins, and then within the loop, we roll two dice and check the resulting value. If it is equal to 12, the counter is incremented by 1.
Once the loop is complete, we divide the counter by the number of iterations to obtain our estimate of the true probability. Notice that our value is very close, but not exactly equal, to the correct value of 1/36, or about 0.028.
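The dice example described above can be sketched roughly as follows. The roll_dice function from the earlier lesson is not shown in this transcript, so the version here is a hypothetical stand-in that sums k fair six-sided dice; the seed and the number of iterations (n_iter) are also illustrative choices.

```r
# Hypothetical stand-in for roll_dice from the earlier lesson:
# rolls k fair six-sided dice and returns their sum.
roll_dice <- function(k) {
  sum(sample(1:6, size = k, replace = TRUE))
}

set.seed(1)        # illustrative seed, only for reproducibility
n_iter <- 10000    # assumed number of simulated rolls

counter <- 0       # no twelves have been rolled yet
for (i in seq_len(n_iter)) {
  roll <- roll_dice(2)
  if (roll == 12) {
    counter <- counter + 1   # count this roll as a success
  }
}

# Proportion of rolls equal to 12: the simulation-based estimate
estimate <- counter / n_iter
print(estimate)
print(1 / 36)      # theoretical value for comparison
```

With more iterations the estimate tends to land closer to 1/36, at the cost of a longer run time.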
R has a built-in function, called pbirthday, that can solve the birthday problem theoretically. After completing a simulated solution, we can use the pbirthday function to check our answer, and to calculate the birthday problem probability over a range of sample sizes to get a better idea of the trend.
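As one possible sketch of such a comparison, the same counter pattern from the dice example can be applied to birthdays. The room size of 23, the seed, and the variable names are illustrative assumptions, not taken from the lesson.

```r
set.seed(42)       # illustrative seed, only for reproducibility
n <- 23            # assumed room size for this sketch
n_iter <- 10000    # assumed number of simulated rooms

counter <- 0
for (i in seq_len(n_iter)) {
  # Draw n birthdays as days 1..365, equally likely and independent
  birthdays <- sample(1:365, size = n, replace = TRUE)
  # A match exists if any day appears more than once
  if (any(duplicated(birthdays))) {
    counter <- counter + 1
  }
}

sim_estimate <- counter / n_iter
theoretical <- pbirthday(n)
print(c(simulated = sim_estimate, theoretical = theoretical))
```

The two numbers should be close but not identical, since the simulated value varies from run to run.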
Using the pbirthday function only requires providing a sample size. With a sample size of 10, the output is the probability of at least one match in a room of 10 people.
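A minimal call looks like this; the variable name p10 is only for illustration.

```r
# Probability of at least one shared birthday among 10 people,
# using the defaults of 365 classes and a coincidence of 2
p10 <- pbirthday(10)
print(p10)   # roughly 0.117
```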
We can calculate the match probability over a variety of room sizes, using the sapply function. Here this is shown for room sizes from 1 to 10. Notice that the last probability in the output matches the value obtained for a sample size of 10.
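A sketch of that step, with room_sizes and match_probs as assumed variable names:

```r
room_sizes <- 1:10

# Apply pbirthday to each room size; sapply returns a numeric vector
match_probs <- sapply(room_sizes, pbirthday)
print(match_probs)

# The last element corresponds to a room of 10 people
print(match_probs[10] == pbirthday(10))
```

A room of one person gives a probability of 0, since there is nobody to match with.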
We can display the relationship between room size and match probability in a scatterplot using the plot function, by indicating the two variables we want to plot, separated by a tilde, with the first variable on the y-axis and the second variable on the x-axis.
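The plotting step might look like the sketch below. The variable names, the range of room sizes up to 50, and the axis labels are assumptions made for illustration.

```r
room_sizes <- 1:50                             # assumed range of room sizes
match_probs <- sapply(room_sizes, pbirthday)   # theoretical match probabilities

# Formula interface: y variable ~ x variable
plot(match_probs ~ room_sizes,
     xlab = "Room size",
     ylab = "Probability of at least one match")
```

Reading the formula as "match_probs as a function of room_sizes" is an easy way to remember which variable goes on which axis.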
Let's do this.
#RTutorial #DataCamp #Probability #Puzzles