Python Tutorial: Expected value, mean, and variance

---

Is there an outcome in the data that is more likely than others? How spread out is the data?

Let's explore three metrics for measuring these attributes: the expected value, the mean, and the variance.

A discrete random variable takes a countable set of outcomes; in our examples, that set is finite. For instance, the roll of a die has only six possible outcomes.

For discrete random variables, the expected value is the sum of the possible outcomes weighted by their probability.

To calculate the expected value, each outcome is multiplied by its probability, and the products are summed.
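As a quick illustration of that weighted sum, here is a minimal sketch for the die mentioned earlier. It assumes the die is fair, so every face has probability 1/6; the variable names are ours and used only for illustration.

```python
import numpy as np

# Outcomes of a six-sided die and their probabilities (assumed fair).
outcomes = np.arange(1, 7)
probabilities = np.full(6, 1 / 6)

# Expected value: multiply each outcome by its probability, then sum.
expected_value = np.sum(outcomes * probabilities)
print(expected_value)  # approximately 3.5
```

Note that 3.5 is not a face of the die: the expected value does not have to be one of the possible outcomes.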

In simple terms, the expected value is the value around which the average of the outcomes concentrates when you repeat the experiment many times.

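Take a coin flip where tails counts as 0 and heads counts as 1, and heads comes up with probability p. We multiply the tails outcome, 0, by its probability, 1 - p; then we add the heads outcome, 1, multiplied by its probability, p, and we get the expected value p.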
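Written out as a formula, with X standing for the result of one flip (tails counted as 0, heads as 1; the symbol X is our notation, not something the video shows), the calculation is:

```latex
E[X] = 0 \cdot (1 - p) + 1 \cdot p = p
```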

The arithmetic mean is the sum of the observed outcomes divided by the number of samples. Unlike the expected value, it is computed from data.

We will see that the arithmetic mean of the outcomes converges to the expected value as we increase the number of random experiments.

If we calculate the mean from many coin flips, we will get a number near the probability of heads for that particular coin.

In Python, we will use the describe function from scipy.stats to get the mean: all we have to do is pass an array to the function and read the mean from the result.
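A minimal sketch of that call might look like the following. The flips are simulated with bernoulli.rvs from scipy.stats, which is our assumption about how the sample is generated; the sample size and the seed are arbitrary.

```python
from scipy.stats import bernoulli, describe

# Simulate 100 fair coin flips (1 = heads, 0 = tails).
# The seed is arbitrary and only makes the run repeatable.
flips = bernoulli.rvs(p=0.5, size=100, random_state=42)

# describe returns a summary object; the sample mean is its `mean` attribute.
summary = describe(flips)
print(summary.nobs, summary.mean)
```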

If we repeat the experiment and make hundreds of fair coin flips, we should get a number near 0.5.

So, the more coins we flip, the more the sample mean of all the throws approaches the expected value.

If we add even more coin flips, it becomes even clearer that the sample mean tends to the expected value.
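To see this numerically, here is a small sketch that computes the sample mean for increasing numbers of simulated fair coin flips. The sample sizes and the seed are arbitrary choices for illustration, and the exact numbers printed will depend on them.

```python
from scipy.stats import bernoulli

p_heads = 0.5  # fair coin, so the expected value is 0.5

# Sample mean for progressively larger experiments.
sample_means = {}
for n_flips in (10, 1_000, 100_000):
    flips = bernoulli.rvs(p=p_heads, size=n_flips, random_state=0)
    sample_means[n_flips] = flips.mean()
    print(n_flips, sample_means[n_flips])
```

With only 10 flips the mean can land fairly far from 0.5; with 100,000 flips it should sit very close to it.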

That is the relationship between the expected value and the mean.

This is known as the law of large numbers, which we will review later in the course.

The variance measures how concentrated or spread out from the expected value the data is.

Variance is the expected value of the squared deviation of a random variable from its expected value.
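To make the definition concrete, here is a sketch that applies it directly to a single coin flip. It assumes a biased coin with a 30% probability of heads and counts tails as 0 and heads as 1; the variable names are illustrative.

```python
import numpy as np

p = 0.3  # assumed probability of heads
outcomes = np.array([0, 1])
probabilities = np.array([1 - p, p])

# Expected value: outcomes weighted by their probabilities.
expected_value = np.sum(outcomes * probabilities)

# Variance: expected value of the squared deviation from the expected value.
variance = np.sum((outcomes - expected_value) ** 2 * probabilities)
print(expected_value, variance)  # approximately 0.3 and 0.21
```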

In Python, we will again use the describe function and take the variance from the result.
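A minimal sketch of reading the variance from the result, again with simulated fair coin flips (the simulation with bernoulli.rvs, the sample size, and the seed are our assumptions):

```python
from scipy.stats import bernoulli, describe

# Simulate 1,000 fair coin flips (1 = heads, 0 = tails); arbitrary seed.
flips = bernoulli.rvs(p=0.5, size=1000, random_state=42)

# The sample variance is the `variance` attribute of the result.
# By default describe uses ddof=1, i.e. it divides by n - 1.
summary = describe(flips)
print(summary.variance)
```

For a fair coin, the printed value should be close to 0.25.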

In the particular case of the binomial distribution, the expected value is the product of the number of coin flips and the probability of getting heads.

The variance is the product of the expected value and the probability of failure (not getting heads).

For 10 coin flips, with a fair coin, the expected value is 5 and the variance is 2.5.

In Python, we will use the binom.stats method from scipy.stats to get the expected value and variance of a binomial distribution.

Using binom.stats, we can make some calculations so we know what we can expect in our simulations.

The expected value and variance for one fair coin flip are 0.5 and 0.25. To get these values, we just use binom.stats and specify n as 1 and p as 0.5 for a fair coin flip.

The expected value for one biased coin flip with 30% probability of getting heads is 1 times 0.3, which is 0.3. The variance is that expected value, 0.3, times the probability of failure, 0.7, which gives 0.21, following the rule we saw earlier.

In our last example, we can see that the expected value for 10 fair coin flips is 5 and the variance is 2.5.
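The three examples above can be reproduced with a short sketch like this one. The variable names are ours; binom.stats returns the mean and the variance by default.

```python
from scipy.stats import binom

# One fair coin flip: n = 1, p = 0.5.
mean_fair, var_fair = binom.stats(n=1, p=0.5)

# One biased coin flip with a 30% probability of heads.
mean_biased, var_biased = binom.stats(n=1, p=0.3)

# Ten fair coin flips.
mean_ten, var_ten = binom.stats(n=10, p=0.5)

print(mean_fair, var_fair)      # approximately 0.5 and 0.25
print(mean_biased, var_biased)  # approximately 0.3 and 0.21
print(mean_ten, var_ten)        # approximately 5.0 and 2.5
```

These are theoretical values from the distribution, not estimates from data, so they give us a reference to compare our simulated sample means and variances against.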

Expected value, mean, and variance are essential to probability and statistics.
In fact, they are among the most important measures for determining whether the data is spread out or concentrated around the expected value.
So let's get busy calculating!

#DataCamp #PythonTutorial #FoundationsofProbabilityinPython