Probability Distributions Made Easy: Top 3 to Know for Data Science Interviews

In this video, we will go over the top 3 probability distributions commonly seen in data science interviews. Here are the topics covered in this video:
- Normal distribution
- Binomial distribution
- Geometric distribution

For each distribution, the video explains what it is and gives examples of where it is commonly used.
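As a quick reference for the three distributions listed above, here is a minimal sketch of their probability functions using only the Python standard library. It assumes the geometric distribution counts the number of trials up to and including the first success (k = 1, 2, ...). Some texts instead count the failures before the first success, which shifts k by one. The function names are illustrative and do not come from the video.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a Normal(mu, sigma^2) distribution at x."""
    return NormalDist(mu, sigma).pdf(x)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): exactly k successes in n independent trials, each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): first success occurs on trial k (k >= 1), each trial with success probability p."""
    return (1 - p) ** (k - 1) * p


print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
print(binomial_pmf(2, 4, 0.5))              # 0.375
print(geometric_pmf(3, 0.5))                # 0.125
```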



🟢Get all my free data science interview resources

// Comment
Got any questions? Something to add?
Write a comment below to chat.

// Let's connect on LinkedIn:

====================
Contents of this video:
====================
0:00 Intro
1:06 Normal distribution
4:25 Binomial distribution
6:15 Geometric distribution
Comments

Hi Emma, thanks for your wonderful video.

In your Binomial example, I would like to point out that the click-through rate approximately follows a normal distribution due to the Central Limit Theorem (CLT).

Assume the total number of clicks follows a Binomial(n, p) distribution: there are n impressions in total, and whether each impression ends up as a click is a Bernoulli(p) variable. In other words, each impression has only two outcomes, and with probability p it ends up as a click.

The click-through rate is the average of these n Bernoulli variables. By the CLT, this average approximately follows a normal distribution when n is large.

After all, the click-through rate can take many values between 0 and 1 and is usually treated as continuous, while a Bernoulli distribution is discrete with only two outcomes.
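The argument above can be checked with a small simulation, sketched here under hypothetical values (n = 1,000 impressions, true p = 0.05). Each impression is drawn as a Bernoulli(p) click and the CTR is clicks / n. Across many simulated experiments, the CTRs should center on p with a standard deviation close to sqrt(p(1 - p)/n), as the normal approximation predicts. `simulate_ctrs` is a hypothetical helper, not something from the video.

```python
import math
import random
import statistics


def simulate_ctrs(n_impressions: int, p: float, n_experiments: int, seed: int = 0) -> list[float]:
    """Simulate CTRs: each impression is a Bernoulli(p) click,
    so clicks ~ Binomial(n_impressions, p) and CTR = clicks / n_impressions."""
    rng = random.Random(seed)
    ctrs = []
    for _ in range(n_experiments):
        clicks = sum(1 for _ in range(n_impressions) if rng.random() < p)
        ctrs.append(clicks / n_impressions)
    return ctrs


n, p = 1_000, 0.05  # hypothetical: 1,000 impressions, true CTR of 5%
ctrs = simulate_ctrs(n, p, n_experiments=2_000)

# The CLT suggests CTR is approximately Normal(p, p(1 - p) / n) for large n.
print(f"simulated mean {statistics.mean(ctrs):.4f} vs p = {p}")
print(f"simulated sd   {statistics.stdev(ctrs):.4f} vs sqrt(p(1-p)/n) = {math.sqrt(p * (1 - p) / n):.4f}")
```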

chihirobabuska

In the first example (average time spent per user per day), the sample size is 10. Can we assume normality, given that our sample size is so small?

danielrad

Thanks, Emma, for the great video! If we map these distributions to A/B testing, a binary outcome follows a binomial distribution. Will all other cases follow a normal distribution according to the central limit theorem? I don't have much practical experience with A/B testing, so I would love to know how we decide which distribution to use in an A/B test. Why do we have to specify a t-test or a z-test?
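For the binary-outcome case raised in this question, one common (but not the only) choice is a two-proportion z-test, which relies on the CLT normal approximation for the click-through rate. The sketch below assumes large samples and uses made-up click counts; `two_proportion_z_test` is a hypothetical helper. For continuous metrics whose variance is estimated from a small sample, a t-test is typically used instead.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in proportions (e.g. click-through rates).
    Relies on the normal approximation, so n_a and n_b should be large."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    # Pooled proportion under the null hypothesis that both groups share one rate
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 500 of 10,000 impressions clicked in control, 560 of 10,000 in treatment
z, p_value = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```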

songsong

I think the green and blue parameters are swapped for the normal distribution diagram.

emmysway

Hi ma'am, remember me?
I love the way you taught us everything 😍😍🤗🤗

keshavgupta

Hi Emma, I have a question: how would one calculate the average time spent per user per day?
Say we select a random sample of 10 users, as in your example in the video, and for those 10 users we have data on the time spent per day. A user might have multiple time-spent-per-day values if they were active on several days. So do we calculate each user's average time spent per day and then take the average of those 10 individual averages?
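The two ways of averaging described in this question can give different answers. The sketch below uses hypothetical data for three users (not the ten from the video). Averaging the per-user averages weights every user equally, while pooling all user-days gives more weight to users who were active on more days.

```python
from statistics import mean

# Hypothetical data: minutes spent on each active day, per user
daily_minutes = {
    "user_a": [30, 10],       # active on 2 days
    "user_b": [5],            # active on 1 day
    "user_c": [20, 20, 50],   # active on 3 days
}

# Approach described in the question: average each user's days first,
# then average those per-user averages (every user weighted equally).
per_user_avg = {user: mean(days) for user, days in daily_minutes.items()}
avg_of_user_avgs = mean(per_user_avg.values())

# Alternative: pool all user-days (users with more active days weigh more).
pooled_avg = mean(m for days in daily_minutes.values() for m in days)

print(per_user_avg)      # {'user_a': 20, 'user_b': 5, 'user_c': 30}
print(avg_of_user_avgs)  # about 18.33
print(pooled_avg)        # 22.5
```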

raghavmittal

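Hi Emma, hello! I'm a math major at a university in China and have never studied abroad. Do I have a chance of competing for data scientist positions abroad?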

lenka