An easier way to do sample size calculations


Stay updated with the channel and some stuff I make!

Comments

I've been shocked at how broadly useful Monte Carlo approaches are in general.

I remember one problem where I spent weeks figuring out the correct solution. By that point it had become so convoluted that I decided it would be smart to write a Monte Carlo simulation to verify I hadn't made a mistake.

The simulation got exactly the same results to three decimal places, and it took about 10 minutes to write.

The other great thing about Monte Carlo simulations is they make all of your assumptions exceedingly clear, while equations tend to obfuscate your assumptions.

Eckster
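
A minimal R sketch of the kind of cross-check described in the comment above, assuming a two-sample t-test on normally distributed data (an illustrative choice, not the commenter's actual problem). Every assumption (group size, true difference, standard deviation, alpha) is an explicit argument of the hypothetical helper simulate_power, and its estimate is compared with base R's closed-form power.t.test.

```r
# Monte Carlo power of a two-sample t-test; every modelling assumption is an argument
simulate_power <- function(n, delta, sd = 1, alpha = 0.05, reps = 2000) {
  p_values <- replicate(reps, {
    a <- rnorm(n, mean = 0, sd = sd)      # control group
    b <- rnorm(n, mean = delta, sd = sd)  # treatment group, shifted by delta
    t.test(a, b, var.equal = TRUE)$p.value
  })
  mean(p_values < alpha)  # share of simulated studies that reject the null
}

set.seed(42)
mc <- simulate_power(n = 30, delta = 0.5)
exact <- power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)$power
c(monte_carlo = mc, closed_form = exact)
```

var.equal = TRUE is used because power.t.test assumes a common standard deviation; with 2000 replications the two numbers should typically agree within a few percentage points.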

Yup, this is standard practice in particle physics. Eventually the technicality boils down to the modeling of the physical process being investigated, which may involve hundreds of gigabytes of equations. One reason this is necessary is that there can be signal-background interference in particle physics processes. What makes it especially worthwhile is that the same MC simulation is used again during data analysis once the actual data collection reaches a checkpoint.

Hopefully, day-to-day business applications do not often involve complex modeling, and formulae for rough estimates may still be the most economical option, especially when the signal and background do not interfere significantly. However, once heavy machinery such as an MC simulation built on a complex model exists, its value can go beyond advising on sample size. For example, after the statistical analysis of real-life data is done, if the business wants to improve its operations, the model and simulation can be adjusted to provide an outlook for the improvements being considered.

YaofuZhou

Yes. I found this method myself after being unsatisfied with traditional power analysis. What's nice is how flexible it is, and how it can be used to quantify and challenge assumptions you have about your data / population.

marcusstoica

This video comes at exactly the right time for me, as I'm trying to run a power analysis for sigmoid functions fitted by maximum likelihood and I was really running out of ideas :))

bingobongo

This is similar to a method I use to show why we "fail to reject the null" instead of accepting it. If we change the criterion from whether the confidence interval excludes the null value to the p-value itself, then plot the simulated p-values as a histogram, we see that when the null is actually true, the p-value is simply a uniform random variable. The "falser" the null becomes, the more right-skewed the p-value distribution becomes, with its mass piling up near 0.

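library(foreach)

# Simulate 10,000 two-sample t-tests with a small true difference (0.125 SD)
sims <- foreach(i = 1:10000, .combine = c) %do% {
  groupA <- rnorm(30, mean = 0, sd = 1)
  groupB <- rnorm(30, mean = 0.125, sd = 1)

  test <- t.test(groupA, groupB, conf.level = 0.95)
  test$p.value  # the last value is what foreach collects for this iteration
}

# .combine = c already returns a numeric vector; set groupB's mean to 0 to see the flat null case
hist(sims, freq = FALSE)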

ronaldjensen
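
A compact extension of the snippet in the comment above, sketched in base R with replicate instead of foreach: it simulates p-values once with a true null and once with a false one (a 0.5 SD difference, an arbitrary choice), then reports the rejection rate at alpha = 0.05. Under the null that rate estimates the type I error; under the alternative it estimates power.

```r
set.seed(1)
# p-values from many t-tests when the null is true (both means 0)
p_null <- replicate(5000, t.test(rnorm(30), rnorm(30))$p.value)
# ... and when it is false (true difference of 0.5 SD)
p_alt <- replicate(5000, t.test(rnorm(30), rnorm(30, mean = 0.5))$p.value)

type1 <- mean(p_null < 0.05)     # should be close to 0.05
power_alt <- mean(p_alt < 0.05)  # power of the test at this effect size

# hist(p_null) looks flat; hist(p_alt) piles up near 0
c(type_I_error = type1, power = power_alt)
```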

Great video, man. I really enjoy brushing up on my skills via your channel.

deltax

I love your videos!!
When I was thinking about creating a statistical test, I thought about doing the same thing to find out how powerful my test could be!

santiagodm

Just one step away from using a Bayesian approach :-)

ronbally

I personally still prefer deriving the sample size needed for my estimators from concentration bounds given a certain level of control, which makes more intuitive sense to me. But I also like having other tools in my belt, so thank you for the video, great as usual 😀

AllemandInstable
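
As one example of the concentration-bound route mentioned in the comment above (the comment does not say which bound it uses; Hoeffding's inequality is assumed here): for n i.i.d. observations bounded in [a, b], P(|mean - mu| >= eps) <= 2 exp(-2 n eps^2 / (b - a)^2), so n >= (b - a)^2 ln(2/delta) / (2 eps^2) guarantees the sample mean lands within eps of mu with probability at least 1 - delta. hoeffding_n is a hypothetical helper name.

```r
# Smallest n for which Hoeffding's bound guarantees P(|mean - mu| >= eps) <= delta,
# assuming i.i.d. observations bounded in [lower, upper]
hoeffding_n <- function(eps, delta, lower = 0, upper = 1) {
  ceiling((upper - lower)^2 * log(2 / delta) / (2 * eps^2))
}

hoeffding_n(eps = 0.05, delta = 0.05)  # 738 for data in [0, 1]
```

Because the bound holds for any bounded distribution, it is typically more conservative than a simulation that assumes a specific data-generating model.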

Very helpful, thank you! In your code you should replace the magrittr pipe %>% with R's native pipe |> (available since R 4.1.0).
Just a thought for future videos, so that no one gets hung up on the error 'could not find function "%>%"' if they don't load the tidyverse.

_r_ma_
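
A small sketch of the swap suggested in the comment above, applied to a hypothetical power estimate (30 per group, true difference 0.5 SD, Welch t-test). The native pipe needs no package, but its right-hand side must be a function call with parentheses, so mean() rather than mean.

```r
# The native pipe |> ships with base R (4.1.0 or later); no library() call needed
set.seed(7)
p_values <- replicate(1000, t.test(rnorm(30), rnorm(30, mean = 0.5))$p.value)
power <- (p_values < 0.05) |> mean()

# The magrittr version only works after library(magrittr) or library(tidyverse):
# power <- (p_values < 0.05) %>% mean()
power
```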

Hi man! Quite useful, thanks! I've just used it to estimate how many cross-validation folds I'd need to determine whether a small improvement between two machine learning models is significant (about 45 for a power of 95% ... I'll need more training data '-' )

I'm just missing references to the original material (papers, books, etc.); my advisor doesn't like me citing YouTube videos as references (they're not boring enough to work as serious references for serious academics hehehe)

It would be a nice touch if you included them in your next videos! It would also make it easier to learn more about the subject. Thanks!

joaopedrorocha
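
A minimal sketch of the fold-count calculation described in the comment above, assuming the per-fold score differences between the two models are independent and normal with a hypothetical mean improvement mean_diff and spread sd_diff, tested with a one-sample t-test on the differences (equivalent to a paired t-test). Cross-validation fold scores are in practice correlated because training sets overlap, so treating them as independent likely overstates power.

```r
# Monte Carlo power of a paired t-test on per-fold score differences
# mean_diff and sd_diff are hypothetical values, not taken from the comment
fold_power <- function(k, mean_diff = 0.01, sd_diff = 0.02, alpha = 0.05, reps = 2000) {
  rejections <- replicate(reps, {
    d <- rnorm(k, mean = mean_diff, sd = sd_diff)  # model B minus model A, per fold
    t.test(d, mu = 0)$p.value < alpha
  })
  mean(rejections)
}

set.seed(3)
folds <- c(10, 20, 30, 45, 60)
powers <- sapply(folds, fold_power)
data.frame(folds, powers)
```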

Hey! I saw that simulations are used to estimate sample size for mixed models too, but it seemed a bit more complex. If you'd like to make a video on that, it would be super super useful :)

anne-katherine
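
Simulation for mixed models follows the same loop as the comment above describes, with a more complex data-generating step. A minimal sketch, assuming a balanced design where whole clusters are randomised to treatment or control and each cluster has a random intercept; all parameter values are hypothetical. To stay dependency-free it runs a t-test on cluster means instead of fitting a mixed model (for example with lme4). For a balanced design like this the two approaches test the treatment effect in essentially the same way, but unbalanced or more complex designs would need the mixed model itself inside the loop.

```r
# Power for a cluster-randomised design with random intercepts (hypothetical values)
cluster_power <- function(clusters_per_arm, n_per_cluster, effect = 0.3,
                          sd_cluster = 0.5, sd_resid = 1, alpha = 0.05, reps = 1000) {
  rejections <- replicate(reps, {
    arm <- rep(c(0, 1), each = clusters_per_arm)
    # each cluster gets its own random intercept
    intercepts <- rnorm(2 * clusters_per_arm, mean = 0, sd = sd_cluster)
    # mean of n_per_cluster individual observations within each cluster
    cluster_means <- sapply(seq_along(arm), function(j) {
      mean(effect * arm[j] + intercepts[j] + rnorm(n_per_cluster, sd = sd_resid))
    })
    t.test(cluster_means[arm == 1], cluster_means[arm == 0],
           var.equal = TRUE)$p.value < alpha
  })
  mean(rejections)
}

set.seed(11)
p10 <- cluster_power(clusters_per_arm = 10, n_per_cluster = 20)
p20 <- cluster_power(clusters_per_arm = 20, n_per_cluster = 20)
c(ten_per_arm = p10, twenty_per_arm = p20)
```

With these assumed values, the variance of a cluster mean is sd_cluster^2 + sd_resid^2 / n_per_cluster, so raising n_per_cluster from 20 to 200 only shrinks it from 0.30 to 0.255; adding clusters matters more.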

One plus of the mathematical formulae (which are not always equations but sometimes inequalities) is that they are computationally fast. A Monte Carlo simulation requires more electrical power than most formulae. The main downside of the formulae is that they can be very technical to obtain in the first place, and they are only known to be valid under the assumptions from which they were derived. What's the electrical power cost of spending some length of time working on a formula? I don't know.

galenseilis
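
To illustrate the speed point in the comment above: a closed-form sample size evaluates instantly, while a simulation runs thousands of tests. A sketch using the standard normal-approximation formula for a two-sided, two-sample comparison, n per group = 2 (z_{1-alpha/2} + z_{power})^2 sd^2 / delta^2, next to base R's t-based power.t.test; n_formula is a hypothetical helper name.

```r
# Normal-approximation sample size per group for a two-sided two-sample test
n_formula <- function(delta, sd = 1, alpha = 0.05, power = 0.8) {
  z_alpha <- qnorm(1 - alpha / 2)
  z_beta <- qnorm(power)
  ceiling(2 * (z_alpha + z_beta)^2 * sd^2 / delta^2)
}

n_z <- n_formula(delta = 0.5)  # 63 per group
# base R's version based on the noncentral t distribution, rounded up
n_t <- ceiling(power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)$n)
c(normal_approx = n_z, t_based = n_t)
```

The one-unit gap reflects the normal approximation ignoring the extra uncertainty of estimating the standard deviation; both formulas still assume normal data with equal variances.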

You should make a class on Udemy covering how to use statistics for different job titles. Maybe partner with ZeroToMastery? A lot of us have degrees but have to shift into a new field that needs more statistics. My background is in biology, but now I'm heading toward Business Analyst and Project Management roles. I need help connecting the theory to the business world. Coding examples that use SQL, Python, and R are needed, too.

BrakeForLoop

Quick question here. With the MC approach, we need to know the difference we are looking for, right? In this scenario you used 0.5 as the difference to create the second sample and then applied the test. Should we always try to have a specific difference in mind before running an experiment? Or how could I approach this if I'm not sure what difference to expect? Thanks for the content!

diegodelgadocaceres
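
One common way to handle the uncertainty raised in the question above, sketched under normal two-group assumptions with sd = 1: repeat the simulation over a grid of plausible differences (0.5 is the value mentioned in the question; 0.2 and 0.8 are arbitrary neighbours) and sample sizes, then check how power changes. Another option is to plan for the smallest difference that would matter in practice. power_at is a hypothetical helper.

```r
# Monte Carlo power for one (n, delta) pair, normal data with sd = 1
power_at <- function(n, delta, alpha = 0.05, reps = 1000) {
  mean(replicate(reps, t.test(rnorm(n), rnorm(n, mean = delta))$p.value < alpha))
}

set.seed(5)
grid <- expand.grid(n = c(20, 50, 100), delta = c(0.2, 0.5, 0.8))
grid$power <- mapply(power_at, grid$n, grid$delta)
grid
```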

"you can't" - I cackled

melm

Can you do a video on renewal processes or renewal theory? It's rare to find videos about it, and I would really appreciate it.

itexsoo

The hard part is that you don't really know the true effect, and it heavily affects the sample size you need to reach the same power.

innerbloomset
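
How strongly the true effect drives sample size, as the comment above notes, can be seen with base R's power.t.test (sd = 1, alpha = 0.05 and 80% power are assumed defaults here): the required n scales roughly with 1 / delta^2, so halving the effect roughly quadruples the sample size.

```r
# Required n per group for 80% power at two effect sizes (sd = 1, alpha = 0.05)
n_half <- power.t.test(delta = 0.50, power = 0.8)$n
n_quarter <- power.t.test(delta = 0.25, power = 0.8)$n
c(delta_0.50 = ceiling(n_half), delta_0.25 = ceiling(n_quarter))
n_quarter / n_half  # close to 4
```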

What do you use to create your videos? Manim? What video editing program?

Possumman

I don't know if you saw my reply on the other video, but could you look into doing a video on set theory? I feel it is a foundation for making statistics more accessible, since it is a whole different language from basic x, y, z variables.

Dondo