Chris Fonnesbeck - Probabilistic Python: An Introduction to Bayesian Modeling with PyMC

Chris Fonnesbeck presents:

Probabilistic Python: An Introduction to Bayesian Modeling with PyMC

Bayesian statistical methods offer a powerful set of tools to tackle a wide variety of data science problems. In addition, the Bayesian approach generates results that are easy to interpret and automatically account for uncertainty in quantities that we wish to estimate and predict. Historically, computational challenges have been a barrier, particularly to new users, but there now exists a mature set of probabilistic programming tools that are both capable and easy to learn. We will use the newest release of PyMC (version 4) in this tutorial, but the concepts and approaches that will be taught are portable to any probabilistic programming framework.

This tutorial is intended for practicing and aspiring data scientists and analysts looking to learn how to apply Bayesian statistics and probabilistic programming to their work. It will provide learners with a high-level understanding of Bayesian statistical methods and their potential for use in a variety of applications. They will also gain hands-on experience with applying these methods using PyMC, specifically including the specification, fitting and checking of models applied to a couple of real-world datasets.

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
0:08 Introduction
1:19 Probabilistic programming
1:53 Stochastic language "primitives"
3:06 Bayesian inference
3:27 What is Bayes?
3:57 Inverse probability
4:39 Why Bayes
5:13 The Bayes formula
4:21 Stochastic programs
6:51 Prior distribution
8:12 Likelihood function
8:29 Normal distribution
8:53 Binomial distribution
9:14 Poisson distribution
9:32 Infer values for latent variables
9:54 Posterior distribution
9:47 Probabilistic programming abstracts the inference procedure
10:56 Bayes by hand
12:18 Conjugacy
16:43 Probabilistic programming in Python
17:24 PyMC and its features
19:15 Question: Among the different probabilistic programming libraries, is there a difference in what they have to offer?
20:39 Question: How can one know which likelihood distribution to choose?
21:35 Question: Is there a methodology used to specify the likelihood distribution?
22:30 Example: Building models in PyMC
27:31 Stochastic and deterministic variables
37:11 Observed Random Variables
41:00 Question: To what extent are the features of PyMC supported if compiled in different backends?
41:47 Markov Chain Monte Carlo and Bayesian approximation
43:04 Markov chains
44:19 Reversible Markov chains
45:06 Metropolis sampling
48:00 Hamiltonian Monte Carlo
49:10 Hamiltonian dynamics
50:49 No U-turn Sampler (NUTS)
52:11 Question: How do you know the number of leapfrog steps to take?
52:55 Example: Markov Chain Monte Carlo in PyMC
1:13:30 Divergences and how to deal with them
1:15:08 Bayesian Fraction of Missing Information
1:16:25 Potential Scale Reduction
1:17:57 Goodness of fit
1:22:40 Intuitive Bayes course
1:23:09 Question: Do bookmakers use PyMC or Bayesian methods?
1:23:53 Question: How does it work if you have different samplers for different variables?
1:25:09 Question: What route should one take in case of data with many discrete variables and many possible values?
1:25:39 Question: Is there a natural way to use PyMC over a cluster of CPUs?
Comments

*Abstract*

This tutorial provides an introduction to Bayesian modeling with PyMC,
a probabilistic programming library in Python. It covers the
fundamental concepts of Bayesian statistics, including prior
distributions, likelihood functions, and posterior distributions. The
tutorial also explains Markov Chain Monte Carlo (MCMC) methods,
specifically the No U-Turn Sampler (NUTS), used to approximate
posterior distributions. Additionally, it emphasizes the importance of
model checking and demonstrates techniques for assessing convergence
and goodness of fit. The tutorial concludes with examples of building
and analyzing models in PyMC, including predicting the outcomes of
sporting events.

*Summary*
*Introduction (**0:03**)*
- This tutorial is intended for data scientists and analysts interested in applying Bayesian statistics and probabilistic programming.
- No prior knowledge of statistics, machine learning, or Python is assumed.
- The tutorial provides a high-level overview of Bayesian statistics, probabilistic programming, and PyMC.
*Probabilistic Programming (**1:24**)*
- Probabilistic programming involves writing programs with outputs partially determined by random numbers.
- It allows for specifying statistical models using stochastic language primitives like probability distributions.
- The main purpose of probabilistic programming is to facilitate Bayesian inference.
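Here is a concrete "program whose output is partially determined by random numbers." This is a minimal pure-Python sketch, not PyMC code. Two stochastic primitives are chained to form a generative story: a Beta draw for an unknown coin bias, then binomial coin flips. The function name `coin_program` and the Beta(2, 2) prior are illustrative assumptions. A probabilistic programming library lets you write the same story declaratively and then run it "in reverse" to infer `p` from observed heads.

```python
import random


def coin_program(n_flips: int, rng: random.Random) -> tuple[float, int]:
    """A tiny stochastic program: every run can return a different result."""
    # Primitive 1: draw an unknown probability of heads from a Beta(2, 2) prior.
    p = rng.betavariate(2.0, 2.0)
    # Primitive 2: draw the number of heads from Binomial(n_flips, p),
    # built here from n_flips independent Bernoulli trials.
    heads = sum(1 for _ in range(n_flips) if rng.random() < p)
    return p, heads


rng = random.Random(42)
for _ in range(3):
    p, heads = coin_program(10, rng)
    print(f"p = {p:.3f}, heads = {heads}")
```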
*What is Bayes? (**3:30**)*
- Bayesian statistics uses probability models to make inferences from data about unknown quantities.
- It involves updating prior beliefs based on observed data to obtain posterior distributions.
- Bayes' formula is the foundation of Bayesian inference.
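For reference, here is Bayes' formula for unknown parameters θ and observed data y. This notation is conventional and not necessarily the one used on the tutorial's slides:

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)},
\qquad
p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta
```

The denominator p(y) only normalizes the result. That is why the posterior is often written as proportional to likelihood × prior, p(θ | y) ∝ p(y | θ) p(θ).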
*Why Bayes? (**4:39**)*
- Bayesian inference is attractive due to its utility and conceptual simplicity.
- It allows for incorporating prior knowledge and quantifying uncertainty in estimates and predictions.
*Prior distribution (**6:51**)*
- Prior distributions quantify uncertainty in unknown variables before observing data.
- Uninformative priors can be used when little is known beforehand.
- Informative priors can be based on domain knowledge or previous data.
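The following sketch shows how the choice between an uninformative and an informative prior changes the result. It uses the Beta-binomial pair because conjugacy (12:18 in the chapter list) gives the posterior in closed form. The data (7 successes in 10 trials) and the Beta(20, 20) "informative" prior are hypothetical.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, trials):
    """Conjugate update: a Beta(alpha, beta) prior with a binomial likelihood
    yields a Beta(alpha + successes, beta + trials - successes) posterior."""
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha, beta):
    # Exact mean of a Beta(alpha, beta) distribution.
    return Fraction(alpha, alpha + beta)


successes, trials = 7, 10  # hypothetical data

# Flat ("uninformative") prior: Beta(1, 1) is uniform on [0, 1].
flat = beta_binomial_update(1, 1, successes, trials)
# Informative prior concentrated near 0.5, e.g. from earlier experiments.
informed = beta_binomial_update(20, 20, successes, trials)

print(flat, beta_mean(*flat))
print(informed, beta_mean(*informed))
```

With the flat prior, the posterior mean (2/3) stays close to the observed rate of 0.7. The informative prior pulls it toward 0.5 (27/50).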
*Likelihood function (**8:13**)*
- The likelihood function describes how the data relates to the model.
- It gives the probability (or density) of the observed data given the model's parameters, viewed as a function of those parameters with the data held fixed.
- Different likelihood functions are appropriate for different types of data (e.g., normal, binomial, Poisson).
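The three likelihoods named above can be written directly from their standard formulas. This sketch uses only the `math` module. In PyMC you would instead declare `pm.Normal`, `pm.Binomial`, or `pm.Poisson` with the `observed=` argument. The function names here are illustrative. The example holds the data fixed and varies the parameter, which is what "likelihood as a function of the parameters" means.

```python
import math


def normal_loglik(data, mu, sigma):
    """Log-likelihood of continuous data under Normal(mu, sigma)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in data
    )


def binomial_loglik(k, n, p):
    """Log-likelihood of k successes in n trials under Binomial(n, p), 0 < p < 1."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)


def poisson_loglik(counts, lam):
    """Log-likelihood of non-negative integer counts under Poisson(lam)."""
    return sum(c * math.log(lam) - lam - math.lgamma(c + 1) for c in counts)


# Same data, different parameter values: the likelihood peaks at the sample mean (3.5).
counts = [2, 4, 3, 5]
for lam in (1.0, 3.5, 8.0):
    print(lam, round(poisson_loglik(counts, lam), 3))
```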
*Infer values for latent variables (**9:34**)*
- Bayesian inference combines prior and likelihood information to generate the posterior distribution.
- The posterior distribution represents updated knowledge about unknown variables after observing data.
- Calculating the posterior exactly requires integrating over all unknowns to obtain the normalizing constant. This is rarely possible analytically, so numerical methods are usually needed.
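For a single parameter, "Bayes by hand" can be done by brute force. This sketch evaluates prior × likelihood on a grid and normalizes with a sum that stands in for the integral. The data (7 of 10) and the flat prior are hypothetical. The result can be checked against the exact conjugate answer, a Beta(8, 4) posterior with mean 2/3. Grids grow exponentially with the number of parameters, which is why MCMC is used for realistic models.

```python
def grid_posterior(likelihood, prior, n_points=1001):
    """Approximate a 1-D posterior on [0, 1] on an evenly spaced grid."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    unnormalized = [likelihood(t) * prior(t) for t in grid]
    # The sum plays the role of the normalizing integral p(y).
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    return grid, weights


successes, trials = 7, 10  # hypothetical data
grid, weights = grid_posterior(
    # Binomial likelihood without the binomial coefficient, which cancels on normalizing.
    likelihood=lambda p: p**successes * (1 - p) ** (trials - successes),
    prior=lambda p: 1.0,  # flat prior on [0, 1]
)
posterior_mean = sum(t * w for t, w in zip(grid, weights))
print(round(posterior_mean, 4))
```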
*Probabilistic programming in Python (**16:48**)*
- Several probabilistic programming libraries are available in Python, including PyMC, Stan, Pyro, and TensorFlow Probability.
- PyMC is designed for fitting Bayesian statistical models, primarily with MCMC methods (it also supports variational inference).
*PyMC and its features (**17:29**)*
- PyMC provides various features for Bayesian modeling, including:
  - Built-in statistical distributions
  - Tools for output analysis and plotting
  - Extensibility for custom distributions and algorithms
  - GPU support and different computational backends
*Example: Building models in PyMC (**22:30**)*
- The tutorial demonstrates building a changepoint model in PyMC to analyze baseball spin rate data.
- The model estimates the changepoint and mean spin rates before and after the sticky stuff crackdown.
- The example showcases specifying stochastic and deterministic variables, priors, likelihoods, and running MCMC sampling.
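The tutorial fits the changepoint model in PyMC with priors on every unknown. The following library-free sketch shows only the core idea, computing a posterior over the changepoint index, and rests on loudly stated simplifications. The noise level `sigma` is treated as known, the before and after means are plugged in as segment sample means, and the prior on the changepoint is flat. The data are a hypothetical "spin rate" series, not the tutorial's. A full Bayesian model would give the means and `sigma` their own priors and sample them jointly with the changepoint.

```python
import math


def changepoint_posterior(y, sigma):
    """Posterior over a single changepoint index tau (flat prior on tau).

    Simplifying assumptions: sigma is known and the before/after means are
    plugged in as the segment sample means, so only tau is treated as unknown.
    """
    log_post = {}
    for tau in range(1, len(y)):  # tau = index of the first "after" observation
        before, after = y[:tau], y[tau:]
        mu_before = sum(before) / len(before)
        mu_after = sum(after) / len(after)
        sse = sum((v - mu_before) ** 2 for v in before) + sum((v - mu_after) ** 2 for v in after)
        log_post[tau] = -sse / (2 * sigma**2)  # normal log-likelihood up to a constant
    # Normalize on the log scale (log-sum-exp) to avoid underflow.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {tau: math.exp(v - top) / z for tau, v in log_post.items()}


# Hypothetical series: 20 values near 2500, then 20 near 2400.
noise = [10, -10] * 10
y = [2500 + e for e in noise] + [2400 + e for e in noise]
post = changepoint_posterior(y, sigma=10.0)
print(max(post, key=post.get))  # 20
```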
*Markov Chain Monte Carlo and Bayesian approximation (**41:47**)*
- MCMC methods are used to approximate posterior distributions by simulating a Markov chain.
- The chain is constructed so that its stationary distribution is the posterior. Once it has converged, its samples approximate the posterior.
- Metropolis sampling and Hamiltonian Monte Carlo (HMC) are two MCMC algorithms.
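Random-walk Metropolis can be illustrated in plain Python. This is a minimal sketch rather than PyMC's `pm.Metropolis` step method. The target is a standard normal known only up to a constant, standing in for a posterior. The step size, starting point, and number of discarded early draws are arbitrary assumptions. Because the proposal is symmetric, the acceptance rule only needs the ratio of target densities, so the normalizing constant never has to be computed.

```python
import math
import random


def metropolis(log_density, x0, n_samples, step, rng):
    """Random-walk Metropolis sampler for a 1-D target known up to a constant."""
    x, logp = x0, log_density(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to p(x') / p(x).
        proposal = x + rng.gauss(0.0, step)
        logp_prop = log_density(proposal)
        # Accept with probability min(1, p(x') / p(x)); otherwise repeat x.
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            x, logp = proposal, logp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


def std_normal_logp(x):
    # Log-density of a standard normal, dropping the normalizing constant.
    return -0.5 * x * x


rng = random.Random(1)
draws, acc_rate = metropolis(std_normal_logp, x0=3.0, n_samples=20_000, step=2.0, rng=rng)
kept = draws[2_000:]  # discard early draws taken before the chain settled
mean = sum(kept) / len(kept)
var = sum((d - mean) ** 2 for d in kept) / (len(kept) - 1)
print(f"mean={mean:.2f} var={var:.2f} acceptance={acc_rate:.2f}")
```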
*Hamiltonian Monte Carlo (**48:03**)*
- HMC uses gradient information to efficiently explore the posterior distribution.
- It simulates a physical analogy of a particle moving through the parameter space.
- The No U-Turn Sampler (NUTS) extends HMC by choosing the trajectory length (number of leapfrog steps) automatically. Its step size is tuned during the warm-up phase.
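The "physical analogy" and "gradient information" are shown here in a basic HMC sketch for a 1-D standard normal target, simulated with the leapfrog integrator. It assumes unit mass, a fixed step size `eps`, and a fixed number of leapfrog steps `n_steps`. NUTS removes the need to hand-pick `n_steps`. This is an illustration of plain HMC, not PyMC's NUTS implementation.

```python
import math
import random


def leapfrog(q, p, grad_u, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with H(q, p) = U(q) + p**2 / 2."""
    p = p - 0.5 * eps * grad_u(q)  # initial half step for momentum
    for i in range(n_steps):
        q = q + eps * p  # full step for position
        if i < n_steps - 1:
            p = p - eps * grad_u(q)  # full step for momentum, except at the end
    p = p - 0.5 * eps * grad_u(q)  # final half step for momentum
    return q, p


def hmc(u, grad_u, q0, n_samples, eps, n_steps, rng):
    """Basic HMC for a 1-D target exp(-U(q)) with fixed eps and n_steps."""
    q, samples = q0, []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # fresh momentum each iteration
        q_new, p_new = leapfrog(q, p0, grad_u, eps, n_steps)
        h_old = u(q) + 0.5 * p0 * p0
        h_new = u(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integrator's small energy error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


def u(q):
    # Potential energy = negative log-density of a standard normal (up to a constant).
    return 0.5 * q * q


def grad_u(q):
    # Gradient of the potential energy; this is the "gradient information" HMC uses.
    return q


rng = random.Random(7)
draws = hmc(u, grad_u, q0=3.0, n_samples=5_000, eps=0.1, n_steps=20, rng=rng)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean={mean:.2f} var={var:.2f}")
```

Because leapfrog nearly conserves total energy, almost every proposal is accepted even though it lands far from the starting point.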
*Example: Markov Chain Monte Carlo in PyMC (**53:06**)*
- The tutorial demonstrates fitting a model for predicting rugby scores using MCMC in PyMC.
- The example showcases specifying priors, likelihoods, running MCMC sampling, and analyzing the results.
*Model checking (**1:11:58**)*
- Model checking is crucial to ensure the validity of the fitted model.
- It involves assessing convergence diagnostics and goodness of fit.
*Convergence diagnostics (**1:12:10**)*
- Convergence diagnostics verify whether the MCMC algorithm has effectively explored the posterior distribution.
- Techniques include visually inspecting trace plots, checking for divergences, analyzing energy plots, and calculating potential scale reduction (R-hat) statistics.
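The potential scale reduction statistic compares between-chain and within-chain variance. This sketch implements the classic, non-split Gelman-Rubin formula for equal-length chains. ArviZ's `az.rhat`, which PyMC relies on for summaries, defaults to a more robust rank-normalized split-chain version, so its numbers will differ somewhat. The toy chains are hypothetical.

```python
import math


def gelman_rubin_rhat(chains):
    """Classic (non-split) potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and average within-chain variance W.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


well_mixed = [[0.0, 1.0] * 50, [1.0, 0.0] * 50]
stuck = [[0.0, 1.0] * 50, [10.0, 11.0] * 50]  # chains exploring different regions
print(round(gelman_rubin_rhat(well_mixed), 3))  # close to 1
print(round(gelman_rubin_rhat(stuck), 3))       # far above 1
```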
*Goodness of fit (**1:17:58**)*
- Goodness of fit assesses how well the model fits the observed data.
- The posterior predictive distribution is used to compare model predictions with the data.
- Visualizations like cumulative distribution plots can help evaluate goodness of fit.
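The tutorial compares posterior predictive draws with the data visually. A numeric companion to those plots is the posterior predictive p-value, which is the fraction of replicated datasets whose test statistic is at least as large as the observed one. This sketch assumes a Poisson model with a conjugate Gamma prior (hyperparameters chosen arbitrarily) and deliberately overdispersed toy counts. In PyMC the replicated datasets would come from `pm.sample_posterior_predictive`. A p-value near 0 or 1 flags a feature of the data the model fails to reproduce.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = sample_mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def ppc_pvalue(y, statistic, n_sims, rng, a=1.0, b=0.01):
    """Posterior predictive p-value, P(T(y_rep) >= T(y)), for a Poisson model
    with a Gamma(shape=a, rate=b) prior on the rate."""
    shape, rate = a + sum(y), b + len(y)  # conjugate Gamma posterior
    t_obs = statistic(y)
    exceed = 0
    for _ in range(n_sims):
        lam = rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale
        y_rep = [poisson_draw(lam, rng) for _ in y]  # replicated dataset
        if statistic(y_rep) >= t_obs:
            exceed += 1
    return exceed / n_sims


# Hypothetical, deliberately overdispersed counts: half zeros, half twenties.
y = [0] * 10 + [20] * 10
rng = random.Random(11)
p_mean = ppc_pvalue(y, sample_mean, 2_000, rng)
p_var = ppc_pvalue(y, sample_variance, 2_000, rng)
print(f"p(mean)={p_mean:.2f}  p(variance)={p_var:.3f}")
```

The model reproduces the mean of these counts but not their spread, so only the variance check flags a misfit.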
*Making predictions (**1:20:43**)*
- PyMC allows for making predictions with fitted models by updating the data and sampling from the posterior predictive distribution.
- The tutorial demonstrates predicting the outcome of a rugby match between Wales and England.
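In PyMC, a prediction like this is typically made by swapping in the new fixture with `pm.set_data` and then calling `pm.sample_posterior_predictive`. The sketch below reproduces only the final tallying step in plain Python. It assumes a Poisson model for each team's points. The posterior draws of the scoring rates are generated from Gamma distributions as stand-ins, so they are hypothetical and not output from the tutorial's fitted rugby model.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for moderate lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def match_outcome_probs(home_rates, away_rates, rng):
    """Simulate one score per posterior draw and tally the match result."""
    counts = {"home win": 0, "draw": 0, "away win": 0}
    for lam_home, lam_away in zip(home_rates, away_rates):
        home, away = poisson_draw(lam_home, rng), poisson_draw(lam_away, rng)
        if home > away:
            counts["home win"] += 1
        elif away > home:
            counts["away win"] += 1
        else:
            counts["draw"] += 1
    n = len(home_rates)
    return {outcome: c / n for outcome, c in counts.items()}


rng = random.Random(3)
# Hypothetical posterior draws of each team's expected points (mean 30 vs 10, sd 2);
# in practice these would come from the fitted model's trace.
home_rates = [rng.gammavariate(225.0, 30.0 / 225.0) for _ in range(5_000)]
away_rates = [rng.gammavariate(25.0, 10.0 / 25.0) for _ in range(5_000)]
probs = match_outcome_probs(home_rates, away_rates, rng)
print(probs)
```

Simulating from every posterior draw, rather than from a single point estimate, carries the parameter uncertainty through to the outcome probabilities.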
*Conclusion*
- The tutorial concludes by encouraging further exploration of Bayesian modeling with PyMC and suggesting additional resources.

Disclaimer: I used Gemini 1.5 Pro to summarize the YouTube transcript.

wolpumba

Lovely video! After watching it a third time and trying to build some models of my own, I'm finally starting to understand it :)

iliya-malecki

Wow. I'm literally looking up PyMC3 because I'm writing a paper on Bayesian analysis of pitcher performance as a game progresses. Turns out this dude is doing the same thing.

donnymcjonny

It's always great to see a video on pymc

CristianHeredia

@39:00 it takes like > 250 mins on my VSC and 50 mins in my jupyterhub. Is there any reason for that?

tobiasmuenchow

This is a great video with good practical application. Is it possible to have access to the notebooks?

tahsinahmed

Can we use PyMC to estimate DSGE models with Bayesian technique?

teshex

When I ran the sticky baseball example it took 20 mins on my computer! Anyone know what could be wrong? He said it should take seconds.

BillTubbs

I really enjoyed the Video. Helped me a lot. Is there any document for the presentation?

tobiasmuenchow

Is it called the back-end because it's where the shit goes?

JosephKings-jf

Can I get a book 📖 on Bayesian statistics in Python?

musiknation

Has anybody noticed that Mr. Bayes has a very similar face to the speaker?

pablolecce

languages. Not that you should care at this level :D but it is what it is

haditime