Introduction to Bayesian Data Analysis and Stan with Andrew Gelman

Stan is a free and open-source probabilistic programming language and Bayesian inference engine. In this talk, we will demonstrate the use of Stan on some small problems in sports ranking, nonlinear regression, mixture modeling, and decision analysis, to illustrate the general idea that Bayesian data analysis involves model building, model fitting, and model checking. One of our major motivations in building Stan is to efficiently fit complex models to data, and Stan has indeed been used for this purpose in the social, biological, and physical sciences, in engineering, and in business. The purpose of the present webinar is to demonstrate, using simple examples, how one can directly specify and fit models in Stan and make logical decisions under uncertainty.
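None of the talk's models are reproduced in this description, but for a flavor of what directly specifying a model in Stan looks like, here is the minimal coin-flipping example from the Stan documentation:

```stan
data {
  int<lower=0> N;                    // number of trials
  array[N] int<lower=0, upper=1> y;  // binary outcomes
}
parameters {
  real<lower=0, upper=1> theta;      // probability of success
}
model {
  theta ~ beta(1, 1);                // uniform prior on theta
  y ~ bernoulli(theta);              // vectorized likelihood
}
```

Fitting the model and summarizing the posterior then takes only a few lines in any of the Stan interfaces (RStan, CmdStanPy, etc.).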

Andrew Gelman is a professor of statistics and political science at Columbia University. He has received the Outstanding Statistical Application award three times from the American Statistical Association, the award for best article published in the American Political Science Review, and the Council of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40. His books include Bayesian Data Analysis (with John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Don Rubin), Teaching Statistics: A Bag of Tricks (with Deb Nolan), Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill), Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (with David Park, Boris Shor, and Jeronimo Cortina), A Quantitative Tour of the Social Sciences (co-edited with Jeronimo Cortina), and Regression and Other Stories (with Jennifer Hill and Aki Vehtari).
Comments

That golf putting model is just about the coolest thing ever.

KyPaMac

I know this is a rather old video, but it is still highly relevant and useful. At 47:02, I don't think a standard EV calculation really does that situation justice. With high-payout/low-loss situations like that, I think it is better to weight the payouts by their utility. For example, losing $10 may have basically no subjective utility loss compared to the subjective utility gained from having $100k. Let's say that, to me, having $100k has 20k times as much utility as losing $10 does. When you switch from an EV calculation based on the raw win/loss amounts to one based on the subjective utility of the payouts, there is a drastic increase in the EV (although it is still negative in this case). E.g.:

win = $100,000
lose = $10
EV ≈ -$9.46

win_util = 20,000 utility points, or "utils"
lose_util = 1 util
EV ≈ ... utils


This is a simple example, and we could certainly argue about the specific subjective utility values, but I think it shows that the normal EV calculation doesn't really do the situation justice once you think about the utility of the win versus the utility of the loss. One could also flip this around and talk about the subjective utility of losing samples versus winning samples: say the bet were overall +EV, but the subjective value gained by winning so rarely was less than the subjective utility lost by losing so often.
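Spelling out the arithmetic, assuming the win is the $100k figure above (so the quoted -$9.46 raw EV implies a win probability of roughly 5.4e-6):

```latex
\begin{align*}
\mathrm{EV}_{\$} &= p\,(100{,}000) - (1-p)\,(10) \approx -9.46
  \;\Rightarrow\; p \approx 5.4\times10^{-6} \\
\mathrm{EV}_{u}  &= p\,(20{,}000) - (1-p)\,(1) \approx -0.89\ \text{utils}
\end{align*}
```

Still negative, as noted, though the win term now carries twice the weight it would get under an even 10,000:1 weighting.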


I got this concept from game theory. There are plenty of examples, especially in poker, where doing something that is -EV right now can lead to a +EV situation later on. Poker players call that implied EV; an example would be calling with some sort of draw when the current raw pot odds don't justify it, because you know that when you do make your hand, the profits will make up for the marginal loss now. So, for example, let's say I have an idea for a product or service that would earn $50k a year off a $100k investment. Using a fairly standard 10x-income valuation, I could say the subjective utility of winning that $100k is actually worth 50k utility points, versus the 10k utility points implied by an even weighting. This specific situation would still be -EV, though.
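For concreteness, the valuation arithmetic behind those figures (keeping one util per $10 of downside, as above):

```latex
\begin{align*}
\text{valuation} &= 10 \times \$50\text{k/yr} = \$500\text{k}
  \;\Rightarrow\; 500{,}000 / 10 = 50{,}000\ \text{utils} \\
\text{even weighting} &: 100{,}000 / 10 = 10{,}000\ \text{utils}
\end{align*}
```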

All of that leads me down the path of seriously doubting most of rational economics.

crypticnomad

Thanks a lot for making this code available for download. That was really helpful for getting started in Stan.

RobetPaulG

Wow! Brilliant - this really helped me a lot. Thank you.

macanbhaird

Also really liked the golf putt example.

johnnyedwards

I wondered why we have the '-1' in "2 * Phi(asin((R - r) / x) / sigma) - 1" in the golf example model (at 51:38).
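For reference, the '-1' is the standard two-sided normal probability: if the putt's angular error is modeled as normal(0, sigma) and the putt succeeds when the angle is within ±θ₀ of dead center, then

```latex
P(|\theta| < \theta_0)
  = \Phi\!\left(\tfrac{\theta_0}{\sigma}\right) - \Phi\!\left(-\tfrac{\theta_0}{\sigma}\right)
  = 2\,\Phi\!\left(\tfrac{\theta_0}{\sigma}\right) - 1,
\qquad \theta_0 = \arcsin\!\left(\tfrac{R - r}{x}\right)
```

Without the -1 you would have a one-sided probability, which tends to 1/2 rather than 0 as the success window shrinks.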

mehmetb

Just beginning to learn about Bayesian analysis ... thanks for the great video, and thanks to everyone for the links in the comments ...

Question: Is it correct to say that, in World Cup example 1, the only variables calculated by Stan are b (real), sigma_a (>= 0), and sigma_y (>= 0)?
In other words, Stan figures out (simultaneously/jointly):
(1) the best b and sigma_a for the equation a = b*prior_scores + sigma_a*[eta_a ~ N(0, 1)]
(2) the best sigma_y so that student_t(df = 7, a[team_1] - a[team_2], sigma_y) best predicts the sqrt-adjusted score(team_1) - score(team_2)

It seems kind of weird to me that, after we figure out the formula for a, it boils down to just one parameter, sigma_y.
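For what it's worth, here is a minimal Stan sketch of the structure described above (my own variable names, not the exact code from the talk). Note that eta_a also sits in the parameters block, so Stan samples it jointly with b, sigma_a, and sigma_y rather than just those three scalars:

```stan
data {
  int<lower=1> N_teams;
  int<lower=1> N_games;
  vector[N_teams] prior_score;                        // pre-tournament ranking score
  array[N_games] int<lower=1, upper=N_teams> team_1;  // first team index per game
  array[N_games] int<lower=1, upper=N_teams> team_2;  // second team index per game
  vector[N_games] sqrt_dif;                           // signed sqrt of score differences
}
parameters {
  real b;
  real<lower=0> sigma_a;
  real<lower=0> sigma_y;
  vector[N_teams] eta_a;    // standardized team abilities, sampled by Stan
}
transformed parameters {
  // non-centered parameterization: a is derived, not sampled directly
  vector[N_teams] a = b * prior_score + sigma_a * eta_a;
}
model {
  eta_a ~ normal(0, 1);
  sqrt_dif ~ student_t(7, a[team_1] - a[team_2], sigma_y);
}
```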

josephjohns

Thanks for the great presentation and the explanations of real models.

This made me laugh: "working with live posterior"

usptact

Gelman is quite nice to listen to. His real-life voice sounds different from his blog voice somehow.

emf

Prof. Gelman: At 19:00 you talk about checking how the model fits the data; are there any tools in Stan to avoid overfitting?

NikStar

At 16:27 you talk about checking posterior predictive distributions for games against their actual results, to see whether the results fall within their respective 95% CIs. Are those games part of the training data or unseen data?

yoij-ovsd

Thanks for this. Any link to the slides?

erwinbanez

What was the bug he fixed? I want to know how he solved the problem.

JesseFagan

"Soccer games are random, it all depends how good the acting is"

mattn