Difference-in-differences | Synthetic Control | Causal Inference in Data Science Part 2

This video is the second part of our mini course on the application of causal inference in data science. We discuss which methods you can use for causal inference when there are only a few treated units. Two methods are introduced: difference-in-differences and synthetic control.
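
As a minimal sketch of the 2x2 difference-in-differences logic named in the description, the estimate is the treated group's change minus the control group's change. The function name and all numbers below are hypothetical, not taken from the video.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on group means (hypothetical helper).

    Each argument is a list of observed outcomes for one group and period.
    Relies on the parallel (common) trends assumption: absent treatment,
    the treated group would have changed by the same absolute amount
    as the control group.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical outcomes, for illustration only
effect = did_estimate(
    treated_pre=[9.0, 11.0], treated_post=[13.0, 15.0],
    control_pre=[8.0, 10.0], control_post=[10.0, 12.0],
)
print(effect)  # 2.0
```

Here the treated mean rises by 4 and the control mean by 2, so 2 of the treated group's increase is attributed to the treatment.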

📚 Resources recommended by Yuan
- Abadie, A. (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2), 391-425.
🟢Get all my free data science interview resources

// Comment
Got any questions? Something to add?
Write a comment below to chat.

// Let's connect on LinkedIn:

====================
Contents of this video:
====================
00:00 How to measure COVID's Impact on the Economy
08:13 Difference-in-Differences
14:47 Synthetic Control
24:17 Summary
Comments

This channel and this video are sooo underrated!

sophial.

Fantastic lecture! The introduction to the DiD method is very intuitive. In my experience, it is one of the best explanations.

pavlobilinskyi

Wouldn't COVID be a bad case for DD, since it was worldwide? Few economies were unaffected enough to be used as a counterfactual.

escargot

Fantastic lecture! Thanks, Yuan and Emma!

xinyaohui

This video cleared up a lot of questions in my mind. Thank you!

junqichen

Thank you both (and everyone behind the scenes) for this video. Strong didactic structure, very, very good for getting started and understanding the topic. It's a bit sad that the guy seemed very nervous while talking, even though he clearly knows very well what he is talking about! You did very well, though!

nicolasheinz

Shouldn't the common trend for Uber be a proportion? Shouldn't the average trip duration be expected to go up by the same fraction in New York and San Francisco? How do you decide that the common trend is an absolute change that is matched in San Francisco and New York? Also, why is there a discontinuity in the blue curve? Is that needed to discuss diff-in-diff? Surely a discontinuity in the gradient is already enough? What does it even mean to have an explicit data point at the boundary?

gabrieldurkin
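
The question above about whether the common trend should be an absolute or a proportional change can be made concrete. A minimal sketch with hypothetical trip durations (the numbers and function names are assumptions, not from the video): additive DiD assumes parallel trends in levels, while DiD on logs assumes the same percentage change in both cities, and the two can give different effects from the same data.

```python
import math

def additive_did(t_pre, t_post, c_pre, c_post):
    # Parallel trends in levels: the treated city would have moved
    # by the same absolute amount as the control city
    return (t_post - t_pre) - (c_post - c_pre)

def proportional_did(t_pre, t_post, c_pre, c_post):
    # Parallel trends in logs: the treated city would have grown
    # by the same fraction as the control city
    log_effect = (math.log(t_post) - math.log(t_pre)) - (math.log(c_post) - math.log(c_pre))
    counterfactual = t_pre * (c_post / c_pre)
    return log_effect, t_post - counterfactual

# Hypothetical average trip durations in minutes (treated city, control city)
print(additive_did(20.0, 26.0, 10.0, 12.0))  # 4.0
log_eff, level_eff = proportional_did(20.0, 26.0, 10.0, 12.0)
print(f"{log_eff:.3f} log points, {level_eff:.2f} minutes")  # 0.080 log points, 2.00 minutes
```

Which version is appropriate is an empirical question; one common check is to see on which scale the two cities' pre-treatment trends actually look parallel.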

I think we forgot to answer the original question of "how did COVID impact our economy?" I probably wouldn't use diff-in-diff to answer that; I'd use an event study design instead. The whole world was impacted by COVID, so it's difficult to find an appropriate control. For example, what country comparable to the USA was not impacted by COVID? An event study allows us to predict the counterfactual in this case and then compare it with the actual outcome. The residual is our effect size.

TheBjjninja
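
The comment above describes predicting the counterfactual from the treated unit's own history and reading the residual as the effect. A minimal sketch of that idea, assuming a simple linear trend fitted to the pre-treatment periods (an interrupted-time-series style simplification; real event-study designs are usually richer). The function name and data are hypothetical.

```python
import numpy as np

def forecast_counterfactual_effect(y, treat_start):
    """Estimate effects by extrapolating a unit's own pre-period trend (hypothetical helper).

    y: 1-D sequence of outcomes for a single unit, ordered in time.
    treat_start: index of the first treated period.
    Strong assumption: the linear pre-treatment trend would have continued.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    # Fit a straight line to the pre-treatment periods only
    slope, intercept = np.polyfit(t[:treat_start], y[:treat_start], deg=1)
    counterfactual = intercept + slope * t
    # Post-period residuals are read as the treatment effect
    effects = y[treat_start:] - counterfactual[treat_start:]
    return counterfactual, effects

# Hypothetical series: +0.5 per period, then a jump of +3 from period 5 onward
y = [2.0, 2.5, 3.0, 3.5, 4.0, 7.5, 8.0, 8.5]
_, effects = forecast_counterfactual_effect(y, treat_start=5)
print(np.round(effects, 3))  # approximately 3.0 in each post-period
```

Without a control group, any other shock that hits at the same time as the treatment is absorbed into these residuals. Comparing against donor units, as in DiD or synthetic control, is one way to guard against that.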

One question I had: why do we need to build the counterfactual prediction from the donor pool (similar cities) instead of using the treated city's own pre-treatment history to predict the counterfactual for the period of interest?

andreaxue

For the Uber case, what is the argument for NOT using an A/B test? (Or is it just for the sake of the example?) Thanks!

the_teemo

Oh my god, was this useful. Thank you so much for planning it out and recording it! Amazing job.

itsBlue

Synthetic controls are pretty much the big brother of difference-in-differences. There is so much you can do with SCM that you can't really do with DD. For example, I'm writing a synthetic control command for Stata that uses LASSO or Ridge to automate donor/variable selection, and this method already outperforms classic SCM. I've even gotten it to handle staggered implementation as well as placebo inference, and the best thing is that you only need outcome data; you don't need a long list of covariates to estimate the counterfactual.

jaredgreathouse
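
For contrast with the LASSO/Ridge approach described above, here is a minimal sketch of classic synthetic control weighting fitted on pre-period outcomes only: donor weights are non-negative and sum to one, chosen so the weighted donors track the treated unit before treatment. The solver (projected gradient descent onto the probability simplex), the function names, and the data are assumptions for illustration; this is not the commenter's Stata command or any package's API.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    # Largest k such that u_k > (sum of the top k entries - 1) / k
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(y_treated_pre, donors_pre, n_iter=5000):
    """Classic SCM-style donor weights from pre-treatment outcomes (hypothetical helper).

    y_treated_pre: shape (T0,), treated unit before treatment.
    donors_pre: shape (T0, J), one column per donor unit.
    Minimizes ||y - X w||^2 subject to w >= 0 and sum(w) = 1.
    """
    X = np.asarray(donors_pre, dtype=float)
    y = np.asarray(y_treated_pre, dtype=float)
    w = np.full(X.shape[1], 1.0 / X.shape[1])  # start from equal weights
    # Step 1/L, where L = 2 * (largest singular value of X)^2 bounds the gradient's Lipschitz constant
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)
        w = project_to_simplex(w - step * grad)
    return w

# Hypothetical pre-period outcomes for three donor cities (columns)
donors_pre = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 3.0],
    [4.0, 3.0, 2.0],
    [5.0, 6.0, 1.0],
])
# Treated city built as 0.7 * donor 1 + 0.3 * donor 2, so the weights are recoverable
y_treated_pre = donors_pre @ np.array([0.7, 0.3, 0.0])
w = synthetic_control_weights(y_treated_pre, donors_pre)
print(np.round(w, 3))  # close to [0.7, 0.3, 0.0]
```

After treatment, the synthetic counterfactual would be `donors_post @ w` (with `donors_post` a hypothetical post-period donor matrix), and `y_treated_post - donors_post @ w` is the estimated effect. A Ridge or LASSO variant, as the comment describes, replaces or relaxes the simplex constraint with a penalty, which makes it a different estimator from the one sketched here.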

Phenomena like pandemics, which occur rarely but have large-scale effects, are explained by power laws. This is called self-organized criticality.

sumangautam

I would kindly argue that DiD and synthetic controls suffer from the same pitfalls as standard statistical controls. When these two methods are employed within observational designs, confounding can be introduced if the two groups of interest are not balanced on key covariates. We use counterfactual methods (propensity score adjustments) to balance the two groups, which can then be analyzed with an eye toward providing supporting or disconfirming evidence. Synthetic controls can also suffer from confounding, likely unobserved. Because the confounding is unobserved, you cannot use propensity methods and must instead use something more like instrumental variable methods.

McDreamyn_mdphd

By using synthetic control, we aim to satisfy the common trend assumption required by difference-in-differences.

PeakWuNeverSurrender

Good job, guys!!! Is it possible for you to do a video on the commands used in SCM?

percytaabazuing

I have a question that many people may be confused about as well: other than cases where the event being estimated happened in the past, in what other cases is it better to use DiD than A/B testing to estimate an effect?

jaden