Tutorial 9: Aggregated Classification Pipelines: Propagating Probabilistic Assumptions

Description:

NLP has helped massively scale up previously small-scale content analyses. Many social scientists train NLP classifiers and then measure social constructs (e.g., sentiment) for millions of unlabeled documents, which are then used as variables in downstream causal analyses. However, there are several points at which one can make hard (non-probabilistic) or soft (probabilistic) assumptions in pipelines that use text classifiers: (a) adjudicating training labels from multiple annotators, (b) training supervised classifiers, and (c) aggregating individual-level classifications at inference time. In practice, propagating these hard versus soft choices down the pipeline can dramatically change the values of final social measurements. In this tutorial, we will walk through data and Python code of a real-world social science research pipeline that uses NLP classifiers to infer many users’ aggregate “moral outrage” expression on Twitter. Along the way, we will quantify the sensitivity of our pipeline to these hard versus soft choices.
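To make the hard-versus-soft distinction concrete, here is a minimal Python sketch (not the tutorial's actual code) of the aggregation step (c): given a hypothetical classifier's predicted probabilities for one user's tweets, a hard choice thresholds each tweet to a binary label before averaging, while a soft choice averages the probabilities directly. The function names, threshold, and example probabilities are illustrative assumptions.

```python
# Minimal sketch of hard vs. soft aggregation of per-document classifier
# probabilities into a user-level measurement (e.g., fraction of a user's
# tweets expressing "moral outrage"). Names and values are hypothetical.
import numpy as np

def aggregate_hard(probs, threshold=0.5):
    """Hard choice: threshold each document to a 0/1 label, then average."""
    labels = (np.asarray(probs) >= threshold).astype(int)
    return float(labels.mean())

def aggregate_soft(probs):
    """Soft choice: average the predicted probabilities directly."""
    return float(np.mean(probs))

# Hypothetical predicted probabilities for one user's tweets.
probs = [0.9, 0.55, 0.45, 0.2, 0.05]
print(aggregate_hard(probs))  # 0.4
print(aggregate_soft(probs))  # 0.43
```

Even in this toy case the two choices give different estimates (0.40 vs. 0.43); the tutorial quantifies how such differences, made at each pipeline stage, propagate into the final social measurements.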

Tutorial host: Katherine Keith
