MSc / Engineering degree at ENSAE IP Paris
Computational Content Analysis Methods for News Frames Prevalence Estimation in the Political Press.
This dissertation develops Computational Content Analysis (CCA) methods for the analysis of news framing in the political press. First, it builds a French corpus of political press articles and provides human annotations for two news frame identification tasks, derived from the literature on strategic news framing and “horse race” journalism. Second, it explores the conditions (frame complexity, data quantity and data quality) under which Supervised Machine Learning (SML) methods can “augment” social scientists, i.e. train a model to generalize a social scientist’s content analysis (CA) codebook and the resulting text annotations, so that billions of articles can be analyzed instead of a few hundred. Third, it evaluates the potential benefits of CCA over CA for estimating the prevalence of news frames in a corpus. What justifies using CCA over CA, and is it always justified? I will try to define the conditions on SML model performance under which news frame prevalence estimates are more accurate with CCA than with CA.
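The trade-off behind the third aim can be illustrated with a minimal sketch. It is not the dissertation's method: it assumes a binary frame, a plain "classify and count" CCA estimator, error-free human annotation on a simple random sample for CA, and mean squared error as the accuracy criterion. All function names and numbers are hypothetical.

```python
def cca_mse(p, sensitivity, specificity, n_corpus):
    """MSE of a classify-and-count estimate of the true prevalence p.

    Assumes each of the n_corpus articles is flagged independently by a
    classifier with the given sensitivity and specificity.
    """
    # Expected share of articles the classifier flags as containing the frame.
    p_obs = p * sensitivity + (1 - p) * (1 - specificity)
    bias = p_obs - p                           # systematic error, does not shrink with n_corpus
    variance = p_obs * (1 - p_obs) / n_corpus  # sampling error, shrinks with n_corpus
    return bias ** 2 + variance


def ca_mse(p, n_annotated):
    """MSE of manual CA: unbiased by assumption, but based on few articles."""
    return p * (1 - p) / n_annotated


def cca_beats_ca(p, sensitivity, specificity, n_corpus, n_annotated):
    """True when the CCA estimate has lower MSE than the CA estimate."""
    return cca_mse(p, sensitivity, specificity, n_corpus) < ca_mse(p, n_annotated)


if __name__ == "__main__":
    # Hypothetical setting: 30% true prevalence, 1,000,000 articles
    # classified automatically versus 300 annotated by hand.
    for sens, spec in [(0.90, 0.90), (0.98, 0.98)]:
        print(sens, spec, cca_beats_ca(0.30, sens, spec, 1_000_000, 300))
```

Under these assumptions, classifier bias dominates the CCA error on a large corpus, so CCA only overtakes a small manual sample once the classifier's error rates are low enough; a correction step for misclassification would change the comparison.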