GODEY Nathan

PhD student

Inria

nathan.godey [at] inria.fr

Short bio

Master's degree in Engineering, École des Ponts

Thesis topic

Cheap and expressive neural contextual representations for textual data.

Short abstract

Neural language models are pre-trained using self-supervised learning to produce contextual representations of textual data such as words or sentences. These representations are shaped by the pre-training procedure: its data, its task and its optimization scheme, among others. I aim to identify ways to improve the quality of text representations by leveraging new pre-training approaches, in order to reduce data and/or compute requirements without loss of quality.

DO Salomé

PhD student

École normale supérieure - PSL

salome.do [at] ens.psl.eu

Short bio

MSc / Engineering degree at ENSAE IP Paris

Thesis topic

Computational Content Analysis Methods for News Frames Prevalence Estimation in the Political Press.

Short abstract

This dissertation aims to provide Computational Content Analysis (CCA) methods for the analysis of news framing in the political press. First, it aims to create a French corpus of political press articles and to provide human annotations for two news frame identification tasks, derived from the literature on strategic news framing and “horse race” journalism. Second, it aims to explore the conditions (frame complexity, data quantity and data quality) under which Supervised Machine Learning (SML) methods can “augment” social scientists, i.e. train a model to generalize social scientists’ content analysis (CA) codebooks (and subsequent text annotations) so that billions of articles can be analyzed instead of a few hundred. Third, the dissertation aims to evaluate the potential benefits of CCA over CA when it comes to estimating news frame prevalence in a corpus. What justifies using CCA over CA, and is it always justified? I will try to define the conditions on SML model performance under which news frame prevalence estimates are more accurate with CCA than with CA.

POURNAKI Armin

PhD Student

ENS & PSL

pournaki [at] mis.mpg.de

Short bio

Master’s degree in Theoretical Physics, 2021, Technical University Berlin

Thesis title

Analysing discourse and semantics through geometric representations.

Short abstract

I explore geometric approaches to language and discourse analysis. Currently, I am working on combining methods from network science and natural language processing to gain insights into the mechanisms behind the spreading of information and knowledge related to climate change.

BAWDEN Rachel

Natural Language Processing

Inria

rachel.bawden [at] inria.fr


Short bio

Researcher (Chargée de recherche) at Inria in the ALMAnaCH project-team since 2020. Previously obtained a PhD from Université Paris-Sud (awarded the ATALA thesis prize) and spent two years as a postdoc in the Machine Translation group at the University of Edinburgh.

Topics of interest

Natural language processing, multilinguality, machine translation

Project in Prairie

Rachel Bawden will focus on improving Machine Translation in the face of language variation (texts from different domains, user-generated texts and historical language). Alongside the development of models, she will also explore the interpretability of models in a bid to make them more robust to variation. Finally, she will experiment with the integration of other input modalities (e.g. image and video data), to help tackle ambiguity and scenarios for which the input signal is impoverished or incomplete.

Quote

Machine Translation has seen huge progress in recent years. However, the translation of domain-specific texts (e.g. biomedical and financial), of texts displaying a high degree of language variation (e.g. social media texts containing spelling errors, acronyms and marks of expressiveness) and of other non-standard varieties of language (including dialects and old languages) remains a challenge. Developing models that (i) are robust to variation, (ii) are able to handle the low-resource settings that these scenarios often present and (iii) can incorporate external context is therefore fundamental to progress in Machine Translation.

LASRI Karim

PhD Student

École normale supérieure - PSL

karim.lasri [at] ens.fr

Short bio

Engineer’s degree from CentraleSupélec (formerly École Centrale Paris)

Master’s degree in Cognitive Science at the Ecole Normale Supérieure

Thesis title

Linguistic generalization in transformer-based neural language models.

Short abstract

Transformer-based neural architectures hold great promise, as they can address a wide range of linguistic tasks after being trained as language models. However, the level of abstraction they reach during training remains opaque. My main research focus is better understanding how neural language models generalize. What linguistic properties do these architectures acquire during learning? How is linguistic information encoded in their intermediate representation spaces?

SEMINCK Olga

Postdoctoral Researcher

CNRS

olga.seminck [at] cri-paris.org

Short bio

Bachelor's degree, Vrije Universiteit (Amsterdam)

Master's degree, Université Paris Diderot; Doctorate, Sorbonne Paris Cité

Research Topic

Computational Linguistics & Digital Humanities

Short abstract

My research project focuses on using techniques from Natural Language Processing to answer questions about language and style on large corpora of literary texts. Currently, I am working on a project about the notion of idiolect. I try to answer two questions: 1) How does the idiolect of an author evolve over their lifetime? 2) Can we distinguish idiosyncratic changes from general diachronic language evolution?

RIABI Arij

Research Engineer

INRIA

arij.riabi [at] inria.fr

Short bio

Master's degree, Sorbonne University

Research project

NLP for low-resource, non-standardised language varieties, especially North-African dialectal Arabic written in Latin script.

RAPHALEN Yann

PhD Student

INRIA

yann.raphalen [at] inria.fr


Short bio

Engineer’s degree – Grenoble INP

Thesis title

The role of rapport in human-conversational agent interaction: modeling conversation to improve task performance in human-agent interaction.

Short abstract

How can the dynamics of conversation between humans be taken into account within the NLG system of an embodied conversational agent? My thesis aims to produce a generator for language in conversation (including prosody, gestures, …) that relies on the theoretical basis provided by psycholinguistics and conversation analysis.

SAGOT Benoît

Natural Language Processing

benoit.sagot [at] inria.fr


Short bio

Research Director at Inria, head of the ALPAGE (2014-2016) and ALMAnaCH (2017-) teams. Co-founder of the Verbatim Analysis (2009-) and opensquare (2016-) Inria start-ups.

Topics of interest

Computational linguistics, Natural Language Processing (NLP), NLP applications.

Project in Prairie

Benoît Sagot will focus on improving and better understanding neural approaches to NLP and integrating linguistic and extra-linguistic contextual information. He will study how non-neural approaches and language resources can contribute to improving neural NLP systems in low-resource and non-edited scenarios. Applications, both academic and industrial, will include computational linguistics and sociolinguistics, opinion mining in survey results, NLP for financial and historical documents, and text simplification to help people with disabilities.

Quote

Most current research in NLP focuses on neural architectures that rely on large volumes of data, in the form of both raw text and costly annotated corpora. The increasing amount of data necessary to train such models is not available for all languages and can require massive computational resources. Moreover, these approaches are highly sensitive to language variation, illustrated for instance by domain-specific texts, historical documents and non-edited content as found on social media. To address these issues and allow for a wider deployment of NLP technologies, this bottleneck must be overcome. This will require new models that better exploit the complex structure of language and the context in which it is used.

POIBEAU Thierry

Natural Language Processing, Digital Humanities

thierry.poibeau [at] ens.fr


Short bio

CNRS Research Director, Head of the CNRS Lattice research unit (2012-2018) and adjunct head since 2019. Affiliated lecturer, Language Technology Laboratory, U. of Cambridge since 2009. Rutherford fellowship, Turing institute, London, 2018-2019. Teaching NLP in the PSL Master in Digital Humanities.

Topics of interest

Computational linguistics, Low resource languages, Corpora, Distant reading, AI and creativity

Project in Prairie

Thierry Poibeau’s work focuses on Natural Language Processing. He is especially interested in developing techniques for low resource languages that have largely been left out of the machine learning revolution. He is also interested in applying AI techniques to the study of literature and social sciences, shedding new light on the notions of culture and creativity.

Quote

Natural Language Processing (NLP) has made considerable progress over the last few years, mainly due to impressive advances in machine learning. We now have efficient and accurate tools for 20+ languages, but the vast majority of the world's languages lack the resources for state-of-the-art NLP. This is a major challenge for our field, since preserving linguistic and cultural diversity is as important as preserving biodiversity. Technology is not the only solution, but it can help by leveraging resources, bridging the gap between languages, and enhancing our understanding of culture and society.