ANDREW Judith Jeyafreeda
judithjeyafreeda [at] gmail.com
Short bio
PhD, Université de Caen, Normandie
Research project
Extracting temporal relations within clinical text documents.
Short abstract
Clinical documents contain mentions of phenotypes together with the time at which they were identified in a patient. These time frames are not always explicitly stated, so their automatic identification is a challenging task. However, identifying and constructing a time frame for a phenotype and its evolution can be very useful for further clinical research. I will develop AI models to automatically identify temporal relationships between phenotypes and time.
Matthieu FUTERAL-PETER
matthieu.futeral [at] inria.fr
Short bio
MSc in Engineering at ENSAE Paris
Thesis title
Exploration of multilingual and multimodal word embeddings.
Short abstract
The aim of this PhD is to study the advantage of visual information for the alignment of word embeddings in several languages, in particular for low-resource languages. Such multimodal embeddings may be particularly useful in scenarios where the text itself is partial or degraded, and where additional context (in the form of video data) could therefore be beneficial.
QUENNELLE Sophie
Hôpital Necker-Enfants Malades
sophie.quennelle [at] protonmail.com
Short bio
- Master 2 in Biomedical Informatics, Sorbonne Université, Paris
- MD in Cardiology, Université Paris Cité
Thesis title
Deep representation of the patient’s electronic health record for clinical event prediction and patient similarity.
Short abstract
Sophie Quennelle is a pediatric cardiologist at Necker-Enfants Malades in Paris, interested in health data extraction and reuse for clinical research. Her PhD project started in October 2020, supervised by Prof. Anita Burgun and co-supervised by Dr. Antoine Neuraz. Its objective is to propose a deep learning model that provides a reliable representation of the patient's electronic health record.
GILMARTIN Emer
emer.gilmartin [at] inria.fr
Short bio
- PhD, Trinity College Dublin, Ireland; MPhil, Trinity College Dublin, Ireland
- BE (Mech), NUIG, Ireland
Short abstract of the research project
We are working with groups in Korea to understand and model the effects of interlocutor personality in dialogue. We are building a new model of ‘interpersonality’: how the personality-related behaviours of each participant in a conversation affect the conversation as a whole.
MOHAPATRA Biswesh
biswesh.mohapatra [at] inria.fr
Short bio
Integrated Master of Technology in Computer Science Engineering from IIIT Bangalore
Research project
Improving multimodal dialogue systems through conversational grounding.
Short abstract
This project will dive deep into the issues surrounding conversational grounding. The thesis intends to do the following:
1) investigate why modern neural networks trained on vast amounts of data fail to achieve conversational grounding in current dialogue systems;
2) investigate how earlier approaches to conversational grounding can be applied to neural network-based models;
3) look into the role of nonverbal behavior such as eye gaze and head nods in conversational grounding, and how insights from cognitive science studies of these phenomena can be integrated into deep learning approaches;
4) build computational models that take conversational grounding into account and help state-of-the-art conversational models such as BlenderBot [5] or DialoGPT [6] to generate more consistent dialogues;
5) develop methods to quantify and test conversational grounding;
6) finally, evaluate the success of these models in human-chatbot conversation, by looking at whether users are more successful in human-computer collaborative tasks.
Natural language processing
Natural Language Processing (NLP) lies at the intersection of linguistics, computer science and artificial intelligence. It is known to the general public through applications such as machine translation and fake news detection (to name just a few), and now through generative language models (LMs) such as ChatGPT.
A core part of our NLP activity is dedicated to the design and training of models for various tasks, in order to improve training/inference efficiency and performance. Model efficiency is relevant from both an ecological and a user-oriented perspective, given the high computational cost of training and deploying large LMs and their increasingly widespread use. One of the most important aspects of improving model performance (and one of our priorities) is creating data: (i) task-annotated data, (ii) evaluation data and (iii) monolingual data (e.g. the OSCAR corpus project) for training LMs. We cover a wide range of languages, dialects and idiolects, including non-standard language (e.g. user-generated content) and low-resource scenarios, which requires exploring dedicated techniques. These include training models to generalise better, creating additional resources and applying cross-lingual transfer. We also address how to bring domain knowledge from linguistics into large LMs, e.g. by using knowledge-based features for fine-tuning and for re-ranking outputs. Also important is the interpretability of NLP models: understanding the properties they learn, how they work and how they could be improved.
Within PRAIRIE, there are various interactions between NLP and other fields due to the utility of text processing for information extraction, content analysis and accessing data trends. These include health (e.g. the analysis of electronic health records, medical interviews and reported symptoms on social media), politics and sociology (e.g. the analysis of press articles and the study of information spread), literary analysis and production (e.g. poetry generation), conversational and discourse analysis and generation (e.g. the detection and generation of speaker personality in dialogues, and of indirect language such as hedges) and linguistics (the analysis of language structure, literary corpus analysis, diachronic change, etc.). Finally, an emerging topic is the interaction with other modalities including speech, embodied behaviours (such as gestures and facial movements) and images.
NISHIMWE Lydia
lydia.nishimwe [at] inria.fr
Short bio
- Bachelor of Science in Mathematics and Computer Science, Université Grenoble Alpes
- Master of Engineering in Mathematics and Computer Science, École Centrale de Nantes
Thesis topic
Robust Neural Machine Translation.
Short abstract
Neural machine translation models struggle to translate texts that differ from the “standard” data commonly used to train them. In particular, social media texts pose many challenges because they tend to be “noisy”: non-standard use of spelling, grammar and vocabulary; typographical errors; use of emojis, hashtags and at-mentions; etc. I aim to develop new methods to better translate these texts.
LE PRIOL Emma
Université Paris Cité and Kap Code
emmalepriol [at] gmail.com
Short bio
Master’s degrees: mathematics (Université Paris-Dauphine) and social sciences (Sciences Po Paris)
Thesis topic
Using NLP to leverage social media data in the study of rare diseases.
Short abstract
My PhD thesis aims to explore NLP techniques to study online content from patients with rare diseases and their caregivers. The first goal is to better understand the natural histories of the studied diseases and to compare spontaneously reported symptoms with symptoms collected during medical interviews. The other goal is to study how patient associations become involved in public policy governance, in particular by collectively acquiring extensive knowledge.
GODEY Nathan
nathan.godey [at] inria.fr
Short bio
Master of Engineering, École des Ponts
Thesis topic
Cheap and expressive neural contextual representations for textual data.
Short abstract
Neural language models are pre-trained using self-supervised learning to produce contextual representations of text data such as words or sentences. These representations are shaped by the pre-training procedure: its data, its task and its optimization scheme, among others. I aim to identify ways to improve the quality of text representations by leveraging new pre-training approaches, in order to reduce data and/or compute requirements without loss of quality.
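As a purely illustrative aside (not part of the thesis itself), the following minimal sketch shows how contextual representations can be extracted from a pre-trained language model with the Hugging Face transformers library; the choice of model (camembert-base) and the mean-pooling of tokens into a sentence-level vector are assumptions made only for this example.

# Illustrative only: contextual token representations from a pre-trained
# masked language model; model choice and pooling are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModel.from_pretrained("camembert-base")

sentence = "Les représentations contextuelles dépendent du pré-entraînement."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One vector per (sub)word token, shaped by the pre-training data,
# task and optimisation scheme.
token_embeddings = outputs.last_hidden_state  # (1, num_tokens, hidden_size)

# A simple sentence-level representation: mean-pool over tokens.
sentence_embedding = token_embeddings.mean(dim=1)
print(token_embeddings.shape, sentence_embedding.shape)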
DO Salomé
École normale supérieure - PSL
salome.do [at] ens.psl.eu
Short bio
MSc / Engineering degree at ENSAE IP Paris
Thesis topic
Computational Content Analysis Methods for News Frames Prevalence Estimation in the Political Press.
Short abstract
This dissertation aims to provide Computational Content Analysis (CCA) methods for the analysis of news framing in the political press. First, it aims to create a French corpus of political press articles and provide human annotations for two news frame identification tasks, derived from the literature on strategic news framing and “horse race” journalism. Second, it aims to explore the conditions (frame complexity, data quantity and data quality) under which Supervised Machine Learning (SML) methods can “augment” social scientists, i.e. train a model to generalize social scientists’ content analysis (CA) codebook (and subsequent text annotations) so that billions of articles can be analyzed instead of a few hundred. Third, the dissertation aims to evaluate the potential benefits of CCA over CA when it comes to estimating news frame prevalence in a corpus. What justifies using CCA over CA, and is it always justified? I will try to define the conditions on SML model performance under which news frame prevalence estimates are more accurate with CCA than with CA.
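As a purely illustrative aside (not the dissertation's actual pipeline), the sketch below shows the general "augmentation" idea: a supervised classifier is trained on a small expert-coded sample and then applied to a much larger corpus to estimate frame prevalence. The texts, labels and model choice (TF-IDF plus logistic regression via scikit-learn) are hypothetical.

# Illustrative sketch only: scaling a content-analysis codebook with a
# supervised classifier. Data, labels and model choice are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny, hypothetical sample of articles coded by social scientists
# (1 = "horse race" frame present, 0 = absent).
annotated_texts = [
    "Le candidat creuse l'écart dans les derniers sondages.",
    "Le projet de loi prévoit une réforme du système de santé.",
]
annotated_labels = [1, 0]

# Train a simple frame classifier on the annotated sample.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(annotated_texts, annotated_labels)

# Apply it to a (much larger, here tiny) unannotated corpus and
# estimate the prevalence of the frame.
corpus = ["La course à l'Élysée se resserre selon les enquêtes d'opinion."]
prevalence_estimate = classifier.predict(corpus).mean()
print(prevalence_estimate)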
POURNAKI Armin
Short bio
Master’s degree in Theoretical Physics, 2021, Technical University Berlin
Thesis title
Analysing discourse and semantics through geometric representations.
Short abstract
I explore geometric approaches to language and discourse analysis. Currently, I am working on combining methods from network science and natural language processing to gain insights into the mechanisms behind the spreading of information and knowledge related to climate change.
BAWDEN Rachel
rachel.bawden [at] inria.fr
Short bio
Researcher (Chargée de recherches) at Inria in the ALMAnaCH project-team since 2020. Previously obtained a PhD from Université Paris-Sud (awarded the ATALA thesis prize) and spent 2 years as a postdoc in the Machine Translation group at the University of Edinburgh.
Topics of interest
Natural language processing, multilinguality, machine translation
Project in Prairie
Rachel Bawden will focus on improving Machine Translation in the face of language variation (texts from different domains, user-generated texts and historical language). Alongside the development of models, she will also explore the interpretability of models in a bid to make them more robust to variation. Finally, she will experiment with the integration of other input modalities (e.g. image and video data), to help tackle ambiguity and scenarios for which the input signal is impoverished or incomplete.
Quote
Huge progress has been seen in Machine Translation in recent years. However, the translation of domain-specific texts (e.g. biomedical and financial), those displaying a high degree of language variation (e.g. social media texts containing spelling errors, acronyms and marks of expressiveness) and other non-standard varieties of language (including dialects and old languages) remains a challenge. Developing models that (i) are robust to variation, (ii) are able to handle the low-resource settings that these scenarios often present and (iii) can incorporate all external context is therefore fundamental to progress in Machine Translation.
Team
NISHIMWE Lydia
PhD student
Matthieu FUTERAL-PETER
PhD student
LASRI Karim
École normale supérieure - PSL
Short bio
Engineer's degree from CentraleSupélec (formerly École Centrale Paris)
Master's degree in Cognitive Science at the École Normale Supérieure
Thesis title
Linguistic generalization in transformer-based neural language models.
Short abstract
Transformer-based neural architectures hold a lot of promise, as they seem to address a wide range of linguistic tasks after learning a language model. However, the level of abstraction they reach after training is still opaque. My main research focus is to better understand how neural language models generalize. What linguistic properties do these architectures acquire during learning? How is linguistic information encoded in their intermediate representation spaces?
ABULIMITI Alafate
alafate.abulimiti [at] inria.fr
Short bio
- Master's degree in Big Data Management and Analytics, University of Tours
- Engineering degree in Computer Science, Polytech Tours
- Bachelor's degree in Applied Physics, Beijing Institute of Technology
Thesis title
The role of rapport in human-conversational agent interaction: Modeling conversation to improve task performance in human-agent interaction.
Short abstract
Human interaction is a complex process, and understanding and structuring the dynamics of conversation is a necessary step towards producing a virtual agent capable of interacting as a social agent. Rapport is a very important factor in the design of social agents: the social agent chooses appropriate conversational actions according to the verbal and non-verbal characteristics of the interlocutor in order to maintain or increase the level of rapport while performing a specific task. During the doctoral program, I will use decision systems based on game theory and deep reinforcement learning to build different models. Then, using different metrics, I will evaluate whether the addition of these models can improve agent performance.
SAGOT Benoît
benoit.sagot [at] inria.fr
Short bio
Research Director at Inria, head of the ALPAGE (2014-2016) and ALMAnaCH (2017-) teams. Co-founder of the Verbatim Analysis (2009-) and opensquare (2016-) Inria start-ups.
Topics of interest
Computational linguistics, Natural Language Processing (NLP), NLP applications.
Project in Prairie
Benoît Sagot will focus on improving and better understanding neural approaches to NLP and integrating linguistic and extra-linguistic contextual information. He will study how non-neural approaches and language resources can contribute to improving neural NLP systems in low-resource and non-edited scenarios. Applications, both academic and industrial, will include computational linguistics and sociolinguistics, opinion mining in survey results, NLP for financial and historical documents, and text simplification to help people with disabilities.
Quote
Most current research in NLP focuses on neural architectures that rely on large volumes of data, in the form of both raw text and costly annotated corpora. The increasing amounts of data necessary to train such models are not available for all languages, and training can require massive computational resources. Moreover, these approaches are highly sensitive to language variation, as illustrated for instance by domain-specific texts, historical documents and non-edited content as found on social media. To address these issues and allow for a wider deployment of NLP technologies, this bottleneck must be overcome. This will require new models that better exploit the complex structure of language and the context in which it is used.
Team
GODEY Nathan
PhD student
Master of Engineering, École des Ponts
CASTAGNÉ Roman
PhD student
Matthieu FUTERAL-PETER
PhD student
POIBEAU Thierry
Natural Language Processing, Digital Humanities
thierry.poibeau [at] ens.fr
Short bio
CNRS Research Director, head of the CNRS Lattice research unit (2012-2018) and deputy head since 2019. Affiliated lecturer, Language Technology Laboratory, University of Cambridge, since 2009. Rutherford Fellowship, The Alan Turing Institute, London (2018-2019). Teaches NLP in the PSL Master's in Digital Humanities.
Topics of interest
Computational linguistics, Low resource languages, Corpora, Distant reading, AI and creativity
Project in Prairie
Thierry Poibeau’s work focuses on Natural Language Processing. He is especially interested in developing techniques for low resource languages that have largely been left out of the machine learning revolution. He is also interested in applying AI techniques to the study of literature and social sciences, shedding new light on the notions of culture and creativity.
Quote
Natural Language Processing (NLP) has made considerable progress over the last few years, mainly due to impressive advances in machine learning. We now have efficient and accurate tools for 20+ languages, but the vast majority of the world's languages lack the resources needed for state-of-the-art NLP. This is a major challenge for our field, since preserving linguistic and cultural diversity is as important as preserving biodiversity. Technology is not the only solution, but it helps facilitate this process by leveraging resources, bridging the gap between languages, and enhancing our understanding of culture and society.
Team
POURNAKI Armin
PhD Student
CASSELL Justine
justine.cassell [at] inria.fr
Short bio
Professor and former Associate Dean, School of Computer Science, Carnegie Mellon University (2010-). Chaire Blaise Pascal and Chaire Sorbonne (2017-2018). On leave from CMU, at Inria since fall 2019. ACM Fellow (2017), Fellow of the Royal Society of Edinburgh (2016), AAAS Fellow (2012), Anita Borg Women of Vision Award (2009). AAMAS Test of Time Award (2017). Chair, World Economic Forum Global Agenda Council on Robotics & Smart Devices (2011-2014). Member of the CNNum (Conseil National du Numérique), the French National Digital Council, since January 2021.
Topics of interest
Natural language processing, human-computer interaction, autonomous and virtual agents, social AI
Project in Prairie
Justine Cassell will address issues at the intersection of NLP, AI, Cognitive Science, and Human-Computer Interaction, employing methods from each of these traditions, and developing new interdisciplinary methods. Her goal is to develop theories, architectures, algorithms, and implementations of embodied conversational agents capable of engaging people in natural dialogue, including both task and social components, language and non-verbal behavior. She will participate in the PSL AI graduate school.
Quote
There is a need for a more human-centered design of AI systems so that they may act as partners and teammates to people rather than their replacements. My work in Social AI attempts to address these design challenges by basing AI agent behavior on a close study of human collaboration and teamwork, thereby working towards fulfilling their societal promise, as well as advancing fundamental areas of AI as diverse as natural language generation and transparency in machine learning.
Team
ABULIMITI Alafate
PhD student
Master's degree in Big Data Management and Analytics, University of Tours
Engineering degree in Computer Science, Polytech Tours
Bachelor's degree in Applied Physics, Beijing Institute of Technology
BONNAIRE Julie
Engineer
Master's degree in Systems Biology at Sorbonne University
MOHAPATRA Biswesh
PhD student
Integrated Master of Technology in Computer Science Engineering from IIIT Bangalore
JENKINS Jade
Engineer
MSc Gerontology Research with distinction from the University of Southampton, UK
BS magna cum laude from the University of New Orleans, USA
GILMARTIN Emer
Postdoctoral researcher
PhD, Trinity College Dublin, Ireland; MPhil, Trinity College Dublin, Ireland
BE (Mech), NUIG, Ireland