ANDREW Judith Jeyafreeda
judithjeyafreeda [at] gmail.com
Short bio
PhD, Université de Caen, Normandie
Research project
Extracting temporal relations within clinical text documents.
Short abstract
Clinical documents contain mentions of phenotypes together with the time at which they were identified in a patient. These time frames are not always explicitly stated, so their automatic identification is a challenging task. However, identifying and constructing a time frame for a phenotype and its evolution can be very useful for further clinical research. I will develop AI models to automatically identify temporal relationships between phenotypes and time.
Matthieu FUTERAL-PETER
matthieu.futeral [at] inria.fr
Short bio
MSc in Engineering at ENSAE Paris
Thesis title
Exploration of multilingual and multimodal word embeddings.
Short abstract
The aim of this PhD is to study the advantage of visual information for the alignment of word embeddings in several languages, in particular for low-resource languages. Such multimodal embeddings may be particularly useful in scenarios where the text itself is partial or degraded, and where additional context (in the form of video data) could therefore be beneficial.
QUENNELLE Sophie
Hôpital Necker-Enfants Malades
sophie.quennelle [at] protonmail.com
Short bio
- Master 2 in Biomedical Informatics, Sorbonne Université, Paris
- MD in Cardiology, Université Paris Cité
Thesis title
Deep representation of the patient’s electronic health record for clinical event prediction and patient similarity.
Short abstract
Sophie Quennelle is a pediatric cardiologist at Necker-Enfants Malades in Paris, interested in health data extraction and reuse for clinical research. Her PhD project started in October 2020, supervised by Prof. Anita Burgun and co-supervised by Dr. Antoine Neuraz. Its objective is to propose a deep learning model that provides a reliable representation of the patient's electronic health record.
GILMARTIN Emer
emer.gilmartin [at] inria.fr
Short bio
- PhD, Trinity College Dublin, Ireland; MPhil, Trinity College Dublin, Ireland
- BE (Mech), NUIG, Ireland
Short abstract of the research project
We are working with groups in Korea to understand and model the effects of interlocutor personality in dialogue. We are building a new model of ‘interpersonality’: how the personality-related behaviours of each participant in a conversation affect the conversation as a whole.
MOHAPATRA Biswesh
biswesh.mohapatra [at] inria.fr
Short bio
Integrated Master of Technology in Computer Science Engineering from IIIT Bangalore
Research project
Improving multimodal dialogue systems through conversational grounding.
Short abstract
This project will dive deep into the issues surrounding conversational grounding. The thesis intends to do the following:
1) investigate why modern neural networks trained on vast amounts of data fail to achieve conversational grounding in current dialogue systems;
2) investigate how earlier approaches to conversational grounding can be applied to neural network-based models;
3) look into the role of nonverbal behavior such as eye gaze and head nods in conversational grounding, and how insights from cognitive science studies of these phenomena can be integrated into deep learning approaches;
4) build computational models that take conversational grounding into account and help state-of-the-art conversational models such as BlenderBot [5] or DialoGPT [6] to generate more consistent dialogues;
5) develop methods to quantify and test conversational grounding;
6) finally, evaluate the success of these models in human-chatbot conversation, by looking at whether users are more successful in human-computer collaborative tasks.
Natural language processing
Natural Language Processing (NLP) lies at the intersection of linguistics, computer science and artificial intelligence. It is known to the general public through applications such as machine translation and fake news detection (to name just a few), and now through generative language models (LMs) such as ChatGPT.
A core part of our NLP activity is dedicated to the design and training of models for various tasks, in order to improve training/inference efficiency and performance. Model efficiency is relevant from both an ecological and a user-oriented perspective, given the high computational cost of training and deploying large LMs and their increasingly widespread use. One of the most important aspects of improving model performance (and one of our priorities) is creating data: (i) task-annotated data, (ii) evaluation data and (iii) monolingual data (e.g. the OSCAR corpus project) for training LMs. We cover a wide range of languages, dialects and idiolects, including non-standard language (e.g. user-generated content) and low-resource scenarios, which requires exploring dedicated techniques. These include training models to generalise better, creating additional resources and applying cross-lingual transfer. We also address how to bring domain knowledge from linguistics into large LMs, e.g. by using knowledge-based features for fine-tuning and for re-ranking outputs. Also important is the interpretability of NLP models: understanding the properties they learn, how they work and how they could be improved.
Within PRAIRIE, there are various interactions between NLP and other fields due to the utility of text processing for information extraction, content analysis and accessing data trends. These include health (e.g. the analysis of electronic health records, medical interviews and reported symptoms on social media), politics and sociology (e.g. the analysis of press articles and the study of information spread), literary analysis and production (e.g. poetry generation), conversational and discourse analysis and generation (e.g. the detection and generation of speaker personality in dialogues, and of indirect language such as hedges) and linguistics (the analysis of language structure, literary corpus analysis, diachronic change, etc.). Finally, an emerging topic is the interaction with other modalities including speech, embodied behaviours (such as gestures and facial movements) and images.
NISHIMWE Lydia
lydia.nishimwe [at] inria.fr
Short bio
- Bachelor of Science in Mathematics and Computer Science, Université Grenoble Alpes
- Master of Engineering in Mathematics and Computer Science, École Centrale de Nantes
Thesis topic
Robust Neural Machine Translation.
Short abstract
Neural machine translation models struggle to translate texts that differ from the “standard” data commonly used to train them. In particular, social media texts pose many challenges because they tend to be “noisy”: non-standard use of spelling, grammar and vocabulary; typographical errors; use of emojis, hashtags and at-mentions; etc. I aim to develop new methods to better translate these texts.
LE PRIOL Emma
Université Paris Cité and Kap Code
emmalepriol [at] gmail.com
Short bio
Master’s degrees: mathematics (Université Paris-Dauphine) and social sciences (Sciences Po Paris)
Thesis topic
Using NLP to leverage social media data in the study of rare diseases.
Short abstract
My PhD thesis aims to explore NLP techniques to study online content from patients with rare diseases and their caregivers. The first goal is to better understand the natural histories of the studied diseases and to compare spontaneously reported symptoms with symptoms collected during medical interviews. The other goal is to study how patient associations become involved in public policy governance, in particular by collectively acquiring extensive knowledge.
GODEY Nathan
nathan.godey [at] inria.fr
Short bio
Master of Engineering, École des Ponts
Thesis topic
Cheap and expressive neural contextual representations for textual data.
Short abstract
Neural language models are pre-trained using self-supervised learning to produce contextual representations of text data such as words or sentences. These representations are shaped by the pre-training procedure: its data, its task and its optimization scheme, among others. I aim to identify ways to improve the quality of text representations by leveraging new pre-training approaches, in order to reduce data and/or compute requirements without loss of quality.
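As a purely illustrative aside (not part of the thesis itself), the following minimal sketch shows how contextual representations can be extracted from a pre-trained language model with the Hugging Face transformers library; the choice of model (camembert-base) and the mean-pooling of tokens into a sentence-level vector are assumptions made only for this example.

# Illustrative only: contextual token representations from a pre-trained
# masked language model; model choice and pooling are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModel.from_pretrained("camembert-base")

sentence = "Les représentations contextuelles dépendent du pré-entraînement."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One vector per (sub)word token, shaped by the pre-training data,
# task and optimisation scheme.
token_embeddings = outputs.last_hidden_state  # (1, num_tokens, hidden_size)

# A simple sentence-level representation: mean-pool over tokens.
sentence_embedding = token_embeddings.mean(dim=1)
print(token_embeddings.shape, sentence_embedding.shape)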
DO Salomé
École normale supérieure - PSL
salome.do [at] ens.psl.eu
Short bio
MSc / Engineering degree at ENSAE IP Paris
Thesis topic
Computational Content Analysis Methods for News Frames Prevalence Estimation in the Political Press.
Short abstract
This dissertation aims to provide Computational Content Analysis (CCA) methods for the analysis of news framing in the political press. First, it aims to create a French corpus of political press articles and provide human annotations for two news frame identification tasks, derived from the literature on strategic news framing and “horse race” journalism. Second, it aims to explore the conditions (frame complexity, data quantity and data quality) under which Supervised Machine Learning (SML) methods can “augment” social scientists, i.e. train a model to generalize social scientists’ content analysis (CA) codebook (and subsequent text annotations) so that billions of articles can be analyzed instead of a few hundred. Third, the dissertation aims to evaluate the potential benefits of CCA over CA when it comes to estimating news frame prevalence in a corpus. What justifies using CCA over CA, and is it always justified? I will try to define the conditions on SML model performance under which news frame prevalence estimates are more accurate with CCA than with CA.
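As a purely illustrative aside (not the dissertation's actual pipeline), the sketch below shows the general "augmentation" idea: a supervised classifier is trained on a small expert-coded sample and then applied to a much larger corpus to estimate frame prevalence. The texts, labels and model choice (TF-IDF plus logistic regression via scikit-learn) are hypothetical.

# Illustrative sketch only: scaling a content-analysis codebook with a
# supervised classifier. Data, labels and model choice are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A tiny, hypothetical sample of articles coded by social scientists
# (1 = "horse race" frame present, 0 = absent).
annotated_texts = [
    "Le candidat creuse l'écart dans les derniers sondages.",
    "Le projet de loi prévoit une réforme du système de santé.",
]
annotated_labels = [1, 0]

# Train a simple frame classifier on the annotated sample.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(annotated_texts, annotated_labels)

# Apply it to a (much larger, here tiny) unannotated corpus and
# estimate the prevalence of the frame.
corpus = ["La course à l'Élysée se resserre selon les enquêtes d'opinion."]
prevalence_estimate = classifier.predict(corpus).mean()
print(prevalence_estimate)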
POURNAKI Armin
Short bio
Master’s degree in Theoretical Physics, 2021, Technical University Berlin
Thesis title
Analysing discourse and semantics through geometric representations.
Short abstract
I explore geometric approaches to language and discourse analysis. Currently, I am working on combining methods from network science and natural language processing to gain insights into the mechanisms behind the spreading of information and knowledge related to climate change.
BAWDEN Rachel
rachel.bawden [at] inria.fr
Short bio
Researcher (Chargée de recherches) at Inria in the ALMAnaCH project-team since 2020. Previously obtained a PhD from Université Paris-Sud (awarded the ATALA thesis prize) and spent 2 years as a postdoc in the Machine Translation group at the University of Edinburgh.
Topics of interest
Natural language processing, multilinguality, machine translation
Project in Prairie
Rachel Bawden will focus on improving Machine Translation in the face of language variation (texts from different domains, user-generated texts and historical language). Alongside the development of models, she will also explore the interpretability of models in a bid to make them more robust to variation. Finally, she will experiment with the integration of other input modalities (e.g. image and video data), to help tackle ambiguity and scenarios for which the input signal is impoverished or incomplete.
Quote
Huge progress has been seen in Machine Translation in recent years. However, the translation of domain-specific texts (e.g. biomedical and financial), those displaying a high degree of language variation (e.g. social media texts containing spelling errors, acronyms and marks of expressiveness) and other non-standard varieties of language (including dialects and old languages) remains a challenge. Developing models that (i) are robust to variation, (ii) are able to handle the low-resource settings that these scenarios often present and (iii) can incorporate all external context is therefore fundamental to progress in Machine Translation.
Team
NISHIMWE Lydia
PhD student
Matthieu FUTERAL-PETER
PhD student
LASRI Karim
École normale supérieure - PSL
Short bio
Engineer's degree from CentraleSupélec (formerly École Centrale Paris)
Master's degree in Cognitive Science at the École Normale Supérieure
Thesis title
Linguistic generalization in transformer-based neural language models.
Short abstract
Transformer-based neural architectures hold a lot of promise, as they seem to address a wide range of linguistic tasks after learning a language model. However, the level of abstraction they reach after training is still opaque. My main research focus is to better understand how neural language models generalize. What linguistic properties do these architectures acquire during learning? How is linguistic information encoded in their intermediate representation spaces?
ABULIMITI Alafate
alafate.abulimiti [at] inria.fr
Short bio
- Master's degree in Big Data Management and Analytics, University of Tours
- Engineering degree in Computer Science, Polytech Tours
- Bachelor's degree in Applied Physics, Beijing Institute of Technology
Thesis title
The role of rapport in human-conversational agent interaction: Modeling conversation to improve task performance in human-agent interaction.
Short abstract
Human interaction is a complex process, and understanding and structuring the dynamics of conversation is a necessary step towards producing a virtual agent capable of interacting as a social agent. Rapport is a very important factor in the design of social agents: the social agent chooses appropriate conversational actions according to the verbal and non-verbal characteristics of the interlocutor in order to maintain or increase the level of rapport while performing a specific task. During the doctoral program, I will use decision systems based on game theory and deep reinforcement learning to build different models. Then, using different metrics, I will evaluate whether the addition of these models can improve agent performance.
SAGOT Benoît
benoit.sagot [at] inria.fr
Short bio
Research Director at Inria, head of the ALPAGE (2014-2016) and ALMAnaCH (2017-) teams. Co-founder of the Verbatim Analysis (2009-) and opensquare (2016-) Inria start-ups.
Topics of interest
Computational linguistics, Natural Language Processing (NLP), NLP applications.
Project in Prairie
Benoît Sagot will focus on improving and better understanding neural approaches to NLP and integrating linguistic and extra-linguistic contextual information. He will study how non-neural approaches and language resources can contribute to improving neural NLP systems in low-resource and non-edited scenarios. Applications, both academic and industrial, will include computational linguistics and sociolinguistics, opinion mining in survey results, NLP for financial and historical documents, and text simplification to help people with disabilities.
Quote
Most current research in NLP focuses on neural architectures that rely on large volumes of data, in the form of both raw text and costly annotated corpora. The increasing amounts of data necessary to train such models are not available for all languages, and training can require massive computational resources. Moreover, these approaches are highly sensitive to language variation, as illustrated for instance by domain-specific texts, historical documents and non-edited content as found on social media. To address these issues and allow for a wider deployment of NLP technologies, this bottleneck must be overcome. This will require new models that better exploit the complex structure of language and the context in which it is used.
Team
GODEY Nathan
PhD student
Master of Engineering, École des Ponts
CASTAGNÉ Roman
PhD student
Matthieu FUTERAL-PETER
PhD student
POIBEAU Thierry
Natural Language Processing, Digital Humanities
thierry.poibeau [at] ens.fr
Short bio
CNRS Research Director, head of the CNRS Lattice research unit (2012-2018) and deputy head since 2019. Affiliated lecturer, Language Technology Laboratory, University of Cambridge, since 2009. Rutherford Fellowship, The Alan Turing Institute, London (2018-2019). Teaches NLP in the PSL Master's in Digital Humanities.
Topics of interest
Computational linguistics, Low resource languages, Corpora, Distant reading, AI and creativity
Project in Prairie
Thierry Poibeau’s work focuses on Natural Language Processing. He is especially interested in developing techniques for low resource languages that have largely been left out of the machine learning revolution. He is also interested in applying AI techniques to the study of literature and social sciences, shedding new light on the notions of culture and creativity.
Quote
Natural Language Processing (NLP) has made considerable progress over the last few years, mainly due to impressive advances in machine learning. We now have efficient and accurate tools for 20+ languages, but the vast majority of the world's languages lack the resources needed for state-of-the-art NLP. This is a major challenge for our field, since preserving linguistic and cultural diversity is as important as preserving biodiversity. Technology is not the only solution, but it helps facilitate this process by leveraging resources, bridging the gap between languages, and enhancing our understanding of culture and society.
Team
POURNAKI Armin
PhD Student
CASSELL Justine
justine.cassell [at] inria.fr
Short bio
Professor and former Associate Dean, School of Computer Science, Carnegie Mellon University (2010-). Chaire Blaise Pascal and Chaire Sorbonne (2017-2018). On leave from CMU, at Inria since fall 2019. ACM Fellow (2017), Fellow of the Royal Society of Edinburgh (2016), AAAS Fellow (2012), Anita Borg Women of Vision Award (2009). AAMAS Test of Time Award (2017). Chair, World Economic Forum Global Agenda Council on Robotics & Smart Devices (2011-2014). Member of the CNNum (Conseil National du Numérique), the French National Digital Council, since January 2021.
Topics of interest
Natural language processing, human-computer interaction, autonomous and virtual agents, social AI
Project in Prairie
Justine Cassell will address issues at the intersection of NLP, AI, Cognitive Science, and Human-Computer Interaction, employing methods from each of these traditions, and developing new interdisciplinary methods. Her goal is to develop theories, architectures, algorithms, and implementations of embodied conversational agents capable of engaging people in natural dialogue, including both task and social components, language and non-verbal behavior. She will participate in the PSL AI graduate school.
Quote
There is a need for a more human-centered design of AI systems so that they may act as partners and teammates to people rather than their replacements. My work in Social AI attempts to address these design challenges by basing AI agent behavior on a close study of human collaboration and teamwork, thereby working towards fulfilling their societal promise, as well as advancing fundamental areas of AI as diverse as natural language generation and transparency in machine learning.
Team
ABULIMITI Alafate
PhD student
Master's degree in Big Data Management and Analytics, University of Tours
Engineering degree in Computer Science, Polytech Tours
Bachelor's degree in Applied Physics, Beijing Institute of Technology
BONNAIRE Julie
Engineer
Master's degree in Systems Biology at Sorbonne University
MOHAPATRA Biswesh
PhD student
Integrated Master of Technology in Computer Science Engineering from IIIT Bangalore
JENKINS Jade
Engineer
MSc Gerontology Research with distinction from the University of Southampton, UK
BS magna cum laude from the University of New Orleans, USA
GILMARTIN Emer
Postdoctoral researcher
PhD, Trinity College Dublin, Ireland; MPhil, Trinity College Dublin, Ireland
BE (Mech), NUIG, Ireland