VERDIER Hippolyte

PhD Student

Institut Pasteur

hverdier [at] pasteur.fr

Short bio

Engineering degree (ingénieur polytechnicien), École polytechnique (Palaiseau)

MPhil in Computational biology, University of Cambridge (UK)

Thesis title

Combining artificial intelligence with high-resolution microscopy to better dissect the binding mechanism and mechanism of action of multi-specific biologics.

Short abstract

Photo-activated localization microscopy (PALM) enables high-resolution recording of single-protein trajectories in live cells, thus providing valuable probes of small-scale properties of the protein environment. I use graph neural networks to characterize relevant physical properties of these dynamics, and have developed a flexible analysis scheme able to deal both with the diversity of motion types encountered in nature and with the fact that observed trajectories inevitably differ, to some extent, from archetypal theoretical models.

LASRI Karim

PhD Student

L’Ecole normale supérieure - PSL

karim.lasri [at] ens.fr

Short bio

Engineer’s degree, CentraleSupélec (formerly École Centrale Paris)

Master’s degree in Cognitive Science at the Ecole Normale Supérieure

Thesis title

Linguistic generalization in transformer-based neural language models.

Short abstract

Transformer-based neural architectures hold great promise, as they seem to address a wide range of linguistic tasks after learning a language model. However, the level of abstraction they reach through training is still opaque. My main research focus is to better understand how neural language models generalize. What linguistic properties do these architectures acquire during learning? How is linguistic information encoded in their intermediate representation spaces?

DO Virginie

PhD Student

Dauphine - PSL

virginie.do [at] dauphine.eu

Short bio

MSc in Applied Mathematics / Engineering degree (Diplôme d’ingénieur) – École polytechnique

MSc in Social Data Science – University of Oxford

Thesis title

Fairness in machine learning: insights from social choice.

Short abstract

Designing fair algorithms has recently emerged as a major issue in machine learning, and more generally in AI, whereas fairness has long been studied in economics, especially in social choice theory. My goal is to bring together the notions of fairness of the two communities, and to leverage the concepts and mathematical tools of social choice to address the new challenges of fairness in machine learning.

ANDRAL Charly

PhD Student

Dauphine - PSL

andral [at] ceremade.dauphine.fr

Short bio

Diplôme d’ingénieur (engineering degree) – ENSAE Paris

Master’s in Statistics and Machine Learning, Paris-Saclay University

Thesis title

Improvement of MCMC methods and adaptation to Big Data.

Short abstract

MCMC methods can struggle to explore the state space, especially in the high-dimensional settings that arise in the context of Big Data. The goal of my PhD thesis is to develop enhancements to MCMC that address this exploration issue.

GODARD Charlotte

PhD Student

Institut Pasteur

charlotte.godard [at] pasteur.fr

Short bio

  • Engineering degree – Télécom Physique Strasbourg
  • Master’s degree in Imaging, Robotics and Engineering for Healthcare – University of Strasbourg

Thesis title

Semi-automatic and amortized development of transfer functions for surgery planning in virtual reality.

Short abstract

Interpreting medical images, such as MRI or CT scans, can be challenging for non-radiologists because of variable image quality and the similarities between different structures of interest. However, surgeons need to understand these images to prepare surgeries and define the corresponding anatomical landmarks. As universal segmentation is not possible due to the diversity of images across patients, we focused on optimizing the visualization process applied to the raw data only. The AVATAR MEDICAL platform uses virtual reality for intuitive visualization and manipulation of the images. Visualization parameters (color, transparency) are currently defined manually through a user-friendly desktop interface for the transfer function. The objective of the thesis is to automate transfer function generation so that structures of interest in the image can be isolated faster, by combining a statistical approach with pre-trained models.

MISHRA Shrey

PhD student

L’Ecole normale supérieure - PSL

Shrey.Mishra [at] ens.fr

Short bio

  • Manipal University (India, BTech)
  • CESI School of Engineering (software major)
  • Munster Technological University (MSc Artificial Intelligence)

Thesis title

Extracting information from published scientific articles and building a knowledge base from it using various AI and machine learning techniques.

Short abstract

Every year, thousands of scientific papers covering various proofs, theorems and relations are published in the form of PDF documents. I am involved in TheoremKb (a project led by Pierre Senellart), which aims to extract information from scientific articles, training machine learning models to identify and relate documents based on the information they express (including mathematical proofs).

KMETZSCH Virgilio

PhD student

INRIA

virgilio.kmetzsch [at] inria.fr

Short bio

MSc in Data Science – Grenoble INP Ensimag & UGA

Thesis title

Multimodal analysis of neuroimaging and transcriptomic data in genetic frontotemporal dementia.

Short abstract

Frontotemporal dementia (FTD) is a devastating neurodegenerative disease with no effective treatment so far. The Paris Brain Institute has assembled one of the largest cohorts worldwide on genetic forms of FTD, comprising multimodal data including neuroimaging (MRI, PET), cognitive and transcriptomic (RNA-seq) data. This PhD project aims to design and apply new approaches for integrating multimodal transcriptomic and neuroimaging data, in order to characterize biomarkers of the presymptomatic phase of the disease and inform the design of upcoming therapeutic trials.

TEBOUL Raphaël

PhD Student

INSERM

raphael.teboul [at] inserm.fr

Short bio

Master’s degree in Engineering at Telecom Paris

Thesis title

Unravelling non-coding driver alterations in cancer with deep learning.

Short abstract

Of the 3 gigabases that constitute the human genome, only about 50 megabases (<2%) encode protein-coding genes. Particular attention has been paid to somatic mutations affecting the coding sequence of these genes, leading to the almost exhaustive characterization of 723 genes implicated in cancer (Cancer Gene Census, COSMIC database, September 2019). By contrast, with the notable exception of TERT promoter mutations, which induce the expression of telomerase (a key enzyme necessary for unlimited cell proliferation), very few driver alterations have been identified in the non-coding genome. Analyses of mutation hotspots or known regulatory regions such as promoters and enhancers have failed to identify significantly recurrent mutations with a strong transcriptional impact on cancer genes. The main reason is the difficulty of predicting the functional consequences of non-coding mutations. Although these mutations can alter important regulatory regions and modulate the expression of key cancer genes, there is no established method to predict the transcriptional impact of a non-coding mutation.

To fill this gap, we will develop a deep neural network able to predict gene expression from the local sequence context. Pioneering studies have demonstrated the ability of deep neural networks to learn to recognize several regulatory features from DNA sequence, including splice sites, chromatin accessibility, 3D conformation and transcription factor binding sites. More recently, Olga Troyanskaya’s team developed a deep neural network able to predict, from DNA sequence, the expression level of genes in a cell-type-specific manner by integrating predictions of chromatin state and transcription factor binding. Once trained, such networks can predict in silico the regulatory impact of any sequence variant, and are thus extremely valuable assets for identifying disease-causing variants.

Deep learning analysis has been used to identify causal variants in several diseases, including autism, but has not yet been applied to cancer. Our hypothesis is that leveraging the power of deep neural networks to explore the millions of somatic alterations identified in cancer sequencing projects is a promising approach to uncover the missing driver events involving the non-coding human genome.

ZHOU Anqi

PhD Student

Institut Pasteur

anqi.zhou [at] pasteur.fr

Short bio

  • BSc. Applied Mathematics, BA. Neuroscience
  • MSc. Biotechnology, Brown University, USA

Thesis title

Rapidly identifying therapeutics for Alzheimer’s disease using millions of Drosophila larvae and amortized inference.

Short abstract

Alzheimer’s disease (AD) affects millions of people worldwide, yet the few available treatments address only its symptoms rather than the causes of pathogenesis. The goal of this PhD project is to establish a new pipeline for measuring AD phenotypes that leverages the advantages of Drosophila as a model system for circuit studies and links probabilistic behavior to disease progression. The pipeline builds on automated machine learning to rapidly analyze data from millions of larvae.

SAUTY DE CHALON Benoit

PhD student

INRIA

benoit.sauty-de-chalon [at] inria.fr

Short bio

Engineering degree (Diplôme d’ingénieur), École polytechnique

Thesis title

Multimodal modelling of neurodegenerative diseases.

Short abstract

The goal is to find quantitative links between the decline of structural properties of the brain, as observed through imaging techniques such as MRI and PET scans, and the decline of patients’ cognitive abilities, as measured by cognitive assessment tests. The research focuses on patients with Alzheimer’s and Parkinson’s diseases.

D’ASCOLI Stéphane

PhD student

L’Ecole normale supérieure - PSL / FAIR Paris

stephane.dascoli [at] gmail.com

Short bio

Master in Theoretical Physics, ENS Paris

Thesis title

Deep learning: from toy models to modern architectures.

Short abstract

My research focuses on understanding how deep neural networks are able to generalize despite being heavily overparametrized. On the one hand, I use tools from statistical mechanics to study simple models, and try to understand when and why they overfit. On the other hand, I investigate how different types of inductive biases affect learning, from fully-connected networks to convolutional networks to transformers.

HAIRAULT Adrien

PhD student

PSL

hairault [at] ceremade.dauphine.fr

Short bio

  • MSc in Statistical Science, University of Oxford
  • Double bachelor’s degree (licence) in M.I.A.S.H.S., Université Paris 1 Panthéon-Sorbonne & SciencesPo Paris

Thesis title

Foundations and applications in Bayesian Mixture Modelling.

Short abstract

Mixtures are a popular class of models bridging parametric and non-parametric statistics and, as part of the standard data analysis toolkit, have ubiquitous applications in regression, clustering, machine learning, etc. One of the main goals of this thesis is to ease model selection within such a class of models, in particular by finding efficient ways of computing the marginal likelihood (aka evidence) of semi-parametric models (such as Dirichlet Process Mixtures). We also study the convergence properties of the Bayes Factor when comparing such parametric and semi-parametric models.

ALLOUCHE Tahar

PhD Student

Université Paris-Dauphine and CNRS

tahar.allouche [at] dauphine.eu

Short bio

Mathematical Engineering degree from ENSTA Paris – M2 Optimization from Paris-Saclay University

Thesis title

Learning societal preferences for automated collective decision making.

Short abstract

We study sophisticated models of agents’ preferences as data structures in a learning framework for collective decision aiding. Statistical, computational and epistemic aspects of preferences are considered in order to thoroughly explore their structure and efficiently infer optimal decisions.

WALLEZ Théophile

PhD Student

INRIA

theophile.wallez [at] inria.fr

Short bio

Master of Computer Science, ENS Ulm

Research project

A verification framework for privacy-preserving machine learning.

Short abstract

Machine learning is notoriously data-hungry, and these data are often private. Recent advances in privacy-preserving machine learning use new cryptographic techniques to avoid exposing private data. However, implementations of such cryptographic techniques are error-prone, and bugs can result in information leakage. I therefore use the F* software verifier to implement and verify modern multiparty computation protocols, such as SPDZ2k.

DE SEYSSEL Maureen

PhD student

L’Ecole normale supérieure - PSL

Short bio

  • MSc in Speech and Language Processing – University of Edinburgh (United Kingdom)
  • BSc in Psychology – City, University of London (United Kingdom)

Thesis title

Does multilingual input help or hinder early language acquisition? A computational modelling approach.

Short abstract

Experimental studies of bilingual language acquisition are based on the assumption that children separate languages at birth or within a few months, and that this early ability is essential for successful learning, as it would prevent children from mixing languages and learning a multilingual representation that does not correspond to any specific language. This project will test this hypothesis with a reverse-engineering approach based on computational models that aim to model the ideal learner faced with input data in which the number of languages is unknown a priori. This approach will directly test two aspects of the hypothesis: (1) the premise that it is possible to separate languages before learning them, and (2) the claim that separation is necessary for learning several languages in parallel.

SEBBOUH Othmane

PhD student

ENS

othmane.sebbouh [at] gmail.com

Short bio

  • Master 2 Data Sciences from Ecole Polytechnique
  • Master in Statistics and Economics from ENSAE
  • MSc in Management from ESSEC Business School

Thesis title

Multivariate quantile normalization and applications to machine learning.

Short abstract

Quantile normalization is a fundamental tool in statistics. It allows univariate data to be transformed, by means of a monotonic transformation, so that they follow a predetermined distribution (e.g., Gaussian). This normalization has several practical virtues, notably reducing the influence of extreme values and facilitating the training of the parameters of learning models based on these data. In practice, it is most often applied statically, in the sense that the target distribution is chosen a priori. Recent work [2] has shown that it is possible to make this operation differentiable, and thus to adapt the target distribution as needed in order to improve, in an integrated way, the final result of learning methods. The aim of this thesis is to study, theoretically and numerically, multivariate extensions of this approach, with possible applications in genomics.
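
As a rough illustration of the classical univariate operation described in this abstract (not the differentiable method of [2], nor the multivariate extensions studied in the thesis), the following Python sketch maps a sample to standard-normal scores through a rank-based monotonic transform. The function name and the (rank - 0.5) / n plotting position are illustrative assumptions, not taken from the source.

```python
from statistics import NormalDist


def gaussian_quantile_normalize(values):
    """Hypothetical helper: map 1-D data to standard-normal scores.

    Each value is replaced by the standard-normal quantile of its
    mid-rank position (rank - 0.5) / n, so the transform is monotonic.
    Tied values receive the average of their ranks.
    """
    n = len(values)
    # Indices of the data sorted by value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the group of indices sharing the same value (ties).
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (r - 0.5) / n lies strictly in (0, 1), so inv_cdf is always defined.
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

For instance, `[10, 1000, 3]` maps to roughly `[0.0, 0.97, -0.97]`: the outlier 1000 ends up no farther from the centre than the smallest value, which is one way such a transform tames extreme values. Because the target distribution is fixed here, this sketch corresponds to the static setting the abstract contrasts with the differentiable approach.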

ROMAIN Manon

PhD student

L’Ecole normale supérieure - PSL

manon.romain [at] inria.fr

Short bio

  • Diploma of the École polytechnique (Diplôme de l’École polytechnique)
  • MSc in Computational and Mathematical Engineering – Stanford University

Thesis title

Study of causal networks.

Short abstract

Causal inference is very important for a wide range of applications, from clinical trials to econometrics: we have learned that “correlation is not causation”, but how can we learn true causal relationships? We will learn causal diagrams using the latest advances in optimization. We will also study experimental design: given current knowledge, how can limited resources best be used to gain insightful causal information (e.g., through biological experiments)?

POULET Pierre-Emmanuel

PhD student

INRIA

pierre-emmanuel.poulet [at] inria.fr

Short bio

Engineering degree (Ingénieur), École polytechnique

Master Mathématiques Vision Apprentissage (Mathematics, Vision, Learning), ENS Paris-Saclay

Thesis title

Modelling the evolution of a multi-risk profile.

Short abstract

The objective is to develop non-linear mixed-effect models for longitudinal data in the context of neurodegenerative diseases. The models will then be used for prediction of the evolution of different medical observations and/or diagnosis.

MOREL Rudy

PhD Student

L’Ecole normale supérieure - PSL

rudy.morel [at] ens.fr

Short bio

  • MS in Probability and Finance (formerly DEA El Karoui) from UPMC
  • BSc in Mathematics from École Normale Supérieure of Rennes

Thesis title

Modelling of multiple time series with learning of the structure across series.

Short abstract

Many phenomena observed in nature can be described as a collection of time series (components of an audio recording, pixels of a video over time, economic agents in a complex system). The goal of this PhD is to model multiple time series and to learn the structure across series.

MERAD Ibrahim

PhD Student

Université de Paris

ibrahim.merad [at] etu.u-paris.fr

Short bio

Master 2 MVA, ENS Paris-Saclay

Thesis title

Apprentissage non-supervisé de représentations et applications en santé (Unsupervised representation learning and applications in healthcare).

Short abstract

Unsupervised representation learning has recently caught up with the performance of supervised approaches thanks to the introduction of contrastive methods. The mathematical study of these new methods is essential to better understand and exploit them as well as provide guarantees on their performance. Their application is especially relevant in healthcare where supervision is commonly lacking.