Elamrani Aïda

PhD student

Institut Jean Nicod, ENS-PSL & Chargée d’études CNRS

aidaelamrani [at] outlook.fr

Short bio

Master's degree in Theoretical Computer Science, Aix-Marseille Université

Thesis topic

Information in the Interplay Between Mind and Matter.

Short abstract

Chadoutaud Loïc

PhD student

Institut Curie

loic.chadoutaud [at] curie.fr

Short bio

Ingénieur Civil des Mines de Paris – Mines ParisTech (Master’s degree in Science and Executive Engineering)

Master 2 – Mathématiques, Vision & Apprentissage – ENS Paris-Saclay (MVA Master’s degree)

Thesis topic

Spatial and temporal heterogeneity of single-cell transcriptomic data.

Short abstract

Spatial transcriptomics is a recent technology that allows biologists to measure both transcriptomic information and spatial locations in tissues. My project aims to develop methods (such as clustering and dimensionality reduction algorithms) for the analysis of such multimodal data. In particular, we are interested in the links between spatial organization and transcriptomic heterogeneity within tissues.

SAMARAN Jules

PhD student

ENS

samaran [at] bio.ens.psl.eu

Short bio

Ingénieur Civil des Mines de Paris – Mines ParisTech (Master’s degree in Science and Executive Engineering)

Master 2 – Mathématiques, Vision & Apprentissage – ENS Paris-Saclay (MVA Master’s degree)

Thesis topic

Methods for single-cell multimodal integration.

Short abstract

Recent technological advances allow biologists to profile multiple modalities (e.g. gene expression, DNA methylation, chromatin accessibility) from a single cell. However, such data are still rare, and most existing single-cell multimodal data are profiled from different cells (i.e. unpaired data). My project aims to develop integrative dimensionality reduction approaches for unpaired multimodal data (i.e. a collection of monomodal data sets) that are adapted to single-cell data. This tool will make it possible to cluster cells based on their multimodal similarities, to extract markers from the different modalities and to transfer annotations from one data set to another.

FERMANIAN Adeline

Postdoc

Mines ParisTech

adeline.fermanian [at] mines-paristech.fr

Short bio

PhD in Statistics, Sorbonne Université

Research project

High-dimensional inference in genomic data.

Short abstract

Our goal is to propose new, efficient procedures for high-dimensional inference, motivated by applications to high-dimensional genomic data. More specifically, we are interested in identifying regions of the genome associated with a phenotype, using methods that provide p-values, typically post-selection inference procedures.

Kmetzsch Virgilio

PhD Student

INRIA

virgilio.kmetzsch [at] inria.fr

Short bio

MSc in Data Science – Grenoble INP Ensimag & UGA

Thesis title

Multimodal analysis of neuroimaging and transcriptomic data in genetic frontotemporal dementia.

Short abstract

Frontotemporal dementia (FTD) is a devastating neurodegenerative disease with no effective treatments so far. The Paris Brain Institute has assembled one of the largest cohorts worldwide on genetic forms of FTD, comprising multimodal data including neuroimaging (MRI, PET), cognition and transcriptomics (RNA-seq). The present PhD project aims to design and apply new approaches for integrating multimodal transcriptomic and neuroimaging data, to characterize biomarkers of the presymptomatic phase of the disease, in order to design upcoming therapeutic trials.

TEBOUL Raphaël

PhD Student

INSERM

raphael.teboul [at] inserm.fr

Short bio

Master's degree in Engineering, Telecom Paris

Thesis title

Unravelling non-coding driver alterations in cancer with deep learning.

Short abstract

Of the 3 gigabases that constitute the human genome, only about 50 megabases (<2%) code for proteins. Particular attention has been paid to somatic mutations affecting these coding sequences, leading to the almost exhaustive characterization of 723 genes implicated in cancer (Cancer Gene Census, COSMIC database, September 2019). By contrast, with the notable exception of TERT promoter mutations that induce the expression of telomerase (a key enzyme necessary for unlimited cell proliferation), very few driver alterations have been identified in the non-coding genome. Analyses of mutation hotspots or known regulatory regions such as promoters and enhancers have failed to identify significantly recurrent mutations with a strong transcriptional impact on cancer genes. The main reason is the difficulty of predicting the functional consequences of non-coding mutations. Although these mutations can alter important regulatory regions and modulate the expression of key cancer genes, there is no established method to predict the transcriptional impact of a non-coding mutation. To fill this gap, we will develop a deep neural network able to predict gene expression based on the local sequence context. Pioneering studies have demonstrated the ability of deep neural networks to learn to recognize several regulatory features from the DNA sequence, including splice sites, chromatin accessibility, 3D conformation and transcription factor binding sites. More recently, Olga Troyanskaya’s team has developed a deep neural network able to predict, from the DNA sequence, the expression level of genes in a cell-type-specific manner, by integrating predictions of chromatin state and transcription factor binding. Once trained, these neural networks are able to predict in silico the regulatory impact of any sequence variant, and are thus extremely valuable assets for identifying disease-causing variants.
Deep learning has been used to identify causal variants in several diseases, including autism, but has not yet been applied to cancer. Our hypothesis is that leveraging the power of deep neural networks to explore the millions of somatic alterations identified in cancer sequencing projects is a promising approach to uncover the missing driver events in the non-coding human genome.

CAPTIER Nicolas

PhD Student

Institut Curie

nicolas.captier [at] curie.fr

Short bio

M2, École polytechnique – MVA

Thesis title

Multimodal and integrative analysis of genomics, radiomics and pathological data for the prediction of response to immunotherapy in lung cancer.

Short abstract

We aim to develop supervised and unsupervised machine learning methods to identify signatures of the response to immunotherapy in non-small cell lung cancer, through the integration of genomics, radiomics (extracted from PET and CT images) and pathological data. We will interpret these signatures to decipher the biological pathways and mechanisms that modulate immune responses.

BLASSEL Luc

PhD Student

Institut Pasteur

luc.blassel [at] pasteur.fr

Short bio

Engineering degree (Diplôme d’ingénieur), AgroParisTech

Master of Science (Dauphine – PSL)

Thesis title

Big data and machine learning for alignment in genomics.

Short abstract

With the ever-growing quantity of high-quality sequence data, machine learning is becoming more and more useful in genomics.

The goal of my project is to use machine learning methods to improve the alignment of DNA sequencing data to genomes.

BAC Jonathan

PhD Student

Institut Curie

jonathan.bac [at] cri-paris.org

Short bio

Double BA in English and French law, Université François Rabelais de Tours

MS in Computational Biology, Center for Interdisciplinary Research & Université de Paris

Thesis title

Machine learning approaches for the analysis of high-dimensional single-cell data.

Short abstract

Recent advances in genomics allow us to obtain for the first time large amounts of information at the level of individual cells (measuring the expression of thousands of genes for individual cells in a tumor, an organ, an embryo, etc.). My project is to develop machine learning methods to analyze this data and better understand cancer and how living things work.

ZINOVYEV Andrei

Computational biology

andrei.zinovyev [at] curie.fr / Twitter: @SysBioCurie

Short bio

Senior permanent researcher at Institut Curie and scientific coordinator of the Computational Systems Biology of Cancer group in the Bioinformatics department (since 2005). Postdoctoral fellow at Institut des Hautes Études Scientifiques (IHES) (2001-2005). Habilitation in biology at École Normale Supérieure in Paris (2014).

Topics of interest

Machine Learning, Unsupervised learning, High-dimensional geometry, Omics data, Mathematical Modeling, Cancer biology

Project in Prairie

Andrei Zinovyev will focus on developing and adapting methods for learning latent spaces and structures in high-dimensional data, with principal applications to biomedical data analysis. The main research line within PRAIRIE will be learning representations of multi-omics and single-cell data. Andrei Zinovyev will also set up a course on applications of machine learning in molecular oncology.

Quote

Modern datasets in biology and medicine contain millions of objects (patients, biopsies, tumors, cells) characterized by hundreds of thousands of features such as expression of genes and proteins, properties of DNA or concentration of metabolites. How to use these data in order to make discoveries in biology or propose a better disease treatment? We can learn a lot by investigating the corresponding high-dimensional data point clouds, whose intrinsic geometry is shaped by biological processes, experimental designs and technical biases and is affected by the heterogeneity and uncertainty of molecular measurements. With machine learning methods allowing us to explore complex multidimensional data structures, one can tackle the problem of extracting the most relevant part of the information contained in omics data and using it further in the most efficient way.