SAMARAN Jules
samaran [at] bio.ens.psl.eu
Short bio
Ingénieur Civil des Mines de Paris – Mines ParisTech (Master’s degree in Science and Executive Engineering)
Master 2 – Mathématiques, Vision & Apprentissage – ENS Paris-Saclay (MVA Master’s degree)
Thesis topic
Methods for single-cell multimodal integration.
Short abstract
Recent technological advances allow biologists to profile multiple modalities (e.g. gene expression, DNA methylation, chromatin accessibility) from a single cell. However, such data are still rare, and most existing single-cell multimodal data are profiled from different cells (i.e. unpaired data). My project aims to develop integrative dimensionality reduction approaches for unpaired multimodal data (i.e. a collection of monomodal data sets) that are adapted to single-cell data. This tool will make it possible to cluster cells based on their multimodal similarities, to extract markers from the different modalities and to transfer annotations from one data set to another.
FERMANIAN Adeline
adeline.fermanian [at] mines-paristech.fr
Short bio
PhD in Statistics, Sorbonne Université
Research project
High-dimensional inference in genomic data.
Short abstract
Our goal is to propose new efficient procedures for high-dimensional inference, motivated by applications to high-dimensional genomic data. More specifically, we are interested in identifying regions of the genome associated with a phenotype through methods that provide p-values, typically via post-selection inference.
KMETZSCH Virgilio
virgilio.kmetzsch [at] inria.fr
Short bio
MSc in Data Science – Grenoble INP Ensimag & UGA
Thesis title
Multimodal analysis of neuroimaging and transcriptomic data in genetic frontotemporal dementia.
Short abstract
Frontotemporal dementia (FTD) is a devastating neurodegenerative disease with no effective treatment so far. The Paris Brain Institute has assembled one of the largest cohorts worldwide on genetic forms of FTD, comprising multimodal data including neuroimaging (MRI, PET), cognitive and transcriptomic (RNA-seq) data. The present PhD project aims to design and apply new approaches for integrating transcriptomic and neuroimaging data, in order to characterize biomarkers of the presymptomatic phase of the disease and inform the design of upcoming therapeutic trials.
TEBOUL Raphaël
raphael.teboul [at] inserm.fr
Short bio
Master’s degree in Engineering, Telecom Paris
Thesis title
Unravelling non-coding driver alterations in cancer with deep learning.
Short abstract
Of the 3 gigabases that constitute the human genome, only about 50 megabases (<2%) encode protein-coding genes. Particular attention has been paid to somatic mutations affecting the coding sequence of these genes, leading to the near-exhaustive characterization of 723 genes implicated in cancer (Cancer Gene Census, COSMIC database, September 2019). By contrast, with the notable exception of TERT promoter mutations, which induce the expression of telomerase (a key enzyme necessary for unlimited cell proliferation), very few driver alterations have been identified in the non-coding genome. Analyses of mutation hotspots or known regulatory regions such as promoters and enhancers have failed to identify significantly recurrent mutations with a strong transcriptional impact on cancer genes. The main reason is the difficulty of predicting the functional consequences of non-coding mutations: although these mutations can alter important regulatory regions and modulate the expression of key cancer genes, there is no established method to predict the transcriptional impact of a given non-coding mutation. To fill this gap, we will develop a deep neural network able to predict gene expression from the local sequence context. Pioneering studies have demonstrated the ability of deep neural networks to learn several regulatory features directly from the DNA sequence, including splice sites, chromatin accessibility, 3D conformation and transcription factor binding sites. More recently, Olga Troyanskaya’s team developed a deep neural network able to predict, from the DNA sequence, the expression level of genes in a cell-type-specific manner by integrating predictions of chromatin state and transcription factor binding. Once trained, such neural networks can predict in silico the regulatory impact of any sequence variant, and are thus extremely valuable assets for identifying disease-causing variants.
Deep learning analyses have been used to identify causal variants in several diseases, including autism, but have not yet been applied to cancer. Our hypothesis is that leveraging the power of deep neural networks to explore the millions of somatic alterations identified in cancer sequencing projects is a promising approach to uncover the missing driver events in the non-coding human genome.
CAPTIER Nicolas
nicolas.captier [at] curie.fr
Short bio
M2, École polytechnique – MVA
Thesis title
Multimodal and integrative analysis of genomics, radiomics and pathological data for the prediction of response to immunotherapy in lung cancer.
Short abstract
We aim to develop supervised and unsupervised machine learning methods to identify signatures of the response to immunotherapy in non-small cell lung cancer, through the integration of genomics, radiomics (extracted from PET and CT images) and pathological data. We will then interpret these signatures to decipher the biological pathways and mechanisms modulating immune responses.
BLASSEL Luc
luc.blassel [at] pasteur.fr
Short bio
Diplôme d’ingénieur (AgroParisTech)
Master’s degree in Science (Dauphine – PSL)
Thesis title
Big Data and Machine learning for alignment in genomics.
Short abstract
With the ever-growing quantity of high-quality sequence data, machine learning is becoming increasingly useful in genomics. The goal of my project is to use machine learning methods to improve the alignment of DNA sequencing data to genomes.
BAC Jonathan
jonathan.bac [at] cri-paris.org
Short bio
Double BA in English and French law, Université François Rabelais de Tours
MS in Computational Biology, Center for Interdisciplinary Research & Université de Paris
Thesis title
Machine learning approaches for the analysis of high-dimensional single-cell data.
Short abstract
Recent advances in genomics allow us, for the first time, to obtain large amounts of information at the level of individual cells (measuring the expression of thousands of genes in individual cells of a tumor, an organ, an embryo, etc.). My project is to develop machine learning methods to analyze these data and better understand cancer and how living things work.
ZINOVYEV Andrei
andrei.zinovyev [at] curie.fr / Twitter: @SysBioCurie
Short bio
Senior permanent researcher at Institut Curie and scientific coordinator of the Computational Systems Biology of Cancer group within the Bioinformatics department (2005-). Postdoctoral fellow at Institut des Hautes Études Scientifiques (IHES) (2001-2005). Habilitation in biology at École Normale Supérieure in Paris (2014).
Topics of interest
Machine Learning, Unsupervised learning, High-dimensional geometry, Omics data, Mathematical Modeling, Cancer biology
Project in Prairie
Andrei Zinovyev will focus on developing and adapting methods for learning latent spaces and structures in high-dimensional data, with principal applications to biomedical data analysis. His main research line within PRAIRIE will be learning representations of multi-omics and single-cell data. Andrei Zinovyev will also develop a course on applications of machine learning in molecular oncology.
Quote
Modern datasets in biology and medicine contain millions of objects (patients, biopsies, tumors, cells) characterized by hundreds of thousands of features, such as the expression of genes and proteins, properties of DNA or concentrations of metabolites. How can we use these data to make discoveries in biology or to propose better disease treatments? We can learn a lot by investigating the corresponding high-dimensional data point clouds, whose intrinsic geometry is shaped by biological processes, experimental designs and technical biases, and is affected by the heterogeneity and uncertainty of molecular measurements. With machine learning methods that allow us to explore complex multidimensional data structures, we can tackle the problem of extracting the most relevant part of the information contained in omics data and using it in the most efficient way.
LETOUZÉ Eric
eric.letouze [at] inserm.fr
Short bio
Senior INSERM researcher, leader of the computational biology group within the “Functional Genomics of Solid Tumors” team at Cordeliers Research Center. INSERM excellence award (2015). Institut Necker Fondation Tourre best post-doctoral student award (2015).
Topics of interest
Cancer genomics, bioinformatics, machine learning
Project in Prairie
Discovering cancer-causing mutations using deep learning approaches. Eric Letouzé will develop deep learning approaches to predict cell-type-specific regulatory features of gene expression, splicing and translation from the DNA sequence, and use these tools to discover new driver events among the millions of non-coding mutations identified in human cancer genomes.
Quote
Next-generation sequencing technologies have allowed the identification of millions of mutations in tens of thousands of tumor samples. Yet the vast majority of driver mutations identified so far as functionally associated with cancer development lie within the <2% of the genome that encodes protein-coding genes. Although non-coding mutations can dramatically modulate the expression of cancer genes, predicting their precise functional impact remains extremely challenging. By developing deep neural networks able to learn regulatory features from the DNA sequence, we will be able to predict which mutations are likely to alter the expression of oncogenes and tumor suppressors, and to unravel the missing drivers within the huge pan-cancer mutation catalogues available in public databases.
GASCUEL Olivier
olivier.gascuel [at] mnhn.fr
Short bio
Research Director at CNRS (ISYEB – Muséum National d’Histoire Naturelle). Head of the Department of Computational Biology at Institut Pasteur (2015-2020). Associate Editor of Systematic Biology. Fast Breaking Paper 2005 and Current Classic in Environment and Ecology from 2007 to 2011 (most cited paper in the field, Science Watch – Thomson Reuters). Silver Medal in Computer Science of the CNRS, 2009. Grand Prix Inria – Académie des Sciences for Numerical Sciences, 2017.
Topics of interest
Computational biology, genomics, evolution, pathogens
Project in Prairie
Olivier Gascuel’s research will focus on the analysis of genomic data. Modeling, statistical/deep learning and algorithmics will be combined to take advantage of the evolutionary relationships among sequences and to answer key questions on the function of pathogenic genes, the emergence of drug resistance, and the dynamics of epidemics. He will develop interdisciplinary courses intended for a wide audience.
Quote
The amount of genomic data is increasing at an exponential rate. These data contain a wealth of information on diseases, biodiversity, and many other important societal issues. Their analysis poses constantly renewed challenges, both algorithmic and in terms of modeling. We are helped in this task by the traces left by evolution in the genes and genomes of species, as captured by Theodosius Dobzhansky in his famous statement “Nothing in biology makes sense except in the light of evolution” (1973). Evolutionary approaches combined with the latest advances in AI, especially deep learning, will be key to harnessing today’s and tomorrow’s genomic data, and to answering key questions in biology and health.