SAMARAN Jules

PhD student

ENS

samaran [at] bio.ens.psl.eu

Short bio

Ingénieur Civil des Mines de Paris – Mines ParisTech (Master’s degree in Science and Executive Engineering)

Master 2 – Mathématiques, Vision & Apprentissage – ENS Paris-Saclay (MVA Master’s degree)

Thesis topic

Methods for single-cell multimodal integration.

Short abstract

Recent technological advances allow biologists to profile multiple modalities (e.g. gene expression, DNA methylation, chromatin accessibility) from a single cell. However, such data are still rare, and most existing single-cell multimodal data are profiled from different cells (i.e. unpaired data). My project aims to develop integrative dimensionality reduction approaches for unpaired multimodal data (i.e. a collection of monomodal data sets) that are adapted to single-cell data. This tool will make it possible to cluster cells based on their multimodal similarities, to extract markers from the different modalities, and to transfer annotations from one data set to another.

FERMANIAN Adeline

Postdoc

Mines ParisTech

adeline.fermanian [at] mines-paristech.fr

Short bio

PhD in Statistics, Sorbonne Université

Research project

High-dimensional inference in genomic data.

Short abstract

Our goal is to propose new efficient procedures for high-dimensional inference, motivated by applications to high-dimensional genomic data. More specifically, we are interested in identifying regions of the genome associated with a phenotype, through procedures that provide p-values, typically via post-selection inference procedures.

KMETZSCH Virgilio

PhD Student

INRIA

virgilio.kmetzsch [at] inria.fr

Short bio

MSc in Data Science – Grenoble INP Ensimag & UGA

Thesis title

Multimodal analysis of neuroimaging and transcriptomic data in genetic frontotemporal dementia.

Short abstract

Frontotemporal dementia (FTD) is a devastating neurodegenerative disease with no effective treatment so far. The Paris Brain Institute has assembled one of the largest cohorts worldwide on genetic forms of FTD, comprising multimodal data including neuroimaging (MRI, PET), cognition and transcriptomics (RNA-seq). The present PhD project aims to design and apply new approaches for integrating multimodal transcriptomic and neuroimaging data, in order to characterize biomarkers of the presymptomatic phase of the disease and inform the design of upcoming therapeutic trials.

TEBOUL Raphaël

PhD Student

INSERM

raphael.teboul [at] inserm.fr

Short bio

Master’s degree in Engineering, Télécom Paris

Thesis title

Unravelling non-coding driver alterations in cancer with deep learning.

Short abstract

Of the 3 gigabases that constitute the human genome, only about 50 megabases (<2%) encode protein-coding genes. Particular attention has been paid to somatic mutations affecting the coding sequence of these genes, leading to the almost exhaustive characterization of 723 genes implicated in cancer (Cancer Gene Census, COSMIC database, September 2019). By contrast, with the notable exception of TERT promoter mutations that induce the expression of telomerase (a key enzyme necessary for unlimited cell proliferation), very few driver alterations have been identified in the non-coding genome. Analyses of mutation hotspots or known regulatory regions such as promoters and enhancers have failed to identify significantly recurrent mutations with a strong transcriptional impact on cancer genes. The main reason is the difficulty of predicting the functional consequences of non-coding mutations. Although these mutations can alter important regulatory regions and modulate the expression of key cancer genes, there is no established method to predict the transcriptional impact of a non-coding mutation. To fill this gap, we will develop a deep neural network able to predict gene expression based on the local sequence context. Pioneering studies have demonstrated the ability of deep neural networks to learn to recognize several regulatory motifs from the DNA sequence, including splice sites, chromatin accessibility and 3D conformation, and transcription factor binding sites. More recently, Olga Troyanskaya’s team has developed a deep neural network able to predict, from the DNA sequence, the expression level of genes in a cell-type-specific manner, by integrating predictions of chromatin state and transcription factor binding. Once trained, these neural networks are able to predict in silico the regulatory impact of any sequence variant, and are thus extremely valuable assets for identifying disease-causing variants.
Deep learning analysis has been used to identify causal variants in several diseases, including autism, but has not yet been applied to cancer. Our hypothesis is that leveraging the power of deep neural networks to explore the millions of somatic alterations identified in cancer sequencing projects is a promising approach to uncover the missing driver events involving the non-coding human genome.

CAPTIER Nicolas

PhD Student

Institut Curie

nicolas.captier [at] curie.fr

Short bio

M2, École polytechnique – MVA

Thesis title

Multimodal and integrative analysis of genomics, radiomics and pathological data for the prediction of response to immunotherapy in lung cancer.

Short abstract

We aim to develop supervised and unsupervised machine learning methods to identify signatures of the response to immunotherapy in non-small cell lung cancer, through the integration of genomics, radiomics (extracted from PET and CT images) and pathological data. We will interpret them to decipher biological pathways and mechanisms modulating immune responses.

BLASSEL Luc

PhD Student

Institut Pasteur

luc.blassel [at] pasteur.fr

Short bio

Diplôme d’ingénieur (engineering degree), AgroParisTech

Master of Science (Dauphine – PSL)

Thesis title

Big Data and Machine learning for alignment in genomics.

Short abstract

With the ever-growing quantity of high-quality sequence data, machine learning is becoming increasingly useful in genomics.

The goal of my project is to use machine learning methods to improve alignment of DNA sequencing data on genomes.

BAC Jonathan

PhD Student

Institut Curie

jonathan.bac [at] cri-paris.org

Short bio

Double BA in English and French law, Université François Rabelais de Tours

MS Computational biology, Center for Interdisciplinary Research & Université de Paris

Thesis title

Machine learning approaches for the analysis of high-dimensional single-cell data.

Short abstract

Recent advances in genomics allow us to obtain for the first time large amounts of information at the level of individual cells (measuring the expression of thousands of genes for individual cells in a tumor, an organ, an embryo, etc.). My project is to develop machine learning methods to analyze this data and better understand cancer and how living things work.

ZINOVYEV Andrei

Computational biology

andrei.zinovyev [at] curie.fr / Twitter: @SysBioCurie

Short bio

Senior permanent researcher at Institut Curie and scientific coordinator of the Computational Systems Biology of Cancer group within the Bioinformatics department (2005-). Postdoctoral fellow at Institut des Hautes Études Scientifiques (IHES) (2001-2005). Habilitation in biology at École Normale Supérieure in Paris (2014).

Topics of interest

Machine Learning, Unsupervised learning, High-dimensional geometry, Omics data, Mathematical Modeling, Cancer biology

Project in Prairie

Andrei Zinovyev will focus on developing and adapting methods for learning latent spaces and structures in high-dimensional data, with principal applications to biomedical data analysis. The main research line inside PRAIRIE will be on learning representations of multi-omics and single-cell data. Andrei Zinovyev will implement a teaching course on applications of machine learning in molecular oncology.

Quote

Modern datasets in biology and medicine contain millions of objects (patients, biopsies, tumors, cells) characterized by hundreds of thousands of features such as expression of genes and proteins, properties of DNA or concentration of metabolites. How to use these data in order to make discoveries in biology or propose a better disease treatment? We can learn a lot by investigating the corresponding high-dimensional data point clouds, whose intrinsic geometry is shaped by biological processes, experimental designs and technical biases and is affected by the heterogeneity and uncertainty of molecular measurements. With machine learning methods allowing us to explore complex multidimensional data structures, one can tackle the problem of extracting the most relevant part of the information contained in omics data and using it further in the most efficient way.

LETOUZÉ Eric

Cancer genomics

eric.letouze [at] inserm.fr

Short bio

Senior INSERM researcher, leader of the computational biology group within the “Functional Genomics of Solid Tumors” team at Cordeliers Research Center. INSERM excellence award (2015). Institut Necker Fondation Tourre best post-doctoral student award (2015).

Topics of interest

Cancer genomics, bioinformatics, machine learning

Project in Prairie

Discovering cancer-causing mutations using deep learning approaches. Eric Letouzé will develop deep learning approaches to predict cell-type-specific regulatory features of gene expression, splicing and translation from the DNA sequence, and use these tools to discover new driver events among the millions of non-coding mutations identified in human cancer genomes.

Quote

Next-generation sequencing technologies have allowed the identification of millions of mutations in tens of thousands of tumor samples. Yet, the vast majority of driver mutations functionally associated with cancer development lie within the <2% of the genome encoding protein-coding genes. Although non-coding mutations can dramatically modulate the expression of cancer genes, predicting their precise functional impact remains extremely challenging. By developing deep neural networks able to learn regulatory features from the DNA sequence, we will be able to predict which mutations are likely to alter the expression of oncogenes and tumor suppressors, and unravel the missing drivers within the huge pan-cancer mutation catalogues available in public databases.

GASCUEL Olivier

Computational Biology

olivier.gascuel [at] mnhn.fr

Short bio

Research Director at CNRS (ISYEB – Muséum National d’Histoire Naturelle). Head of the Department of Computational Biology at Institut Pasteur (2015-2020). Associate Editor of Systematic Biology. Fast Breaking Paper 2005 and Current Classic in Environment and Ecology from 2007 to 2011 (most cited paper in the field, Science Watch – Thomson Reuters). Silver Medal in Computer Science of the CNRS, 2009. Grand Prix Inria – Académie des Sciences for Numerical Sciences, 2017.

Topics of interest

Computational biology, genomics, evolution, pathogens

Project in Prairie

Olivier Gascuel’s research will focus on the analysis of genomic data. Modeling, statistical/deep learning and algorithmics will be combined to take advantage of the evolutionary relationships among sequences and solve key questions on the function of pathogenic genes, the emergence of drug resistance, and the dynamics of epidemics. He will develop interdisciplinary courses intended for a wide audience.

Quote

The amount of genomic data is increasing at an exponential rate. These data contain a wealth of information on diseases, biodiversity, and many other important societal issues. The analysis of these data imposes constantly renewed challenges, both in algorithms and in modeling. We are helped in this task by the traces left by evolution in the genes and genomes of species, as predicted by Theodosius Dobzhansky in his famous sentence “Nothing in biology makes sense except in the light of evolution” (1973). Evolutionary approaches combined with the latest advances in AI, especially deep learning, will be key to harnessing today’s and tomorrow’s genomic data, and solving key questions in biology and health.