GORTANA Luca
Paris Sciences & Lettres – Complexité du vivant (Complexity of life)
luca.gortana [at] agroparistech.fr
Short Bio
Engineering degree, AgroParisTech
Thesis title
Machine Learning approaches for joint image and omics analyses in cancer
Short abstract
Spatial transcriptomics is a technology that uses spatial analysis to measure the positional context of mRNAs in a tissue. The combination of tumour imaging (e.g. pathological slides) and spatial transcriptomics provides extremely rich information, potentially leading to a better understanding of the mechanisms of tumour progression. To exploit this information, my project aims to develop new analytical concepts and tools for analyzing multi-modal omics and spatial omics data from cancer patients.
ZAHARIAS Paul
Muséum national d'Histoire naturelle
paul.zaharias [at] mnhn.fr
Short bio
PhD at the Muséum national d’Histoire naturelle (MNHN)
Research project
Evaluation and design of fast statistical branch support methods for phylogenetic gene/species tree reconstruction.
Short abstract
Statistical support in phylogenetic tree reconstruction is essential to interpret evolutionary relationships. My goal is to design scalable statistical supports for large phylogenetic and phylogenomic datasets to overcome the limitations of current methods.
PEREZ Manolo
Muséum national d'Histoire naturelle
manolo.fernandezperez [at] mnhn.fr
Short bio
PhD in Evolutionary Genetics and Molecular Biology – Federal University of São Carlos (UFSCar), Brazil
Research project
Computational and machine learning-based methods in phylogenetics.
Short abstract
Deep Learning (DL) frameworks have increasingly been applied to phylogenetics, phylodynamics, and macroevolution due to their flexible and data-hungry nature. Recent DL implementations have shown encouraging performance, with higher speed and accuracy than similar methods, when used with phylogenetic information to compare Birth-Death diversification models and estimate parameters for epidemiological data. Here, we propose a new DL framework that allows the incorporation of distinct strategies for simulating and representing phylogenetic information.
The research themes of the PRAIRIE Genomics researchers coalesce around the exploration and application of machine learning in genomics, with a particular focus on the intersection of computational methods, bioinformatics, and cancer research. They delve into the creation of algorithms and tools to process and analyze vast genomic data, identify patterns, and extract actionable insights. This includes the development of statistical models and machine learning algorithms to predict genetic variations and their potential effects on an organism’s phenotype.
They also study the use of genomics in understanding the molecular mechanisms of cancer, such as identifying specific genetic mutations associated with various types of cancer, and the implications these findings have on the development of personalized medicine. Their work contributes to the larger field of genomic medicine, aiming to incorporate individual genetic information into healthcare, thereby advancing precision medicine.
Moreover, their research expands to the areas of population genomics, studying the genetic structure of populations to unravel the history of species, their adaptations, and diversity. In the realm of phylogenetics, they engage in developing methods to understand the evolutionary relationships among various biological species or groups.
Their collective endeavors contribute significantly to enhancing the knowledge of genomics and its applications in healthcare, with implications for disease diagnosis, prognosis, and treatment. This research area is a testament to the power of interdisciplinary collaboration in accelerating scientific discovery and innovation.
CHADOUTAUD Loïc
loic.chadoutaud [at] curie.fr
Short bio
Ingénieur Civil des Mines de Paris – Mines ParisTech (Master’s degree in Science and Executive Engineering)
Master 2 – Mathématiques, Vision & Apprentissage – ENS Paris-Saclay (MVA Master’s degree)
Thesis topic
Spatial and Temporal Heterogeneity of single cell transcriptomic data.
Short abstract
Spatial transcriptomics is a new kind of technology that allows biologists to measure both transcriptomic information and spatial locations in tissues. My project aims to develop methods (such as clustering or dimensionality reduction algorithms) for the analysis of such multimodal data. In particular, we are mainly interested in the links between spatial organization and transcriptomic heterogeneity within tissues.
SAMARAN Jules
samaran [at] bio.ens.psl.eu
Short bio
- Ingénieur Civil des Mines de Paris – Mines ParisTech (Master’s degree in Science and Executive Engineering)
- Master 2 – Mathématiques, Vision & Apprentissage – ENS Paris-Saclay (MVA Master’s degree)
Thesis topic
Methods for single-cell multimodal integration.
Short abstract
Recent technological advances allow biologists to profile multiple modalities (e.g. gene expression, DNA methylation, chromatin accessibility, etc.) from a single cell. However, such data are still rare and most of the existing single-cell multi-modal data are profiled from different cells (i.e. unpaired data). My project aims at developing integrative dimensionality reduction approaches for unpaired multimodal data (i.e. a collection of monomodal data sets) that are adapted to single-cell data. This tool will make it possible to cluster cells based on their multimodal similarities, to extract markers from the different modalities and to transfer annotations from one data set to another.
FERMANIAN Adeline
adeline.fermanian [at] mines-paristech.fr
Short bio
PhD in Statistics, Sorbonne Université
Research project
High-dimensional inference in genomic data.
Short abstract
Our goal is to propose new efficient procedures for high-dimensional inference, motivated by applications to high-dimensional genomic data. More specifically, we are interested in identifying regions of the genome associated with a phenotype, through procedures that provide p-values, typically via post-selection inference procedures.
Laura CANTINI
CNRS Research Scientist (Chargé de Recherches) at IBENS, specialized in multi-omics data integration in bulk and single-cell data
CANTINI Laura
laura.cantini [at] pasteur.fr
Short bio
Young PI (G5) at Institut Pasteur and CNRS permanent researcher. Her research activity is focused on the design of machine learning methods for the integration of single-cell multi-modal data. Mathematician by training, Laura received her PhD in cancer systems biology from the University of Turin (Italy). She then pursued a postdoc in the cancer systems biology group at Institut Curie (Paris). In 2018, awarded the L’Oréal-UNESCO for Women in Science and EMBO fellowships, she joined CSAIL at MIT (USA), before being selected as a CNRS permanent researcher. She is a recipient of the ERC StG 2023, ANR JCJC 2020, Sanofi iTech Awards 2020 and the L’Oréal-UNESCO for Women in Science fellowship (2018 edition).
Topics of interest
Single-cell omics data, multi-modal integration, network inference
Project in Prairie
Laura Cantini will develop computational methods for multi-modal single-cell data integration. She will in particular combine multi-omics joint dimensionality reduction, to identify the cell types and states present in a biological sample, and network-based methods to reconstruct the multi-omics regulatory mechanisms underlying each cell type/state. Finally, by applying the developed approaches to patient-derived data, she will contribute to improving our understanding of cancer heterogeneity and its underlying molecular mechanisms.
Quote
The timely detection and successful treatment of cancer depend on our ability to understand when, why, and how a subpopulation of cells deviates away from a healthy state or acquires drug resistance. Single-cell multi-modal data, produced at an increasing pace, offer the opportunity to tackle these questions. The current major bottleneck is the crucial need for computational methods able to translate this wealth of information into actionable biological knowledge.
Team
SAMARAN Jules
PhD student
KMETZSCH Virgilio
virgilio.kmetzsch [at] inria.fr
Short bio
MSc in Data Science – Grenoble INP Ensimag & UGA
Thesis title
Multimodal analysis of neuroimaging and transcriptomic data in genetic frontotemporal dementia.
Short abstract
Frontotemporal dementia (FTD) is a devastating neurodegenerative disease with no effective treatments so far. The Paris Brain Institute has assembled one of the largest cohorts worldwide on genetic forms of FTD, comprising multimodal data including neuroimaging (MRI, PET), cognition and transcriptomics (RNA-seq). The present PhD project aims at designing and applying new approaches for integrating multimodal transcriptomic and neuroimaging data, to characterize biomarkers of the presymptomatic phase of the disease, in order to design upcoming therapeutic trials.
TEBOUL Raphaël
raphael.teboul [at] inserm.fr
Short bio
Master’s degree in Engineering, Télécom Paris
Thesis title
Unravelling non-coding driver alterations in cancer with deep learning.
Short abstract
Of the 3 gigabases that constitute the human genome, only about 50 megabases (<2%) encode protein-coding genes. Particular attention has been paid to somatic mutations affecting the coding sequence of these genes, leading to the almost exhaustive characterization of 723 genes implicated in cancer (Cancer Gene Census, COSMIC database, September 2019). By contrast, with the notable exception of TERT promoter mutations that induce the expression of telomerase (a key enzyme necessary for unlimited cell proliferation), very few driver alterations have been identified in the non-coding genome. Analyses of mutation hotspots or known regulatory regions like promoters and enhancers have failed to identify significantly recurrent mutations with a strong transcriptional impact on cancer genes. The main reason for that is the difficulty of predicting the functional consequence of non-coding mutations. Although these mutations can alter important regulatory regions and modulate the expression of key cancer genes, there is no established method to predict the transcriptional impact of a non-coding mutation. To fill this gap, we will develop a deep neural network able to predict gene expression based on the local sequence context. Pioneering studies have demonstrated the ability of deep neural networks to learn how to recognize several regulatory features from the DNA sequence, including splice sites, chromatin accessibility and 3D conformation or transcription factor binding sites. More recently, Olga Troyanskaya’s team has developed a deep neural network able to predict, from the DNA sequence, the expression level of genes in a cell-type specific manner, by integrating predictions of chromatin state and transcription factor binding. Once trained, these neural networks are able to predict in silico the regulatory impact of any sequence variant, and are thus extremely valuable assets to identify disease-causing variants.
Deep learning analysis has been used to identify causal variants in several diseases including autism, but has not yet been applied to cancer. Our hypothesis is that leveraging the power of deep neural networks to explore the millions of somatic alterations identified in cancer sequencing projects is a promising approach to uncover the missing driver events involving the non-coding human genome.
CAPTIER Nicolas
nicolas.captier [at] curie.fr
Short bio
M2, École polytechnique – MVA
Thesis title
Multimodal and integrative analysis of genomics, radiomics and pathological data for the prediction of response to immunotherapy in lung cancer.
Short abstract
We aim to develop supervised and unsupervised machine learning methods to identify signatures of the response to immunotherapy in non-small cell lung cancer, through the integration of genomics, radiomics (extracted from PET and CT images) and pathological data. We will interpret them to decipher biological pathways and mechanisms modulating immune responses.
BLASSEL Luc
luc.blassel [at] pasteur.fr
Short bio
- Diplôme d’ingénieur (AgroParisTech)
- Master’s in Science (Dauphine – PSL)
Thesis title
Big Data and Machine learning for alignment in genomics.
Short abstract
With the ever-growing quantity of high-quality sequence data, machine learning is becoming more and more useful in genomics. The goal of my project is to use machine learning methods to improve alignment of DNA sequencing data on genomes.
ZINOVYEV Andrei
andrei.zinovyev [at] curie.fr
Short bio
Senior permanent researcher at Institut Curie and scientific coordinator of the Computational Systems Biology of Cancer group inside the Bioinformatics department (2005-). Postdoctoral fellow at Institut des Hautes Études Scientifiques (IHES) (2001-2005). Habilitation in biology at École Normale Supérieure in Paris (2014).
Topics of interest
Machine Learning, Unsupervised learning, High-dimensional geometry, Omics data, Mathematical Modeling, Cancer biology
Project in Prairie
Andrei Zinovyev will focus on developing and adapting methods for learning latent spaces and structures in high-dimensional data, with principal applications to biomedical data analysis. The main research line inside PRAIRIE will be on learning representations of multi-omics and single-cell data. Andrei Zinovyev will implement a teaching course on applications of machine learning in molecular oncology.
Quote
Modern datasets in biology and medicine contain millions of objects (patients, biopsies, tumors, cells) characterized by hundreds of thousands of features such as expression of genes and proteins, properties of DNA or concentration of metabolites. How to use these data in order to make discoveries in biology or propose a better disease treatment? We can learn a lot by investigating the corresponding high-dimensional data point clouds, whose intrinsic geometry is shaped by biological processes, experimental designs and technical biases and is affected by the heterogeneity and uncertainty of molecular measurements. With machine learning methods allowing us to explore complex multidimensional data structures, one can tackle the problem of extracting the most relevant part of the information contained in omics data and using it further in the most efficient way.
Team
GASCUEL Olivier
olivier.gascuel [at] mnhn.fr
Short bio
Research Director at CNRS (ISYEB – Muséum National d’Histoire Naturelle). Head of the Department of Computational Biology at Institut Pasteur (2015-2020). Associate Editor of Systematic Biology. Fast Breaking Paper 2005 and Current Classic in Environment and Ecology from 2007 to 2011 (most cited paper in the field, Science Watch – Thomson Reuters). Silver Medal in Computer Science of the CNRS, 2009. Grand Prix Inria – Académie des Sciences for Numerical Sciences, 2017.
Topics of interest
Computational biology, genomics, evolution, pathogens
Project in Prairie
Olivier Gascuel’s research will focus on the analysis of genomic data. Modeling, statistical/deep learning and algorithmics will be combined to take advantage of the evolutionary relationships among sequences and solve key questions on the function of pathogenic genes, the emergence of drug resistances, and the dynamics of epidemics. He will develop interdisciplinary courses intended for a wide audience.
Quote
The amount of genomic data is increasing at an exponential rate. These data contain a wealth of information on diseases, biodiversity, and many other important societal issues. The analysis of these data imposes constantly renewed challenges, on the algorithmic level and that of modeling. We are helped in this task by the traces left by evolution in the genes and genomes of species, as predicted by Theodosius Dobzhansky in his famous sentence “Nothing in biology makes sense except in the light of evolution” (1973). Evolutionary approaches combined with the latest advances in AI, especially deep learning, will be key to harnessing today’s and tomorrow’s genomic data, and solving key questions in biology and health.
Team
PEREZ Manolo
Postdoctoral researcher
PhD in Evolutionary Genetics and Molecular Biology – Federal University of São Carlos (UFSCar), Brazil
ZAHARIAS Paul
Postdoctoral researcher
PhD at the Muséum national d’Histoire naturelle (MNHN)
CHIKHI Rayan
rayan.chikhi [at] pasteur.fr
Short bio
Recently appointed G5 group leader at Institut Pasteur and CNRS researcher, in the Department of Computational Biology, Sequence Bioinformatics team. Previously affiliated with Université de Lille.
Topics of interest
Analysis of DNA sequencing data, algorithms, data structures
Project in Prairie
The project has the ambitious goal of finding new genetic determinants in a common form of Alzheimer’s disease. We will combine algorithms, machine learning and statistical techniques to mine through large amounts of DNA sequencing data. The plan is to develop new computational methods to perform an initial analysis of raw sequencing data, and then apply supervised machine learning methods to detect clinically relevant variants.
Quote
This project fosters connections with three disciplines: sequence bioinformatics, AI, and a high-profile clinical application. It is thus part of a biological and interdisciplinary side of PRAIRIE. We will also tackle analysis of ‘very big data’, as each human genome yields around 100 gigabases of raw data, and studied cohorts typically gather thousands of samples or more.
Team
DUITAMA GONZALEZ Camila
PhD student
BARILLOT Emmanuel
Computational Molecular Oncology
emmanuel.Barillot [at] curie.fr
Short bio
Director of U900 Research Department (Institut Curie – INSERM – PSL Research University /Mines ParisTech), Head of U900 Computational Systems Biology of Cancer team, Director of Institut Curie Bioinformatics Core Facility, Chair at Paris Artificial Intelligence Research Institute (PRAIRIE).
Topics of interest
Computational molecular oncology, systems biology of cancer, Biological Network modeling, omics data analysis
Project in Prairie
Emmanuel Barillot’s research focuses on Computational Systems Biology of Cancer. It aims at understanding the molecular basis of cancer using large-scale molecular profiles (omics) and clinical records, and at predicting disease evolution, potential therapeutic targets, and treatment outcome (precision medicine). To achieve this goal, he develops computational approaches based on machine learning, network and prior knowledge modeling.
Quote
Molecular and phenotypic data about tumors and models are accumulating at an ever-increasing pace, and are becoming a routine source of information in the medical setting, thanks to lowering costs and improved biotechnological devices (DNA sequencing, mass spectrometry, imaging…). As a consequence, the bottleneck in cancer research has shifted from data acquisition to computational analysis. We still lack powerful computational models and analytical approaches to convert our deepened observations into full understanding of the biology of cancer and to optimize the benefit for patients. My work at the intersection of molecular oncology, mathematical modeling and machine learning is designed to overcome these limitations.
Team
CAPTIER Nicolas
PhD student
CHADOUTAUD Loïc
PhD student
AZENCOTT Chloé-Agathe
chloe-agathe.azencott [at] mines-paristech.fr
Short bio
Assistant professor at the Centre for Computational Biology (CBIO) of MINES ParisTech and Institut Curie (since 2013). Recipient of an ANR Young Researcher grant (2019-2021) and member of an H2020 Initial Training Network (2019-2022). Instructor at Open Classrooms. Co-founder of Paris Women in Machine Learning and Data Science.
Topics of interest
Machine learning, statistical genetics, genomics, precision medicine
Project in Prairie
Chloé-Agathe Azencott will address feature selection in high-dimensional, heterogeneous data, with applications to biomarker discovery from multi-omics data. She will most notably focus on using biological networks both to constrain the feature selection problem and to facilitate the integration of heterogeneous datatypes. She will teach courses on high-dimensional machine learning, as well as courses with a focus on omics data.
Quote
Many of the molecular data sets collected in the context of precision medicine and health pose statistical and machine learning challenges that are very different from those encountered in most artificial intelligence applications. Indeed, we are facing a setting where data are scarce and high-dimensional – there are orders of magnitude more nucleotides in a human genome than patients suffering from a specific disease. This is therefore an exciting field providing us with many open problems and challenges.
Team
FERMANIAN Adeline
Postdoctoral researcher