Insights from statistical physics on some questions in machine learning
06/05/2020
Speaker: Marc Mézard, Ecole normale supérieure – Université PS
Abstract
For more than thirty years, there have been a number of attempts to use concepts and methods from statistical physics to develop a theoretical framework for machine learning, with mixed success. This line of research has recently been revived around the important open questions raised by recent developments in deep learning, notably questions concerning the dynamics of learning algorithms and the structure of data.
This talk will present some of these recent developments from a global perspective, highlighting the strengths and weaknesses of such approaches.
Independent Component Analysis (ICA) is an exploratory technique which, as its name implies, aims at decomposing a vector of observations into components which are statistically independent (or as independent as possible). It has numerous applications, particularly in neuroscience for extracting brain sources from their observed mixtures collected on the scalp.
ICA goes well beyond PCA (Principal Component Analysis) because statistical independence is a much stronger property than mere decorrelation. Of course, this program implies that an ICA method must use non-Gaussian statistics in order to express independence (otherwise, independence would reduce to decorrelation).
In this (non-technical) seminar, I use a simple construction of Information Geometry (a Pythagorean theorem in distribution space) to elucidate the connections in ICA between the main players: correlation, independence, non-Gaussianity, mutual information and entropy.
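The abstract does not state the relation itself. As a hedged illustration only, one standard identity linking these quantities can be written as follows; the symbols $I$, $C$, $G$, $H$ and the Gaussian surrogate $\mathcal{N}_Y$ are notation assumed here, not taken from the talk. It follows directly from the entropy definitions and is not necessarily the exact form presented in the seminar.

```latex
% Assumed notation: Y = (Y_1, ..., Y_n) is a random vector with distribution P_Y,
% H is differential entropy, and N_Y is the Gaussian vector with the same mean
% and covariance as Y.
\begin{align*}
I(Y)   &= \sum_{i=1}^{n} H(Y_i) - H(Y)
        && \text{(mutual information: dependence)} \\
C(Y)   &= \sum_{i=1}^{n} H(\mathcal{N}_{Y_i}) - H(\mathcal{N}_Y)
        && \text{(correlation: dependence of the Gaussian surrogate)} \\
G(Y)   &= H(\mathcal{N}_Y) - H(Y)
        && \text{(non-Gaussianity of the vector)} \\
G(Y_i) &= H(\mathcal{N}_{Y_i}) - H(Y_i)
        && \text{(non-Gaussianity of each component)}
\end{align*}
% Summing the definitions, both sides below equal \sum_i H(N_{Y_i}) - H(Y):
\[
I(Y) + \sum_{i=1}^{n} G(Y_i) \;=\; C(Y) + G(Y).
\]
```

Read this way, if $Y$ is Gaussian then every $G$ term vanishes and $I(Y) = C(Y)$, which is the sense in which independence reduces to decorrelation without non-Gaussian statistics.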
Matt Post is a research scientist working in the Microsoft Translator group, where he has been since 2021. He holds a courtesy appointment in the department of computer science at Johns Hopkins University, where, prior to joining Microsoft, he worked for ten years or so as a research scientist at the HLTCOE (Human Language Technology Center of Excellence) and with the Center for Language and Speech Processing (CLSP). He is interested mostly in machine translation, but also enjoys working on practical applied problems in many areas within NLP. He has contributed to many open source projects, including Joshua, Sockeye, Fairseq, and sacrebleu. He helped organize the WMT manual evaluation for many years, has served on the NAACL executive board, and is the director of the ACL Anthology.
Abstract
The technology and architectures underlying machine translation have changed a number of times over the decades, but apart from occasional research projects, the basic unit of translation has always been, and remains, the sentence. This paradigm persists despite the many clear advantages of translating at the document level, and it grows more glaring as much of NLP technology moves to large language models, which are natively document-based. This talk will survey research in document translation, highlighting difficulties in training, models, and evaluation. We then propose simple, workable solutions in each of these areas that may help the field escape its sentence-level rut. Joint work with Marcin Junczys-Dowmunt.
Dr. Marc Lelarge is a researcher at INRIA. He is also a lecturer in deep learning at Ecole Polytechnique (Palaiseau, France) and Ecole Normale Superieure. He graduated from Ecole Polytechnique, qualified as an engineer at Ecole Nationale Superieure des Telecommunications (Paris) and received a PhD in Applied Mathematics from Ecole Polytechnique in 2005. He received the 2012 SIGMETRICS rising star researcher award and, with Mohsen Bayati and Andrea Montanari, the 2015 Best Publication in Applied Probability Award for their work on compressed sensing.
Abstract
Geometric deep learning is an attempt at the geometric unification of a broad class of machine learning problems from the perspectives of symmetry and invariance. In this talk, I will present some advances in geometric deep learning applied to combinatorial structures. I will focus on various classes of graph neural networks that have been shown to be successful in a wide range of applications with graph-structured data.
How AI can help us study the complexity of children’s early language acquisition
16/02/2022
14h
Speaker: Abdellah Fourtassi
Bio
I am currently a researcher (“délégation recherche”) at INRIA Paris, visiting from Aix-Marseille University, where I have been Assistant Professor (Maître de Conférences) of computer science since late 2019. I am also a research fellow at the Institute of Language, Communication, and the Brain (ILCB), where I direct the interdisciplinary research group “Computational Communication, and Development” (cocodev.fr). Prior to that, I was a postdoctoral research fellow at Stanford University. I completed my PhD at Ecole Normale Supérieure rue d’Ulm and my undergraduate studies at Ecole Polytechnique.
Abstract
To acquire language, children need to learn the form (e.g., phonology and syntax), the content (e.g., word and sentence meanings), and the use (e.g., finding the right words to convey communicative intents). Research in language development has traditionally simplified this process by studying these dimensions separately. The reality of the situation is that children have to deal with aspects of form, content, and use simultaneously. In addition, experimental studies suggest that the timelines of acquisition of these dimensions largely overlap, indicating that children learn them in parallel, not one at a time. While this fact makes language acquisition seem even harder than we previously thought, here I show that the joint learning of form, content, and use may actually be more a help than a hindrance: These dimensions are interdependent in many ways and can therefore constrain/disambiguate each other.
More generally, I argue that research into the complex interaction/synergy across linguistic levels, during child development, requires going beyond traditional research tools in the field of child development (e.g., controlled experiments) and integrating cutting-edge methods from AI in our research toolkit. This new research method is instrumental not only in piercing some lingering mysteries in children’s language learning but also in understanding the development of this complex phenomenon in its natural context (e.g., as opposed to in-lab studies), thus facilitating the translation of scientific findings much more easily into real-life interventions and societal applications.
Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder
09/03/2022
14h
Speaker: Stéphanie Allassonnière
Bio
Professor of Mathematics at the School of Medicine, University of Paris, and Associate Professor in the Applied Mathematics department of Ecole Polytechnique. Manager of master's programs and masterclasses in AI in healthcare.
Abstract
In this presentation, we propose a new method to perform data augmentation in a reliable way in the High Dimensional Low Sample Size (HDLSS) setting using a geometry-based variational autoencoder (VAE). Our approach combines a proper modeling of the VAE latent space, seen as a Riemannian manifold, with a new generation scheme which produces more meaningful samples, especially in the context of small data sets. The proposed method is tested through a wide experimental study where its robustness to data sets, classifiers and training sample size is stressed. It is also validated on a medical imaging classification task on the challenging ADNI database, where a small number of 3D brain MRIs are considered and augmented using the proposed VAE framework. In each case, the proposed method allows for a significant and reliable gain in the classification metrics. For instance, balanced accuracy jumps from 66.3% to 74.3% for a state-of-the-art CNN classifier trained with 50 MRIs of cognitively normal (CN) and 50 Alzheimer's disease (AD) patients, and from 77.7% to 86.3% when trained with 243 CN and 210 AD, while greatly improving sensitivity and specificity metrics.
Justin Solomon is an associate professor of Electrical Engineering and Computer Science in the MIT Computer Science and Artificial Intelligence Laboratory. He runs the MIT Geometric Data Processing group, which studies problems at the intersection of geometry, large-scale optimization, and applications in machine learning, graphics, and vision.
Abstract
From 3D modeling to autonomous driving, a variety of applications can benefit from data-driven reasoning about geometric problems. The available data and preferred shape representation, however, vary widely from one application to the next. Indeed, the one commonality among most of these settings is that they are not easily approached using the data-driven methods that have become de rigueur in other branches of computer vision and machine learning. In this talk, I will summarize recent efforts in my group to develop learning architectures and methodologies paired to specific applications, from point cloud processing to mesh and implicit surface modeling. In each case, we will see how mathematical structures and application-specific demands drive our design of the learning methodology, rather than bending application demands or ignoring geometric details to apply a standard data analysis technique.
Representing non-negative functions, with applications to non-convex optimization and beyond
10/05/2022
14h
Speaker: Alessandro Rudi
Bio
Alessandro Rudi has been a researcher at INRIA, Paris since 2017. He received his PhD in 2014 from the University of Genova, after being a visiting student at the Center for Biological and Computational Learning at the Massachusetts Institute of Technology. Between 2014 and 2017 he was a postdoctoral fellow at the Laboratory of Computational and Statistical Learning at the Italian Institute of Technology and the University of Genova.
Abstract
Many problems in applied mathematics admit a natural representation in terms of non-negative functions, e.g. probability representation and inference, optimal transport, optimal control, non-convex optimization, to name a few. While linear models are well suited to represent functions with output in R or C, being at the same time very expressive and flexible, the situation is different for the case of non-negative functions where the existing models lack one of these good properties.
In this talk we present a model for non-negative functions that promises to bring to these problems the same benefits that linear models brought to interpolation, approximation, quadrature and supervised learning, leading to a new class of adaptive algorithms with provably fast convergence.
In particular, we will show direct applications in numerical methods for probability representation and non-convex optimization. We will see in more detail that the model allows one to derive an algorithm for non-convex optimization that is adaptive to the degree of differentiability of the objective function and achieves optimal rates of convergence. Finally, we show how to apply the same technique to other interesting problems in applied mathematics that can be easily expressed in terms of inequalities.
Patient phenotypic similarity for diagnosis of rare diseases
22/06/2022
14h
Speaker: Xiaoyi CHEN
Bio
Xiaoyi Chen is a researcher at Institut Imagine, a research institute specialized in genetic diseases. Her research focuses on automated methods to identify rare disease patients in huge real-world data repositories. She received her PhD in applied mathematics and computational biology at Institut Pasteur, Paris (2015). Between 2016 and 2022, she was a researcher in the Information Sciences to support Personalized Medicine group at Inserm UMR 1138 (now team HeKA Inria-Inserm-Université Paris Cité).
Abstract
Many rare diseases suffer from significant delayed diagnosis or underdiagnosis because of a broad spectrum of phenotypes and high genetic and clinical heterogeneity. One way to accelerate the diagnosis process is to rely on patients’ electronic health records (EHRs) for automatic phenotyping and to develop algorithms that identify, in large-scale clinical data warehouses, patients whose profiles are similar to those of already diagnosed patients. In this talk, I will summarize recent efforts, in the context of the RHU C’IL-LICO project, to develop diagnosis support systems that take into consideration the semantic relations between clinical concepts and the different levels of relevance present in patients’ EHRs, including incompleteness, inaccurate phenotyping, noisy phenotypes related to multiple comorbidities and medical histories, as well as the clinical heterogeneity of complex rare diseases and significant imbalance issues.
Researcher at Inria, leading since 2011 the machine learning team which is part of the Computer Science department at Ecole Normale Supérieure. Ph.D. Berkeley (2005). ERC Starting grant (2009) and Consolidator Grant (2016), Inria young researcher prize (2012), ICML test-of-time award (2014), Lagrange prize in continuous optimization (2018). Co-editor-in-chief of the Journal of Machine Learning Research. Member of the Academy of Sciences.
Abstract
Estimating and computing entropies of probability distributions are key computational tasks throughout data science. In many situations, the underlying distributions are only known through the expectation of some feature vectors, which has led to a series of works within kernel methods. In this talk, I will explore the particular situation where the feature vector is a rank-one positive definite matrix, and show how the associated expectation (a covariance matrix) can be used with information divergences from quantum information theory to draw direct links with the classical notions of Shannon entropies.
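As a rough illustration of the link mentioned above, the sketch below computes the von Neumann entropy $-\mathrm{tr}(\Sigma \log \Sigma)$ of a covariance matrix built from unit-norm feature vectors. The choice of this particular quantum entropy, the function names, and the toy features are assumptions made for illustration and are not taken from the talk. With orthonormal (one-hot) features the covariance is diagonal and the value coincides with the Shannon entropy of the weights; with overlapping unit-norm features it can only be smaller.

```python
import numpy as np


def covariance_from_features(features, weights):
    """Return sum_i w_i * phi_i phi_i^T for feature rows phi_i (hypothetical helper)."""
    return (features * weights[:, None]).T @ features


def von_neumann_entropy(cov, tol=1e-12):
    """Return -tr(cov log cov), computed from the eigenvalues of a symmetric PSD matrix."""
    eigvals = np.linalg.eigvalsh(cov)
    eigvals = eigvals[eigvals > tol]  # treat 0 * log(0) as 0
    return float(-np.sum(eigvals * np.log(eigvals)))


# Toy discrete distribution and its Shannon entropy (in nats).
p = np.array([0.5, 0.25, 0.25])
shannon = float(-np.sum(p * np.log(p)))

# Case 1: orthonormal features, so the covariance is diag(p).
cov_orthogonal = covariance_from_features(np.eye(3), p)
entropy_orthogonal = von_neumann_entropy(cov_orthogonal)

# Case 2: unit-norm but overlapping features in the plane (120 degrees apart).
angles = np.array([0.0, 2.0 * np.pi / 3.0, 4.0 * np.pi / 3.0])
overlapping = np.stack([np.cos(angles), np.sin(angles)], axis=1)
cov_overlapping = covariance_from_features(overlapping, p)
entropy_overlapping = von_neumann_entropy(cov_overlapping)

print(shannon, entropy_orthogonal, entropy_overlapping)
```

Because each feature vector has unit norm, both covariance matrices have trace one, so their eigenvalues form a probability distribution and the entropy is well defined.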
Speaker: Masashi Sugiyama, RIKEN/The University of Tokyo
Bio
Masashi Sugiyama received a Ph.D. in Computer Science from Tokyo Institute of Technology in 2001. He has been a Professor at the University of Tokyo since 2014 and concurrently Director of the RIKEN Center for Advanced Intelligence Project (AIP) since 2016. His research interests include theories and algorithms of machine learning. In 2022, he received the Award for Science and Technology from Japan’s Minister of Education, Culture, Sports, Science, and Technology. He served as Program Co-chair for the Neural Information Processing Systems (NeurIPS) Conference in 2015, the International Conference on Artificial Intelligence and Statistics (AISTATS) in 2019, and the Asian Conference on Machine Learning (ACML) in 2010 and 2020. He (co)authored Machine Learning in Non-Stationary Environments (MIT Press, 2012), Density Ratio Estimation in Machine Learning (Cambridge University Press, 2012), Statistical Reinforcement Learning (Chapman & Hall, 2015), Introduction to Statistical Machine Learning (Morgan Kaufmann, 2015), and Machine Learning from Weak Supervision (MIT Press, 2022).
Abstract
When machine learning systems are trained and deployed in the real world, we face various types of uncertainty. For example, the training data at hand may contain insufficient information, label noise, and bias. In this talk, I will give an overview of our recent advances in robust machine learning, including weakly supervised classification (positive-unlabeled classification, positive-confidence classification, complementary-label classification, etc.), noisy label learning (noise transition estimation, instance-dependent noise, clean sample selection, etc.), and domain adaptation (joint importance-predictor learning for covariate shift adaptation, dynamic importance-predictor learning for full distribution shift, etc.).
Artificial Intelligence and Society: What would a better AI mean?
14h
Speaker: Thierry Poibeau, CNRS
Bio
CNRS Research Director, Head of the CNRS Lattice research unit (2012-2018) and adjunct head since 2019. Affiliated lecturer, Language Technology Laboratory, U. of Cambridge since 2009. Rutherford fellowship, Turing institute, London, 2018-2019. Teaching NLP in the PSL Master in Digital Humanities.
Abstract
Artificial Intelligence (AI) has made huge progress in the last few years. Applications are now deployed and have a real impact on society. The press regularly echoes concerns from the general public as well as from professionals and even researchers themselves: if AI has achieved human-like performance on various tasks, should we fear the consequences? For example, the large-scale production of ‘fake news’ and ‘deepfakes’ can be a danger for democracy. If language models reflect or even amplify the biases of the training data, there is a risk of discrimination, etc.
In this presentation, we will come back to these thorny and topical questions. We will recall some well-known cases that made the headlines, where AI has been called into question in various ways. It seems pretty clear that some scandals could have been avoided and were due to the problematic deployment of poorly developed systems. However, beyond that, we will show that the issues raised are complex: the notion of bias, for example, implies the idea of a norm. Who sets the standard? And, if unbiasing the models seems a laudable goal in itself, who could decide what a neutral, unbiased model would be? The notion of human or superhuman performance (which suggests a risk of humans losing control to the machine) must also be questioned: we still seem far from a general, autonomous AI able to take power against humans.
In the end, our position is close to that of Kate Crawford: AI is too often described as an autonomous force, whereas it is made by humans, for humans, with specific interests that have to be unraveled. It is also clear that we, as researchers, have our responsibilities too and we cannot hide behind the supposed neutrality of technology. A better account of what the technology can do, and cannot do, would help raise the debate on these important questions.
Quantitative Uniform Stability of the Iterative Proportional Fitting Procedure
12/12/2022
Speaker: George Deligiannidis, University of Oxford
Bio
After obtaining my PhD from the School of Mathematical Sciences of the University of Nottingham under the supervision of Sergey Utev and Huiling Le, I moved to the Department of Mathematics of the University of Leicester as a Teaching Assistant/Fellow. In 2012 I moved to the Department of Statistics of the University of Oxford as Departmental Lecturer. I stayed in Oxford until September 2016, when I moved to the Department of Mathematics of King’s College London as Lecturer in Statistics. I moved back to the University of Oxford in December 2017 as Associate Professor of Statistics.
Abstract
We establish the uniform-in-time stability, w.r.t. the marginals, of the Iterative Proportional Fitting Procedure, also known as the Sinkhorn algorithm, used to solve entropy-regularised Optimal Transport problems. Our result is quantitative and stated in terms of the 1-Wasserstein metric. As a corollary we establish a quantitative stability result for Schrödinger bridges.
This is joint work with V. de Bortoli and A. Doucet.
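For readers unfamiliar with the procedure named in the abstract, here is a minimal sketch of the Sinkhorn (IPF) iteration for discrete entropy-regularised optimal transport. The function name, the regularisation strength `eps`, the iteration count, and the toy marginals are all assumptions chosen for illustration; the sketch does not reproduce the paper's analysis or its bounds. The last lines perturb one marginal slightly and compare the two resulting plans, which is the kind of sensitivity the stability result is about.

```python
import numpy as np


def sinkhorn(mu, nu, cost, eps=0.5, n_iters=200):
    """Alternately rescale rows and columns of the Gibbs kernel exp(-cost / eps)
    so that the resulting plan has marginals mu (rows) and nu (columns)."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iters):
        u = mu / (kernel @ v)    # fit the row marginal
        v = nu / (kernel.T @ u)  # fit the column marginal
    return u[:, None] * kernel * v[None, :]


# Toy problem: five points on [0, 1] with squared-distance cost.
x = np.linspace(0.0, 1.0, 5)
cost = (x[:, None] - x[None, :]) ** 2
mu = np.full(5, 0.2)
nu = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
plan = sinkhorn(mu, nu, cost)

# Move a small amount of mass within the target marginal and recompute.
nu_perturbed = nu + np.array([0.01, 0.0, -0.01, 0.0, 0.0])
plan_perturbed = sinkhorn(mu, nu_perturbed, cost)

# Total absolute change in the plan caused by the perturbation.
plan_change = float(np.abs(plan - plan_perturbed).sum())
print(plan_change)
```

Each iteration matches one marginal exactly and the other approximately; at convergence both constraints hold.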