Abstracts

Francis Bach (Inria, PRAIRIE), Kernel sums of squares for optimization and beyond

Aymeric Dieuleveut (Ecole Polytechnique, Hi! Paris), Federated Learning with compression

Julien Mairal (Inria, MIAI), Lucas-Kanade reloaded: End-to-end super-resolution from raw image bursts

Edouard Oyallon (CNRS, SCAI), Learning is boring: image classification with patches

Rachel Bawden (Inria, PRAIRIE), Handling Variation in text with machine translation

Abstract: One of the biggest challenges in natural language processing research today is handling variation, whether it is diachronic variation (the evolution of language) or synchronic variation (variation in the contemporary use of language). Depending on the application, this variation can either be an object of study or something that needs to be controlled for, so that models can be robust to it. This talk will cover a variety of different tasks involving language variation, including the translation of user-generated content, and show how machine translation techniques can be applied to these different settings.

Thierry Poibeau (CNRS, PRAIRIE), Poetry generation, around Oupoco

The recent period has seen the emergence of an incredibly large number of systems able to generate poetry. We will present our own developments in this domain, based both on the recombination of existing verses and on pure generation techniques. We will also discuss the interest of this type of research and its possible impact, for educational purposes among others.
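
As a rough illustration of the recombination idea (a toy Python sketch; the corpus and function names below are invented and not taken from the Oupoco system, which works on a large base of French sonnets), a new poem can be assembled by drawing, for each line position, a verse taken from that same position in a randomly chosen source poem, so that rhyme-scheme slots are roughly preserved:

```python
import random

# Toy corpus: each poem is a list of verses. (Hypothetical placeholder data.)
sonnets = [
    ["Verse 1 of sonnet A", "Verse 2 of sonnet A", "Verse 3 of sonnet A"],
    ["Verse 1 of sonnet B", "Verse 2 of sonnet B", "Verse 3 of sonnet B"],
    ["Verse 1 of sonnet C", "Verse 2 of sonnet C", "Verse 3 of sonnet C"],
]

def recombine(sonnets, rng=random):
    """Build a new poem by picking, for each line position, the verse found at
    that same position in a randomly chosen source poem."""
    n_lines = min(len(s) for s in sonnets)
    return [rng.choice(sonnets)[i] for i in range(n_lines)]

print("\n".join(recombine(sonnets)))
```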

Martial Hebert (Carnegie-Mellon University), Robust AI

Justin Carpentier (Inria, PRAIRIE), Robotics – What should be really learned?

Raphael Porcher (Université de Paris, PRAIRIE), Stochastic implementation of individualized treatment rules

Nicholas Ayache (Inria, 3IA Nice Côte-d’Azur), AI for medical imaging – The role of models

Alexandre Gramfort (Inria, DATAIA), Bridging the gap between neurosciences and machine learning

Laura Cantini (CNRS, IBENS-ENS-PSL, PRAIRIE), Single-cell multi-modal data integration

Abstract: Single-cell data constitute a major breakthrough in the life sciences. Their integration will enable us to investigate outstanding biological and medical questions that have thus far been inaccessible. However, few methods yet exist to integrate different single-cell modalities, corresponding to omics data (e.g., DNA methylation, proteome, chromatin accessibility) as well as spatial positioning and images. In this talk, I will give an overview of our ongoing research activity in two main methodological directions: (i) joint dimensionality reduction methods to cluster cells based on their multi-modal similarity and (ii) networks to reconstruct regulatory mechanisms based on multi-modal data.
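
As a purely illustrative baseline (not the speaker's method), joint dimensionality reduction can be sketched by reducing each modality separately, concatenating the low-dimensional embeddings, and clustering cells in the resulting joint space; the data shapes, component counts and cluster number below are arbitrary assumptions:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical toy data: 500 cells measured in two modalities
# (e.g., gene expression and chromatin accessibility).
rna = rng.poisson(1.0, size=(500, 2000)).astype(float)
atac = rng.poisson(0.5, size=(500, 5000)).astype(float)

def embed(X, n_components=20):
    """Per-modality dimensionality reduction (SVD on scaled counts)."""
    X = StandardScaler(with_mean=False).fit_transform(X)
    return TruncatedSVD(n_components=n_components, random_state=0).fit_transform(X)

# Joint space: concatenate per-modality embeddings, then cluster cells
# according to their multi-modal similarity.
joint = np.hstack([embed(rna), embed(atac)])
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(joint)
print(clusters[:20])
```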

Jean-Baptiste Masson (Institut Pasteur, PRAIRIE), Physics-informed Bayesian learning: from random walks to fetus morphology

Hippolyte Verdier (1), Charlotte Godard (1), Corentin Guerinot (1), Mohamed El Beheiry (1), François Laurent (1), Christian Vestergaard (1,2), Jean-Baptiste Masson (1,2)

(1) Decision and Bayesian Computation, Computational Biology & Neuroscience Department, UMR 3571 & USR 3756, Institut Pasteur, Paris, France

(2) Institut Prairie, Paris, France

The field of simulation-based inference [1] is developing rapidly, driven by continuous progress in statistical learning. Amortised posterior inference allows fast inference on experimental data once the procedure has been trained on numerically generated data. This talk will discuss a new approach to analyzing random walks of biomolecules, combining a graph neural network [2] with a BayesFlow approximate Bayesian computation scheme [3]. We demonstrate the method on canonical random walks and on neuronal receptors moving in and out of synapses. We will conclude with new developments of these approaches for fetal diagnosis, leveraging MRI and virtual reality-based visualization [4] and analysis [5].

1.  Cranmer, K., Brehmer, J. & Louppe, G. The frontier of simulation-based inference. Proc. Natl. Acad. Sci. 117, 30055–30062 (2020).

2.  Verdier, H. et al. Learning physical properties of anomalous random walks using graph neural networks. J. Phys. A: Math. Theor. 54, 234001 (2021).

3.  Radev, S. T., Mertens, U. K., Voss, A., Ardizzone, L. & Kothe, U. BayesFlow: Learning Complex Stochastic Models With Invertible Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 1–15 (2020) doi:10.1109/TNNLS.2020.3042395.

4.  El Beheiry, M. et al. DIVA: Natural Navigation Inside 3D Images Using Virtual Reality. J. Mol. Biol. 432, 4745–4749 (2020).

5.  Blanc, T., El Beheiry, M., Caporal, C., Masson, J.-B. & Hajj, B. Genuage: visualize and analyze multidimensional single-molecule point cloud data in virtual reality. Nat. Methods 17, 1100–1102 (2020).
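
To make the amortization idea concrete, here is a toy sketch in the spirit of simulation-based inference [1]: a regressor is trained once on simulated (parameter, trajectory) pairs, after which inference on any new trajectory is a single forward pass. The simulator, summary statistics and regressor below are deliberately simplistic stand-ins for the graph neural network [2] and BayesFlow [3] components used in the actual work:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def simulate_walk(alpha, n_steps=200):
    """Crude surrogate for an anomalous random walk: Gaussian steps rescaled so the
    spread grows roughly as t**(alpha/2). (Toy simulator, not the real pipeline.)"""
    t = np.arange(1, n_steps + 1)
    return np.cumsum(rng.normal(size=n_steps)) * t ** ((alpha - 1.0) / 2.0)

def summary(traj):
    """Hand-crafted summary statistics standing in for learned GNN features."""
    d = np.diff(traj)
    return [np.mean((traj - traj[0]) ** 2), np.std(d),
            np.mean(np.abs(d)), np.max(np.abs(traj))]

# Amortization: train once on simulated (parameter, trajectory) pairs ...
alphas = rng.uniform(0.2, 1.8, size=2000)
X = np.array([summary(simulate_walk(a)) for a in alphas])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, alphas)

# ... then inference on a new trajectory is a single forward pass.
test = simulate_walk(0.6)
print("estimated alpha:", model.predict([summary(test)])[0])
```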

Umut Simsekli (Inria, PRAIRIE), Towards building a heavy-tailed theory of stochastic gradient descent for deep neural networks

Abstract: In this talk, I will focus on the ‘tail behavior’ of SGD in deep learning. I will first empirically illustrate that heavy tails arise in the gradient noise (i.e., the difference between the stochastic gradient and the true gradient). Accordingly, I will propose to model the gradient noise as a heavy-tailed α-stable random vector and to analyze SGD as the discretization of a stochastic differential equation (SDE) driven by a stable process. As opposed to classical SDEs driven by Brownian motion, SDEs driven by stable processes can incur ‘jumps’, which force the SDE (and its discretization) to transition from ‘narrow minima’ to ‘wider minima’, as shown by existing metastability theory and the extensions we have recently proved. These results open up a different perspective and shed more light on the view that SGD ‘prefers’ wide minima. In the second part of the talk, I will focus on the generalization properties of such heavy-tailed SDEs and show that the generalization error can be controlled by the Hausdorff dimension of the trajectories of the SDE, which is closely linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail-index of the process can be used as a notion of “capacity metric”. Finally, if time permits, I will talk about the ‘originating cause’ of such heavy-tailed behavior and present theoretical results showing that heavy tails can emerge even in very sterile settings such as linear regression with iid Gaussian data.

The talk will be based on the following papers:

U. Şimşekli, L. Sagun, M. Gürbüzbalaban, “A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks”, ICML 2019

U. Şimşekli, O. Sener, G. Deligiannidis, M. A. Erdogdu, “Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks”, NeurIPS 2020

M. Gürbüzbalaban, U. Şimşekli, L. Zhu, “The Heavy-Tail Phenomenon in SGD”, ICML 2021
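
As a small numerical illustration of the first point (a toy experiment, not the tail-index estimator used in the papers above), one can probe the mini-batch gradient noise of plain linear regression with iid Gaussian data, the sterile setting mentioned at the end of the abstract, and compare it to a Gaussian via crude diagnostics such as excess kurtosis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy setting: linear regression with iid Gaussian data.
n, d, batch = 10_000, 20, 1          # small batches make the noise heavier
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(size=n)

w = np.zeros(d)                      # fixed iterate at which we probe the noise
full_grad = 2 * X.T @ (X @ w - y) / n

# Gradient noise = stochastic gradient minus true gradient, over many mini-batches.
noise = []
for _ in range(5000):
    idx = rng.choice(n, size=batch, replace=False)
    g = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
    noise.append((g - full_grad)[0])  # track one coordinate
noise = np.array(noise)

# Crude heaviness diagnostics: excess kurtosis is 0 for a Gaussian and
# large for heavy-tailed noise.
print("excess kurtosis:", stats.kurtosis(noise))
print("std / mean abs deviation:", noise.std() / np.abs(noise - noise.mean()).mean())
```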

Gabriel Peyré (CNRS and ENS-PSL, PRAIRIE), Scaling optimal transport for high-dimensional Learning

Abstract: Optimal transport (OT) has recently gained a lot of interest in machine learning. It is a natural tool to compare probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is, however, plagued by the curse of dimensionality, since it might require a number of samples that grows exponentially with the dimension. In this talk, I will explain how to leverage entropic regularization methods to define computationally efficient loss functions that approximate OT with a better sample complexity. More information and references can be found on the website of our book “Computational Optimal Transport”: https://optimaltransport.github.io/
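
For concreteness, a minimal NumPy sketch of the entropic approach: the Sinkhorn algorithm alternates two diagonal scalings of the Gibbs kernel to approximate the OT coupling between two discrete distributions (the regularization strength and problem sizes below are arbitrary):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iters=1000):
    """Entropic OT between histograms a and b with cost matrix C.
    Smaller eps is closer to unregularized OT but calls for log-domain stabilization."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # alternate scalings (Sinkhorn iterations)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]      # entropic coupling
    return P, np.sum(P * C)              # coupling and transport cost of the plan

# Toy example: two point clouds on the line with uniform weights.
rng = np.random.default_rng(0)
x, y = np.sort(rng.normal(0, 1, 50)), np.sort(rng.normal(2, 1, 60))
a, b = np.full(50, 1 / 50), np.full(60, 1 / 60)
C = (x[:, None] - y[None, :]) ** 2       # squared Euclidean cost
P, cost = sinkhorn(a, b, C)
print("approximate OT cost:", cost)
```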

Cordelia Schmid (Inria, PRAIRIE), Do you see what I see? Large-scale learning from multimodal videos

Jérôme Lang (Dauphine-PSL, PRAIRIE), AI for collective decision making

Jérôme Bolte (TSE School of Economics, ANITI), Conservative calculus: a variational calculus for nonsmooth algorithmic differentiation

Clément Royer (Dauphine-PSL, PRAIRIE), Black-box optimization based on probabilistic properties

Abstract: This talk is concerned with derivative-free optimization methods, a class of algorithms that has proved particularly relevant in modern tasks such as hyperparameter tuning. Our proposed schemes borrow from standard nonlinear optimization techniques, which we combine with randomized approaches to enhance efficiency and scalability. We describe a generic methodology that leads to complexity guarantees for these algorithms in their deterministic form, which we then adapt to a probabilistic setting. Our results allow us to identify the algorithmic variants with the best computational complexity guarantees; this theoretical improvement is reflected in the numerical performance, as illustrated by our experiments.
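
A minimal sketch of the flavor of method discussed (illustrative only, not the algorithms analyzed in the talk): a direct-search scheme that polls the objective along randomly drawn directions, accepts steps giving sufficient decrease, and shrinks the step size otherwise:

```python
import numpy as np

def random_direct_search(f, x0, n_iters=2000, alpha0=1.0, seed=0):
    """Derivative-free direct search: at each iteration, poll f along +/- a random
    unit direction; accept a point giving sufficient decrease, otherwise halve
    the step size. (Sketch of a probabilistic direct-search scheme.)"""
    rng = np.random.default_rng(seed)
    x, alpha = np.asarray(x0, dtype=float), alpha0
    fx = f(x)
    for _ in range(n_iters):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)
        improved = False
        for step in (alpha * d, -alpha * d):
            if f(x + step) < fx - 1e-4 * alpha ** 2:   # sufficient decrease test
                x, fx, alpha, improved = x + step, f(x + step), 2 * alpha, True
                break
        if not improved:
            alpha *= 0.5
    return x, fx

# Usage on a smooth black-box function (no gradients required).
rosenbrock = lambda z: (1 - z[0]) ** 2 + 100 * (z[1] - z[0] ** 2) ** 2
x_opt, f_opt = random_direct_search(rosenbrock, [-1.0, 1.0])
print(x_opt, f_opt)
```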

Yann LeCun (New York University and Facebook AI Research), The future is self-supervised