Mathematical aspects of neural network approximation and learning
Speaker: Joan Bruna, New York University
Joan Bruna has been an Assistant Professor at the Courant Institute, New York University (NYU), in the Department of Computer Science, the Department of Mathematics (affiliated) and the Center for Data Science, since Fall 2016. He belongs to the CILVR group and to the Math and Data groups. From 2015 to 2016, he was Assistant Professor of Statistics at UC Berkeley and part of BAIR (Berkeley AI Research). Before that, he worked at FAIR (Facebook AI Research) in New York. Prior to that, he was a postdoctoral researcher at the Courant Institute, NYU. He completed his PhD in 2013 at Ecole Polytechnique, France. Before his PhD, he was a Research Engineer at a semiconductor company, developing real-time video processing algorithms. Earlier still, he completed an MSc in Applied Mathematics (MVA) at Ecole Normale Superieure de Cachan and a BA and MS at UPC (Universitat Politecnica de Catalunya, Barcelona) in both Mathematics and Telecommunication Engineering. For his research contributions, he has been awarded a Sloan Research Fellowship (2018), an NSF CAREER Award (2019) and a best paper award at ICMLA (2018).
High-dimensional learning remains an outstanding phenomenon where experimental evidence outpaces our current mathematical understanding, mostly due to the recent empirical successes of Deep Learning. Neural Networks provide a rich yet intricate class of functions with the statistical ability to break the curse of dimensionality, and in which physical priors can be tightly integrated into the architecture to improve sample efficiency. Despite these advantages, an outstanding theoretical challenge for these models is computational: providing an analysis that explains successful optimization and generalization in the face of existing worst-case computational hardness results.
In this talk, we will describe snippets of this challenge, covering optimization and approximation in turn. First, we will focus on the framework that lifts parameter optimization to an appropriate measure space. We will review existing results that guarantee global convergence of the resulting Wasserstein gradient flows, and present our recent results that study typical fluctuations of the dynamics around their mean-field evolution, as well as extensions of this framework beyond vanilla supervised learning to account for symmetries in the function and for competitive optimization. Next, we will discuss the role of depth in terms of approximation, and present novel results establishing so-called ‘depth separation’ for a broad class of functions. We will conclude by discussing consequences in terms of optimization, highlighting current and future mathematical challenges.