LargeDATA Chair
Description
Motivation and Key Objectives: At the core of AI lies a set of elaborate non-linear, data-driven or implicitly defined machine learning methods and algorithms. These methods, however, largely rely on "small-dimensional intuitions" and heuristics which are mostly inappropriate in large dimensions, where the algorithms behave strikingly differently. This "curse" of dimensionality notably explains the failure of kernel methods and the difficulty of fathoming the powerful deep networks. Simultaneously, the strong demand for reliable AI tools from academia and industry alike creates an unprecedented need for novel mathematical tools and theoretical guarantees for machine learning algorithms.
Recent advances in large-dimensional statistics, and particularly in random matrices and statistical physics, have provided important clues and striking first results on the understanding and improvement of machine learning methods in large dimensions: a new approach to kernel methods is emerging, a renewed methodology for semi-supervised learning has appeared, and first advances in the difficult theory of deep neural networks have been made. More surprisingly, by exploiting a notion of universality, statistical advances in random matrix theory are increasingly shown to be robust to real datasets and thus to be powerful practical performance predictors. Many of these recent findings stem from the initiative of members of the present project.
Chair Description and Program:
The LargeDATA chair is anchored on these recent breakthroughs and proposes to develop a consistent mathematical framework for the large-dimensional analysis, improvement and renewed design of basic-to-advanced data processing methods. LargeDATA is a theoretical chair, organized around (i) two methodological work packages on the development and application of random matrix theory and statistical physics to machine learning for large datasets, and (ii) a "transfer" work package that elaborates both on more realistic data models and on applications to real data (see details below).
The long-term ambition is for the chair to hold a leading core-AI position within the MIAI institute, providing world-class mathematical advances and attractiveness in the automated processing of large, numerous and dynamic data.
Activities
WP1. Random Matrix Theory (RMT) for AI. This work package analyses non-linear matrix models (kernel random matrices, random graphs, random neural network models, large random tensors) [Comon,Couillet,Tremblay] and their implications for related machine learning algorithms (LS-SVM, SSL, spectral clustering, neural networks, DPP sampling) [Amblard,Barthelme,Couillet]. It also studies the performance of implicit solutions to large-dimensional optimization problems for classification and regression (SVM, logistic regression, GLMM) [Chatelain,Couillet].
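As a toy illustration of the kind of objects studied in WP1 (a minimal sketch with arbitrary illustrative parameters, not the chair's own code), the snippet below builds a Gaussian kernel matrix on a high-dimensional two-class Gaussian mixture and recovers the classes from its second dominant eigenvector, i.e., basic kernel spectral clustering in the regime where the dimension and the sample size are both large.

```python
# Illustrative sketch only: spectrum of a Gaussian kernel random matrix in the
# large-dimensional regime and the resulting spectral clustering of a
# two-class Gaussian mixture. Dimensions, class separation and kernel
# bandwidth are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
p, n = 400, 800                        # dimension and sample size, both large
mu = np.zeros(p); mu[0] = 2.0          # mean separation between the two classes
X = np.concatenate([rng.normal(size=(n // 2, p)) - mu,
                    rng.normal(size=(n // 2, p)) + mu])

# Gaussian (RBF) kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2p))
sq_norms = np.sum(X**2, axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
K = np.exp(-sq_dists / (2 * p))

# In large dimensions the informative structure concentrates in a few isolated
# eigenvalues/eigenvectors of K; thresholding the second dominant eigenvector
# recovers the two classes.
eigvals, eigvecs = np.linalg.eigh(K)
labels = (eigvecs[:, -2] > 0).astype(int)
true = np.repeat([0, 1], n // 2)
acc = max(np.mean(labels == true), np.mean(labels != true))
print(f"clustering accuracy: {acc:.2f}")
```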
WP2. Statistical Physics and Graphs in AI. This work package specifically explores statistical physics (and related heuristics) in scenarios where mathematical developments and techniques are difficult or currently inaccessible [Barthelme,Couillet,Tremblay]. This concerns in particular statistical models of sparse random matrices and graphs, as well as complex non-linear learning methods such as deep neural networks.
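As an illustration of the sparse-graph setting targeted by WP2 (again a minimal sketch with assumed parameters, not the chair's implementation), the following snippet performs community detection on a two-block stochastic block model via the Bethe-Hessian matrix H(r) = (r² − 1)I − rA + D with r close to the square root of the average degree, in the spirit of the Bethe-Hessian publications listed below.

```python
# Illustrative sketch: Bethe-Hessian community detection on a sparse
# two-community stochastic block model. All parameters are arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
n = 2000                               # number of nodes, two equal communities
c_in, c_out = 9.0, 2.0                 # average intra-/inter-community degrees
labels_true = np.repeat([0, 1], n // 2)
same = labels_true[:, None] == labels_true[None, :]
P = np.where(same, c_in / n, c_out / n)      # sparse edge probabilities
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T               # symmetric adjacency, no self-loops

D = np.diag(A.sum(axis=1))
r = np.sqrt(A.sum() / n)                     # ~ square root of the average degree
H = (r**2 - 1) * np.eye(n) - r * A + D       # Bethe-Hessian matrix

# The communities are read off the eigenvector attached to the second smallest
# (negative) eigenvalue of H.
eigvals, eigvecs = np.linalg.eigh(H)
labels = (eigvecs[:, 1] > 0).astype(int)
acc = max(np.mean(labels == labels_true), np.mean(labels != labels_true))
print(f"community recovery accuracy: {acc:.2f}")
```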
WP3. Universality Results: from Theory to Practice. This work package develops the theoretical grounds for applying large-dimensional statistics to more practical considerations (such as brain signal processing [Barthelme]). The focus will be on concentration of measure theory for neural network analysis [Couillet], sparse matrix and network analyses for clustering and graph mining [Couillet,Tremblay], as well as on heuristic methods to better grasp the most challenging real data and algorithm models (e.g., deep learning methods) [Couillet,Tremblay].
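As a toy illustration of the universality phenomenon underlying WP3 (purely illustrative, with assumed distributions and sizes), the sketch below compares the eigenvalue spectrum of a sample covariance matrix built from strongly non-Gaussian (Rademacher) entries with that of a Gaussian model matched in mean and covariance: in large dimensions the two spectra nearly coincide, which is the practical reason why Gaussian-based random matrix predictions transfer to real data.

```python
# Illustrative sketch of spectral universality: non-Gaussian data vs its
# Gaussian equivalent with the same first two moments.
import numpy as np

rng = np.random.default_rng(2)
p, n = 500, 1500

# Non-Gaussian data: centred, unit-variance Rademacher (+/-1) entries.
X_rad = rng.choice([-1.0, 1.0], size=(p, n))
# Gaussian equivalent with matched mean and covariance.
X_gauss = rng.normal(size=(p, n))

spec_rad = np.linalg.eigvalsh(X_rad @ X_rad.T / n)
spec_gauss = np.linalg.eigvalsh(X_gauss @ X_gauss.T / n)

# A few quantiles of the two spectra agree to within a few percent: this is
# the (Marchenko-Pastur-type) universality exploited to predict performance
# on real datasets.
for q in (0.1, 0.5, 0.9):
    print(f"quantile {q}: Rademacher {np.quantile(spec_rad, q):.3f}"
          f"  vs  Gaussian {np.quantile(spec_gauss, q):.3f}")
```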
Chair events
The chair organizes regular meetings with its main industrial collaborators: HUAWEI Labs Paris, CEA Leti/List, and STMicroelectronics. A merged GAIA (at GIPSA-lab) and LargeDATA chair seminar takes place on a weekly basis at GIPSA-lab.
Scientific publications
Estimation of Covariance Matrix Distances in the High Dimension Low Sample Size Regime hal-02965834
Revisiting the Bethe-Hessian: Improved Community Detection in Sparse Heterogeneous Graphs hal-02429525
Random matrix-improved estimation of covariance matrix distances hal-02355223
Random Matrix-Improved Estimation of the Wasserstein Distance between two Centered Gaussian Distributions hal-02965778
Spectral classification via the deformed Laplacian in realistic graphs (in French) hal-02153901
Random Matrix Improved Covariance Estimation for a Large Class of Metrics hal-02152121
A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights hal-02139980
Community Detection in Sparse Realistic Graphs: Improving the Bethe Hessian hal-02153916
Improved Estimation of the Distance between Covariance Matrices hal-02355321
Revisiting and improving semi-supervised learning: a large dimensional approach hal-02139979
A Large Dimensional Analysis of Least Squares Support Vector Machines hal-02048984
Concentration of Measure and Large Random Matrices with an application to Sample Covariance Matrices hal-02020287
A Kernel Random matrix-based approach for sparse PCA hal-02971198
Kernel Random Matrices of large concentrated data: The example of GAN-Generated Images hal-02971224
Why do random matrices explain learning? A universality argument offered by GANs (in French) hal-02971207
A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent hal-02971807
Concentration of solutions to random equations with concentration of measure hypotheses hal-02973851
Performance-complexity trade-off in large dimensional statistics hal-02961057
Smoothing graph signals via random spanning forests hal-02956603
Large Dimensional Asymptotics of Multi-Task Learning hal-02965810
A Random Matrix Analysis of Learning with α-Dropout hal-02971211
Random Matrix Theory Proves that Deep Learning Representations of GAN-data Behave as Gaussian Mixtures hal-02971185
Smoothing graph signals via random spanning forests hal-02319175
Head of the chair
Romain COUILLET, Full Professor at CentraleSupélec, Université Paris-Saclay.
romain.couillet@gipsa-lab.grenoble-inp.fr
Team members
Pierre-Olivier AMBLARD
Simon BARTHELME
Florent CHATELAIN
Pierre COMON
Nicolas TREMBLAY
Research topics
random matrix theory (RMT) for AI
statistical physics and graphs for AI
universality results: from theory to practice