Exploiting Algebraic Structure for Massively Scalable Machine Learning
Andrew Wilson (ORIE, Cornell)
We want to develop flexible models for large datasets. Such datasets often contain the information needed to learn rich statistical representations, enabling strong predictive performance and new scientific insights into our modelling problems. However, standard implementations of machine learning models typically involve a trade-off between scalability and flexibility. In this talk, I will argue that this trade-off is often unnecessary. Highly flexible models, such as deep neural networks and Gaussian processes, often contain a large degree of algebraic structure, reflecting the assumptions and inductive biases built into these models. This structure can be exploited for highly efficient and exact inference and learning. For example, Gaussian processes with the popular RBF kernel are universal approximators, capable of converging to a wide range of functions, but inference involves solving linear systems and computing log determinants over highly structured kernel matrices. I will show how to exploit structure across a variety of popular kernel functions for exact $O(n)$ training and $O(1)$ test-time Gaussian processes, in contrast to the standard $O(n^3)$ training and $O(n^2)$ test-time complexities. I will also make connections to deep learning architectures and determinantal point processes, where such structure can likewise be exploited.
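
To illustrate the kind of structure the abstract refers to, here is a minimal sketch (not the speaker's implementation, and with illustrative choices of grid size, lengthscale, and noise level): an RBF kernel evaluated on a regular one-dimensional grid gives a Toeplitz covariance matrix, so matrix-vector products cost $O(n \log n)$ via circulant embedding and the FFT, and the linear systems arising in Gaussian process inference can be solved with conjugate gradients rather than an $O(n^3)$ dense factorization.

```python
# A minimal sketch, assuming an RBF kernel on a regular 1-D grid; sizes and
# hyperparameters below are illustrative, not the speaker's implementation.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 2000                        # number of grid points (illustrative)
x = np.linspace(0.0, 10.0, n)   # regular 1-D grid
lengthscale, noise = 1.0, 0.1   # illustrative hyperparameters

# On a regular grid, K[i, j] depends only on |i - j|, so the kernel matrix is
# symmetric Toeplitz and is fully described by its first column.
first_col = np.exp(-0.5 * (x - x[0]) ** 2 / lengthscale ** 2)
first_col[0] += noise ** 2      # add noise variance to the diagonal

# Embed the Toeplitz matrix in a circulant matrix of size 2n - 2, whose
# action is diagonalized by the FFT.
circ_col = np.concatenate([first_col, first_col[-2:0:-1]])
circ_eigs = np.fft.fft(circ_col)

def kernel_matvec(v):
    """Compute (K + noise^2 I) @ v in O(n log n) via the circulant embedding."""
    padded = np.concatenate([v, np.zeros(len(circ_col) - n)])
    return np.real(np.fft.ifft(circ_eigs * np.fft.fft(padded)))[:n]

K_op = LinearOperator((n, n), matvec=kernel_matvec)

# Solve (K + noise^2 I) alpha = y with conjugate gradients: each iteration is
# an O(n log n) matrix-vector product instead of an O(n^3) dense solve.
y = np.sin(x) + noise * np.random.randn(n)   # toy observations
alpha, info = cg(K_op, y)
print("CG converged:", info == 0)
```

The same idea extends to the multidimensional-grid (Kronecker) and interpolation-based settings discussed in the talk, where fast structured matrix-vector products make both linear solves and log-determinant estimates scale far below the naive cubic cost.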