Gaussian processes are widely used machine learning models, in part due to convenient access to uncertainty estimates and flexible modeling via composable covariance functions. Most efforts to circumvent the cubic running time of standard inference have focused on approximate modeling choices, while leaving the standard linear algebra operations performed thereafter relatively unchanged. In this talk, I focus on the other side of the story and propose replacing these standard operations entirely with iterative techniques from numerical linear algebra. This strategy is compatible with, and provides an asymptotic complexity improvement to, many existing scalable GP methods, while often outperforming the Cholesky decomposition in floating-point solve accuracy. I will present theoretical convergence guarantees in certain GP settings. I will discuss how this approach yields an exponential improvement in the running time of scalable kernel interpolation, one example of an inducing point method. I will also discuss a technique for computing predictive distributions that are empirically accurate to five decimal places in constant time, and a technique for posterior sampling in linear time. In our software implementation of these ideas, GPyTorch, we achieve speedups of up to 2000x over existing software, depending on the application.
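
As a minimal sketch of how these ideas surface through GPyTorch's user-facing API, the example below fits an exact GP regression model; the solves behind the marginal likelihood and predictive distribution can then be carried out with iterative methods (conjugate gradients and Lanczos) rather than a Cholesky decomposition. The toy data, RBF kernel, optimizer settings, and iteration counts here are illustrative assumptions, not details from the talk.

```python
import math
import torch
import gpytorch

# Toy 1D regression data (illustrative only).
train_x = torch.linspace(0, 1, 100)
train_y = torch.sin(2 * math.pi * train_x) + 0.1 * torch.randn(100)

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

# Force iterative (CG/Lanczos-based) solves even on this small toy problem;
# by default GPyTorch falls back to Cholesky below a matrix-size threshold.
with gpytorch.settings.max_cholesky_size(0):
    # Hyperparameter fitting via the exact marginal log likelihood.
    model.train(); likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    for _ in range(50):
        optimizer.zero_grad()
        loss = -mll(model(train_x), train_y)
        loss.backward()
        optimizer.step()

    # Prediction: fast_pred_var() enables the cached fast predictive
    # (co)variance path (LOVE).
    model.eval(); likelihood.eval()
    test_x = torch.linspace(0, 1, 51)
    with torch.no_grad(), gpytorch.settings.fast_pred_var():
        pred = likelihood(model(test_x))
        mean = pred.mean
        lower, upper = pred.confidence_region()
```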