Gaussian processes (GPs) define a distribution over functions that generalizes the multivariate normal distribution over vector spaces. Long used as a tool for spatio-temporal statistical modeling, GPs are also a key part of the modern arsenal in machine learning. Unfortunately, Gaussian process regression and kernel hyper-parameter estimation with $N$ training examples involve manipulating a dense $N$-by-$N$ kernel matrix, and standard factorization-based approaches to the underlying linear algebra problems have $O(N^3)$ scaling. For regression with a fixed covariance kernel, more scalable iterative methods based on fast matrix-vector multiplication with the kernel matrices are available. However, maximum likelihood estimation of kernel hyper-parameters and computation of conditional variances involve operations such as computing log derivatives and their derivatives or extracting the diagonal part of a Schur complement. New tools are needed to address these problems in a scalable manner. In this talk, we discuss our recent work on one such set of tools, based on a combination of Krylov subspace methods for matrix solves and matrix function applications together with stochastic estimators for the trace and diagonal of a matrix using only matrix-vector multiplies.

This is joint work with Kun Dong, David Eriksson, and Andrew Wilson.