Efficient natural gradient method for large-scale optimization problems

Yunan Yang (Cornell, Math)

First-order methods are workhorses for large-scale optimization problems, but they are often agnostic to the structural properties of the problem under consideration and suffer from slow convergence, being trapped in bad local minima, etc. Natural gradient descent is an acceleration technique in optimization that takes advantage of the problem’s geometric structure and preconditions the objective function’s gradient by a suitable “natural” metric. Despite its success in machine learning, the natural gradient descent method is far from a mainstream computational technique due to the computational complexity of calculating and inverting the preconditioning matrix. This work aims at a unified computational framework and streamlining the computation of a general natural gradient flow via efficient tools from numerical linear algebra. We obtain robust numerical methods for natural gradient flows without directly calculating, storing, or inverting the dense preconditioning matrix. We treat various natural gradients in a unified framework for any loss function.