The field of graph-based learning is closely connected to manifold learning. It explores the following question: given a collection of data points $x_1, \dots, x_n$ and a similarity graph over them, how can one use this graph to learn relevant geometric features of the dataset and, in turn, about the distribution that generated it? The question becomes a geometric or analytical problem once one assumes that the sampling distribution is supported on an unknown low-dimensional manifold, as postulated by the manifold hypothesis.
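For concreteness, here is a minimal sketch of the kind of construction behind this question, under illustrative assumptions not taken from the talk: a Gaussian similarity kernel with a fixed bandwidth, and data sampled uniformly from the unit circle. The low-lying eigenvalues of the resulting graph Laplacian then mirror, up to a kernel-dependent rescaling, the multiplicity-two spectrum of the Laplace–Beltrami operator on the circle.

```python
# Minimal sketch: a similarity graph on sampled data and its Laplacian spectrum.
# Illustrative assumptions: Gaussian kernel weights with a fixed bandwidth eps,
# and n points sampled uniformly from the unit circle S^1 in R^2.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 500
theta = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.column_stack([np.cos(theta), np.sin(theta)])  # samples x_1, ..., x_n on S^1

eps = 0.3  # bandwidth; in theory it should shrink with n at an appropriate rate
W = np.exp(-squareform(pdist(X)) ** 2 / eps**2)  # similarity graph (Gaussian weights)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian L = D - W

# Seven smallest eigenvalues. Up to an overall kernel-dependent scaling, they
# track the Laplace-Beltrami spectrum of the circle (0, then k^2 with
# multiplicity two), so one expects a near-zero eigenvalue followed by pairs.
evals = eigh(L, eigvals_only=True, subset_by_index=[0, 6])
print(evals)
```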

In this talk, I will argue that, despite the many questions and answers that have been explored in the area of graph-based learning, several fundamental questions in statistical theory have remained largely unexplored, all of them essential for manifold learning. Examples of these questions include: 1) What is the best possible estimator (potentially not graph-based), from a sample-efficiency perspective, for learning features of unknown manifolds from observed data? 2) What is the asymptotic efficiency of popular graph-based estimators used in unsupervised learning? I will focus on the first type of question in the context of learning the spectra of elliptic differential operators from data, and I will present new results that can be interpreted as a first step toward bridging the gap between the mathematical analysis of graph-based learning and the analysis of fundamental statistical questions like the ones mentioned above. Throughout the talk, I will highlight the connection between the spectrum estimation and density estimation problems, and through this connection I will motivate a series of open mathematical questions related to operator learning and generative modeling with contemporary machine learning tools.
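One standard example from the continuum-limit literature gives a sense of this connection (the specific operator below is an illustrative assumption, not a statement of the talk's results): suitably rescaled graph Laplacians built from samples of a density $\rho$ on a manifold $\mathcal{M}$ are known to approximate weighted elliptic operators of the form
$$\Delta_\rho u = -\frac{1}{\rho}\,\mathrm{div}\!\left(\rho^2 \nabla u\right),$$
whose spectrum depends on $\rho$; estimating the spectrum is therefore intertwined with estimating the density itself.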

This talk is based on very recent work with my PhD student Chenghui Li (UW-Madison) and with Raghavendra Venkatraman (Utah).