Note change of location: Gates 114

We study the classical problem of predicting an outcome variable using a linear combination of a d-dimensional covariate vector. We are interested in linear predictors whose coefficients minimize the p-norm of the prediction error adjusted by a convex penalty (e.g., the square-root lasso and the square-root slope). We show that, under some regularity conditions, each of these penalized linear predictors minimizes the worst-case prediction error over a class of distributions determined by the p-norm and the penalty function. Our proof, which generalizes the work of Bertsimas and Copenhaver (2018), is constructive and shows that the worst-case performance is typically achieved at an additive perturbation of the baseline distribution of covariates and outcomes. When the convex penalty is proportional to a norm, we provide theoretical support for recommending a tuning parameter of order d/n (up to a logarithmic factor), where n is the size of the available training data.
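For concreteness, a minimal sketch of an estimator in the class described above (the notation Y_i, X_i, lambda, and pen(.) is introduced here for illustration only; the paper's exact formulation may differ):

\[
  \hat{\beta} \;\in\; \arg\min_{\beta \in \mathbb{R}^{d}}
    \Big( \sum_{i=1}^{n} \big| Y_i - X_i^{\top}\beta \big|^{p} \Big)^{1/p}
    \;+\; \lambda \, \mathrm{pen}(\beta).
\]

Taking p = 2 with pen(beta) equal to the l1 norm of beta recovers the square-root lasso, while a sorted-l1 (slope) penalty gives the square-root slope.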