Oblique Randomized Decision Trees and Dimension Reduction
Eliza O'Reilly (JHU, Applied Math and Stats)
Random forests are a widely used class of prediction algorithms built from ensembles of randomized decision trees. The most common algorithms partition the data at each node of a tree using a single covariate of the input. Oblique random forests are variants in which splits may depend on linear combinations of the covariates; these versions have shown improved empirical performance but lack corresponding theoretical guarantees. In this talk, we will discuss a class of efficiently generated random tree estimators called oblique Mondrian trees, built by first selecting a set of features from linear combinations of the covariates and then running a Mondrian process that hierarchically partitions the data along these features. Our analysis demonstrates that these estimators can adapt to multi-index dimension reduction models, in which the output depends on a general low-dimensional subspace of relevant features, and quantifies how robust the risk is to errors in estimating that subspace. We will then discuss an approach to identifying the relevant feature subspace that uses a standard Mondrian forest to estimate the expected gradient outer product (EGOP) of the regression function. Finally, we introduce a complete iterative prediction algorithm, the Transformed Iterative Mondrian (TrIM) forest, which improves on the standard Mondrian forest estimator by using the EGOP estimate to update the features and weights employed by the Mondrian partitioning mechanism. Based on joint work with Ricardo Baptista and Yangxinyu Xie.
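As background for the construction above: an oblique Mondrian tree runs the classical Mondrian process of Roy and Teh over the selected features u^T x rather than the raw covariates. The following is a minimal Python sketch of that underlying process, not the authors' code; it samples the leaf cells of one Mondrian partition of an axis-aligned box (cut times are exponential with rate equal to the box's linear dimension, the split axis is chosen proportionally to side length, and the cut location is uniform).

    import numpy as np

    def mondrian_partition(lower, upper, budget, rng):
        """Sample the leaf cells of a Mondrian process on the box [lower, upper]."""
        sides = upper - lower
        t = rng.exponential(1.0 / sides.sum())   # cut time ~ Exp(linear dimension)
        if t > budget:
            return [(lower, upper)]              # lifetime exhausted: this cell is a leaf
        j = rng.choice(len(sides), p=sides / sides.sum())  # split axis ∝ side length
        cut = rng.uniform(lower[j], upper[j])    # uniform cut location along that side
        up_left, lo_right = upper.copy(), lower.copy()
        up_left[j] = cut
        lo_right[j] = cut
        return (mondrian_partition(lower, up_left, budget - t, rng)
                + mondrian_partition(lo_right, upper, budget - t, rng))

    cells = mondrian_partition(np.zeros(2), np.ones(2), budget=2.0,
                               rng=np.random.default_rng(0))
    print(len(cells))   # number of leaf cells in one draw of the partition

The budget parameter is the lifetime of the process: larger budgets yield finer partitions, which is how these trees trade off bias against variance.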
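For readers unfamiliar with the EGOP mentioned above, it is the matrix

\[
  \mathrm{EGOP}(f) \;=\; \mathbb{E}_{X}\!\big[\, \nabla f(X)\, \nabla f(X)^{\top} \big],
\]

and under a multi-index model $f(x) = g(Ux)$ one has $\nabla f(x) = U^{\top} \nabla g(Ux)$, so the column space of $\mathrm{EGOP}(f)$ lies in (and, under mild nondegeneracy of $\mathbb{E}[\nabla g\, \nabla g^{\top}]$, equals) the row space of $U$. Its leading eigenvectors therefore recover the relevant feature subspace.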
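To make the TrIM-style iteration concrete, here is a hedged Python sketch under stated assumptions, not the authors' implementation: scikit-learn's ExtraTreesRegressor stands in for a Mondrian forest, the EGOP is estimated by crude finite differences of the fitted model rather than by the forest-based estimator discussed in the talk, and keeping eigenvalue-weighted eigenvectors as the new features is one plausible reading of updating "features and weights".

    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor  # stand-in for a Mondrian forest

    def estimate_egop(predict, X, h=0.1):
        """Monte Carlo EGOP estimate via central finite differences of predict."""
        n, d = X.shape
        G = np.zeros((d, d))
        for x in X:
            grad = np.empty(d)
            for j in range(d):
                e = np.zeros(d)
                e[j] = h
                grad[j] = (predict((x + e)[None])[0]
                           - predict((x - e)[None])[0]) / (2 * h)
            G += np.outer(grad, grad)
        return G / n

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(500, 5))
    y = np.sin(2.0 * (X[:, 0] + X[:, 1])) + 0.1 * rng.normal(size=500)  # single-index data

    A = np.eye(5)  # rows are the current features; start from the raw covariates
    for _ in range(3):  # a few refinement rounds
        forest = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X @ A.T, y)
        def predict(Z, forest=forest, A=A):  # fitted estimate in original coordinates
            return forest.predict(Z @ A.T)
        G = estimate_egop(predict, X)
        w, V = np.linalg.eigh(G)             # eigenvalues in ascending order
        order = np.argsort(w)[::-1]
        A = (V[:, order] * np.sqrt(np.maximum(w[order], 0.0))).T  # weighted features

    print(A[0] / np.linalg.norm(A[0]))  # leading feature; compare with (1,1,0,0,0)/sqrt(2)

In this toy example the leading learned feature tends to align with the true index direction, illustrating how the EGOP-driven update can steer the partitioning mechanism toward the relevant subspace.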