Compressing Neural Networks with Interpolative Decompositions
Megan Renz (Cornell University)
Pruning and compression are frequently used to improve the efficiency of neural networks. Typical prior works rely on extensive fine-tuning after pruning, which yields models whose behavior can differ substantially from the original network and conflates improvements due to the pruning method with improvements due to hyperparameter choices during fine-tuning. Our goal is to compress deep networks into smaller models with the same computational structure that produce the same per-example outputs as the original model, a requirement that goes beyond matching classification accuracy. We introduce a principled approach to model compression that leverages structured low-rank matrix approximations known as interpolative decompositions to accomplish this task. By explicitly building an approximation to the activation output of each layer, we simultaneously select and prune channels (analogously, neurons) and construct an interpolation matrix that propagates a correction into the next layer. Consequently, our method achieves good performance even without fine-tuning and admits theoretical analysis. In contrast to many existing methods, our approach automatically determines per-layer sizes for the compressed model from a single user-defined input parameter. We demonstrate the efficacy of our approach with strong empirical performance on a variety of tasks, models, and datasets, including one- and two-hidden-layer networks on Fashion MNIST, 3D-CNNs on ATOM3D, and VGG and MobileNets on CIFAR-10 and ImageNet.
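To make the mechanism concrete, the sketch below shows how an interpolative decomposition can select channels and produce a correction for the next layer, using SciPy's scipy.linalg.interpolative on a dense-layer example. This is a minimal illustration under assumed conventions, not the paper's implementation: the function name prune_layer_with_id, the tolerance eps, and the activation/weight shapes are assumptions for exposition.

```python
import numpy as np
from scipy.linalg import interpolative as sli

def prune_layer_with_id(Z, W_next, eps=1e-2):
    """Illustrative ID-based channel pruning (a sketch, not the paper's code).

    Z      : (num_examples, n_channels) activation matrix of the layer to prune.
    W_next : (n_out, n_channels) weights of the following dense layer.
    eps    : relative accuracy; the ID chooses the rank k from it automatically.
    """
    # Interpolative decomposition: Z ~= Z[:, idx[:k]] @ P, where the
    # (k, n_channels) interpolation matrix P contains an identity on the
    # selected columns, so the kept channels reconstruct the pruned ones.
    k, idx, proj = sli.interp_decomp(np.asfortranarray(Z, dtype=np.float64), eps)
    P = sli.reconstruct_interp_matrix(idx, proj)  # shape (k, n_channels)
    keep = idx[:k]                                # indices of retained channels
    # Fold the interpolation matrix into the next layer's weights: for an
    # input z, W_next @ z ~= (W_next @ P.T) @ z[keep].
    W_new = W_next @ P.T                          # shape (n_out, k)
    return keep, W_new

# Usage: on an approximately low-rank activation matrix, the compressed
# layer's pre-activations closely match the original ones.
rng = np.random.default_rng(0)
Z = rng.standard_normal((512, 20)) @ rng.standard_normal((20, 64))  # rank ~20
W_next = rng.standard_normal((10, 64))
keep, W_new = prune_layer_with_id(Z, W_next, eps=1e-6)
orig = Z @ W_next.T
approx = Z[:, keep] @ W_new.T
print(len(keep), np.linalg.norm(orig - approx) / np.linalg.norm(orig))
```

Because the interpolation matrix is absorbed into the next layer's weights, the compressed model keeps the original computational structure with fewer channels, which is what allows per-example outputs to track the original network without fine-tuning.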