Levi Boyles, Anoop Korattikara, Deva Ramanan, Max Welling
Learning problems such as logistic regression are typically formulated as pure optimization problems defined on some loss function. We argue that this view ignores the fact that the loss function depends on stochastically generated data which in turn determines an intrinsic scale of precision for statistical estimation. By considering the statistical properties of the update variables used during the optimization (e.g. gradients), we can construct frequentist hypothesis tests to determine the reliability of these updates. We utilize subsets of the data for computing updates, and use the hypothesis tests for determining when the batch-size needs to be increased. This provides computational benefits and avoids overfitting by stopping when the batch-size has become equal to size of the full dataset. Moreover, the proposed algorithms depend on a single interpretable parameter – the probability for an update to be in the wrong direction – which is set to a single value across all algorithms and datasets. In this paper, we illustrate these ideas on three L1 regularized coordinate algorithms: L1 -regularized L2 -loss SVMs, L1 -regularized logistic regression, and the Lasso, but we emphasize that the underlying methods are much more generally applicable.