On Convergence and Generalization of Dropout Training

Part of Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020)

Poorya Mianjy, Raman Arora


We study dropout in two-layer neural networks with rectified linear unit (ReLU) activations. Under mild overparametrization, and assuming that the limiting kernel can separate the data distribution with a positive margin, we show that dropout training with logistic loss achieves $\epsilon$-suboptimality in test error in $O(1/\epsilon)$ iterations.
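
To make the setting concrete, the following is a minimal sketch of dropout training for a two-layer ReLU network with logistic loss. It is an illustration only, not the paper's exact algorithm. The assumptions here are: only the hidden-layer weights `W` are trained, the output weights `a` are fixed at random signs scaled by $1/\sqrt{m}$, Bernoulli masks are applied to the hidden units with inverted scaling `1 / keep_prob`, and plain stochastic gradient descent is used. The names `dropout_sgd`, `predict`, `keep_prob`, and `TOY_DATA`, as well as the toy data and hyperparameter values, are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(W, a, x, mask=None):
    """Return hidden pre-activations and the output a^T (mask * relu(W x))."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    if mask is None:
        mask = [1.0] * len(W)  # no dropout: every hidden unit is kept
    out = sum(a_r * m_r * max(p_r, 0.0) for a_r, m_r, p_r in zip(a, mask, pre))
    return pre, out


def dropout_sgd(data, width=64, keep_prob=0.5, lr=0.5, steps=2000, seed=0):
    """Train hidden weights W with dropout SGD on the logistic loss.

    Assumption (not taken from the abstract): the output layer `a` stays fixed.
    Labels y must be +1 or -1.
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.choice((-1.0, 1.0)) / math.sqrt(width) for _ in range(width)]

    for _ in range(steps):
        x, y = rng.choice(data)
        # Fresh Bernoulli mask per iteration, rescaled so the masked output
        # matches the unmasked output in expectation (inverted dropout).
        mask = [1.0 / keep_prob if rng.random() < keep_prob else 0.0
                for _ in range(width)]
        pre, out = forward(W, a, x, mask)
        # Derivative of log(1 + exp(-y * out)) with respect to `out`.
        g = -y * sigmoid(-y * out)
        for r in range(width):
            # The ReLU subgradient is nonzero only for kept, active units.
            if mask[r] > 0.0 and pre[r] > 0.0:
                step = lr * g * a[r] * mask[r]
                W[r] = [w - step * xi for w, xi in zip(W[r], x)]
    return W, a


def predict(W, a, x):
    """Network output at test time, with dropout switched off."""
    return forward(W, a, x)[1]


# Hypothetical, linearly separable toy data: the label is the sign of x[0].
TOY_DATA = [([1.0, 0.2], 1), ([1.0, -0.2], 1),
            ([-1.0, 0.2], -1), ([-1.0, -0.2], -1)]
```

Calling `W, a = dropout_sgd(TOY_DATA)` and then taking the sign of `predict(W, a, x)` gives the predicted label for a point `x`. Because the masks are rescaled during training, no extra scaling is applied at test time.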