Eran Malach, Shai Shalev-Shwartz
It is known that learning deep neural-networks is computationally hard in the worst-case. In fact, the proofs of such hardness results show that even weakly learning deep networks is hard. In other words, no efficient algorithm can find a predictor that is slightly better than a random guess. However, we observe that on natural distributions of images, small patches of the input image are corre- lated to the target label, which implies that on such natural data, efficient weak learning is trivial. While in the distribution-free setting, the celebrated boosting results show that weak learning implies strong learning, in the distribution-specific setting this is not necessarily the case. We introduce a property of distributions, denoted “local correlation”, which requires that small patches of the input image and of intermediate layers of the target function are correlated to the target label. We empirically demonstrate that this property holds for the CIFAR and ImageNet data sets. The main technical results of the paper is proving that, for some classes of deep functions, weak learning implies efficient strong learning under the “local correlation” assumption.