It is shown that running SGD on a non-convex surrogate for the 0-1 loss converges to a near-optimal halfspace under adversarial label noise. The resulting algorithm is considerably simpler than others available in the literature. It is also shown, via a novel lower bound, that convex surrogates cannot achieve comparable performance. The reviewers are generally positive about the paper. In the revised version, the authors should better highlight the differences relative to [DKTZ20].
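
For illustration only, a minimal sketch of the kind of procedure described: SGD on a sigmoidal non-convex surrogate of the 0-1 loss, with the iterate projected back to the unit sphere. The specific surrogate, step size, and projection step here are assumptions for the sketch, not the paper's exact algorithm.

```python
import numpy as np

def surrogate_grad(w, x, y, sigma=1.0):
    """Gradient of the assumed non-convex surrogate
    l(w; x, y) = 1 / (1 + exp(y * <w, x> / sigma)),
    a smooth, bounded approximation of the 0-1 loss."""
    m = y * np.dot(w, x) / sigma
    s = 1.0 / (1.0 + np.exp(m))              # surrogate value in (0, 1)
    return -(y / sigma) * s * (1.0 - s) * x  # chain rule: dl/dm * dm/dw

def sgd_halfspace(X, y, steps=10_000, lr=0.1, sigma=1.0, seed=0):
    """Projected SGD on the unit sphere; returns an approximate
    normal vector of a halfspace (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(steps):
        i = rng.integers(n)                  # sample one labeled example
        w -= lr * surrogate_grad(w, X[i], y[i], sigma)
        w /= np.linalg.norm(w)               # project back to the unit sphere
    return w
```

The projection to the unit sphere reflects that a homogeneous halfspace is determined by the direction of its normal; the sigmoidal surrogate is one common stand-in for the 0-1 loss and may differ from the surrogate analyzed in the paper.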