Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track
Shuyao Li, Sushrut Karmalkar, Ilias Diakonikolas, Jelena Diakonikolas
We study the problem of learning a single neuron with respect to the L_2^2-loss in the presence of adversarial distribution shifts, where the labels can be arbitrary, and the goal is to find a "best-fit" function. More precisely, given training samples from a reference distribution p_0, the goal is to approximate the vector w* which minimizes the squared loss with respect to the worst-case distribution that is close in χ²-divergence to p_0. We design a computationally efficient algorithm that recovers a vector ŵ satisfying E_{p*}[(σ(ŵ⋅x)−y)²] ≤ C·E_{p*}[(σ(w*⋅x)−y)²] + ϵ, where C > 1 is a dimension-independent constant and (w*, p*) is the witness attaining the min-max risk min_w max_p E_p[(σ(w⋅x)−y)²], with the maximum taken over distributions p close in χ²-divergence to p_0. Our algorithm follows the primal-dual framework and is designed by directly bounding the risk with respect to the original, nonconvex L_2^2 loss. From an optimization standpoint, our work opens new avenues for the design of primal-dual algorithms under structured nonconvexity.
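To make the objective concrete, the following is a minimal NumPy sketch of the quantities the abstract defines: the per-sample L_2^2 loss of a single neuron σ(w⋅x) (taking σ = ReLU for illustration) and the worst-case expected loss over reweightings of the empirical distribution within a χ²-ball of radius ρ around the uniform weights. The helper `worst_case_chi2_loss` uses the standard closed form mean + sqrt(ρ·var) for the χ²-ball inner maximum (exact whenever the maximizing weights stay nonnegative); it is an illustrative assumption for this sketch, not the paper's algorithm, and all names here are hypothetical.

```python
import numpy as np


def relu(z):
    # Activation sigma; ReLU is used here purely for illustration.
    return np.maximum(z, 0.0)


def squared_losses(w, X, y):
    """Per-sample L_2^2 losses (sigma(w . x) - y)^2 for a single neuron."""
    return (relu(X @ w) - y) ** 2


def worst_case_chi2_loss(losses, rho):
    """Worst-case expected loss over reweightings q of the empirical
    distribution with chi^2(q || uniform) <= rho.

    Uses the closed form mean + sqrt(rho * var), which solves the inner
    maximization exactly as long as the optimal weights are nonnegative.
    This is a standard distributionally-robust identity, not the
    primal-dual algorithm from the paper.
    """
    mean = losses.mean()
    var = losses.var()  # population variance (1/n sum of squared deviations)
    return mean + np.sqrt(rho * var)


# Toy usage: compare nominal vs. chi^2-worst-case risk of a candidate w.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w = np.array([1.0, -0.5, 0.25])
y = relu(X @ w) + 0.1 * rng.standard_normal(100)  # noisy labels

losses = squared_losses(w, X, y)
print("nominal risk:   ", losses.mean())
print("chi^2-worst risk:", worst_case_chi2_loss(losses, rho=0.5))
```

With ρ = 0 the worst-case risk reduces to the nominal empirical risk, and it grows with ρ as the adversary is allowed to shift more probability mass onto high-loss samples.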