Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The paper proposes a new method for provably fitting deep generative models to observations, a highly non-convex optimization problem. Instead of directly searching for the latent code that explains the measurements, as proposed by Bora et al., this paper starts with a different deep generative model that has random weights, for which Hand et al. showed that gradient descent provably succeeds. The weights of the generator are then incrementally modified toward the true generator, each time using the previous optimum as the starting point; this sequence of models can be taken as snapshots of the generator during training. The main result is a theory showing that warm-started non-convex optimization in expansive Gaussian networks yields successful recovery. The proposed method, Surfing, can be viewed as a variant of multiple random restarts: instead of restarting at random, the network is slightly modified and the previous optimum is reused as the starting point. Reviewers 1 and 2 raised concerns about the running time of the proposed method compared to random restarts, but the rebuttal addressed these sufficiently. This paper makes important contributions to non-convex optimization for deep learning.
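To make the review concrete, the warm-starting scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the network sizes, the tanh generator, the linear interpolation standing in for training snapshots, and all step-size and iteration choices are assumptions made for the sketch. At each stage the generator weights move toward the final network and gradient descent on the latent code resumes from the previous stage's minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
k, h, n = 2, 8, 16  # latent, hidden, and measurement dimensions (expansive)

# Random initial weights and "trained" final weights (both synthetic here;
# in the paper the intermediate networks are snapshots taken during training).
W1_init = rng.normal(0, 1 / np.sqrt(k), (h, k))
W2_init = rng.normal(0, 1 / np.sqrt(h), (n, h))
W1_final = rng.normal(0, 1 / np.sqrt(k), (h, k))
W2_final = rng.normal(0, 1 / np.sqrt(h), (n, h))

def generator(z, W1, W2):
    # A one-hidden-layer generator; tanh is used here for smoothness.
    return W2 @ np.tanh(W1 @ z)

def grad(z, W1, W2, y):
    # Gradient of 0.5 * ||G(z) - y||^2 with respect to the latent code z.
    hidden = np.tanh(W1 @ z)
    r = W2 @ hidden - y
    return W1.T @ ((W2.T @ r) * (1.0 - hidden ** 2))

z_true = rng.normal(size=k)
y = generator(z_true, W1_final, W2_final)  # observations from final generator

# Surfing: interpolate the weights toward the final network in stages, and at
# each stage run gradient descent warm-started at the previous stage's optimum.
z = np.zeros(k)  # start at the origin for the random-weight network
for t in np.linspace(0.0, 1.0, 10):
    W1 = (1 - t) * W1_init + t * W1_final
    W2 = (1 - t) * W2_init + t * W2_final
    for _ in range(800):
        z = z - 0.005 * grad(z, W1, W2, y)

final_loss = 0.5 * np.sum((generator(z, W1_final, W2_final) - y) ** 2)
```

Running gradient descent only on the final network from a cold start would face the non-convex landscape directly; the staged warm starts are what keep each subproblem within reach of the previous optimum.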