Thank you for your submission. There was extensive internal discussion about the paper. R3 championed the paper and appreciated that the method has a theoretical footing. R1 and R2 raised critical issues with the empirical evaluation. R1 correctly highlighted that the experiments do not include important baselines. Additionally, the evaluation was done on a nonstandard learning rate schedule, and the results on the standard learning rate schedule are not fully convincing (the author response did not resolve this issue). R1 and R2 were also not convinced by the hyperparameter selection. However, R2, R3, and R4 found the theoretical results important, and R2 raised their score based on the value proposition of these results. Based on the value of the theoretical results, I am happy to accept the work. However, acceptance is conditional on addressing the reviewers' comments; please pay special attention to R2's comments: (1) compare to magnitude pruning v2, and (2) change the wording around "competitive results" to something along the lines of "the technique shows promise, producing accuracy and sparsity trade-offs that are within range of contemporary techniques". Please also include a detailed discussion of how easy it is to tune the hyperparameters.