NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 6671
Title: Fast and Accurate Stochastic Gradient Estimation

This paper received extensive discussion among the reviewers, the meta-reviewer, and the SPC. Here is a meta-review summary. The paper considers the problem of adaptively sampling training examples in stochastic optimization, and it shows that this can be done without a per-iteration cost of O(N). This is of interest in itself, since such sampling is typically thought to require maintaining a distribution over the training examples, which costs O(N) per iteration, i.e., as much as full-batch gradient descent. A second aspect of the paper is the mechanism by which the authors accomplish this: locality-sensitive hashing (LSH), a sketching technique usually reserved for nearest-neighbor search. Showing that LSH can be used in this way is also interesting in itself, since it opens up uses of the method beyond the usual nearest-neighbor applications.

On the whole, the paper could be improved (several reviewer comments address this), and there is certainly obvious follow-up work (several reviewer comments address this as well), but the paper as it stands contains a novel combination of ideas and methods that will be of interest to the NeurIPS community. In particular, the proposed method is applied to SGD, but the basic idea is orthogonal to the choice of optimization algorithm and could be applied to momentum, Adam, or any other method. Also, while the approach is derived only for linear regression and logistic regression, as the authors point out, it can be extended to models that can be locally linearized, with perhaps further extensions in future work. Illustrating the method even in these particular settings is valuable for the community.

After discussion, it was felt that this is the sort of paper that is not an incremental improvement on a paper from last year, but instead combines ideas from several areas in novel ways. While the ideas may take more time to develop fully, the whole point of publishing is to let the broader community build on early papers and extend the ideas further, and this effort should not be discouraged. Several of the less positive reviews were weighed by the meta-reviewer and the SPC in light of that.
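
To make the mechanism concrete, the sketch below illustrates the general idea under stated assumptions; it is not the authors' exact algorithm. It uses SimHash-style signed random projections to hash the training examples once up front, then draws each SGD sample from the bucket that the current parameter vector hashes to, so that examples more aligned with the query are drawn more often without an O(N) scan per step. The names `LSHSampler` and `sgd_lsh`, the hyperparameters, the squared-loss update, and the choice of the weight vector as the LSH query are all illustrative assumptions; an unbiased gradient estimator would additionally reweight each sample by its inverse sampling probability.

```python
import numpy as np

class LSHSampler:
    """Illustrative SimHash-based sampler (a sketch, not the paper's exact scheme)."""

    def __init__(self, X, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        self.X = X
        # One set of random hyperplanes per hash table.
        self.planes = [rng.standard_normal((n_bits, X.shape[1]))
                       for _ in range(n_tables)]
        self.tables = []
        for P in self.planes:
            # Sign pattern of the projections -> integer bucket key per example.
            keys = (X @ P.T > 0) @ (1 << np.arange(n_bits))
            table = {}
            for i, k in enumerate(keys):
                table.setdefault(int(k), []).append(i)
            self.tables.append(table)

    def sample(self, q, rng):
        # Probe one random table with query q; examples that collide with q
        # (roughly, those with high cosine similarity) are returned more often.
        t = int(rng.integers(len(self.planes)))
        P = self.planes[t]
        key = int((q @ P.T > 0) @ (1 << np.arange(P.shape[0])))
        bucket = self.tables[t].get(key)
        if not bucket:  # empty bucket: fall back to uniform sampling
            return int(rng.integers(self.X.shape[0]))
        return bucket[int(rng.integers(len(bucket)))]


def sgd_lsh(X, y, lr=1e-2, steps=1000, seed=0):
    """SGD for least-squares regression with LSH-guided example selection."""
    rng = np.random.default_rng(seed)
    sampler = LSHSampler(X, seed=seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        i = sampler.sample(w, rng)     # adaptive pick, no O(N) scan
        g = (X[i] @ w - y[i]) * X[i]   # gradient of 0.5 * (x_i . w - y_i)^2
        w -= lr * g                    # (unbiasedness would require an
                                       #  importance-weight correction)
    return w
```

The key LSH property being exploited is that collision probability is monotone in similarity, so sampling from a hash bucket yields an adaptive, non-uniform distribution over examples with constant-time draws after a one-time O(N) preprocessing pass, rather than an O(N) distribution update at every iteration.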