This paper proposes a gradient compression technique to speed up all-reduce-style gradient accumulation for optimization with large minibatches. The reviewers were positive on average (5, 6, 6, 7) but pointed out several concerns. The author response and reviewer discussion did not significantly change the content or scores of the reviews. To me, the author feedback appears to address the most serious concerns, and in addition I think the experimental validation of the proposed method looks strong. I therefore recommend accepting this paper.