Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The authors propose an algorithm for distributed SGD that combines gradient sparsification, gradient quantization, and local computation with periodic gradient accumulation. Novelty may be limited, since the method is a combination of existing techniques, but the authors claimed that the theoretical analysis is challenging, and experimental results are reported on a practical distributed SGD problem. The paper is generally well written and easy to follow. However, the main concern is the experimental section, which is difficult to follow and omits important details. A better description and discussion of the experimental results should be added.