This paper proposes a simple heuristic for reducing the computation and memory required to train neural networks: apply TopK sparsification to both the weights and the activations. Applying TopK to the weights is not novel by itself, but combining it with TopK on the activations is new. The paper serves both as a good starting point for reviewing the area and as a novel, if straightforward, combination of prior work. Simple techniques that work are valuable because of their simplicity, not in spite of it. Achieving acceleration on existing hardware with this technique is difficult, but the authors provide substantial simulator and cycle-level analysis that could guide future hardware development. The experiments supporting the method's efficiency are fairly extensive. There are some questions about the accuracy of the FLOP counts reported for some prior techniques (see Reviewer #2 re: RigL and SNFS). I hope the authors are able to correct this and the other issues raised in the reviews in the final version.
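For readers unfamiliar with the idea, the heuristic summarized above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the names `topk_mask` and `sparse_linear` are hypothetical, and per-tensor magnitude-based selection applied in a single forward matrix product is an assumption made only for illustration. Where in training the paper applies each TopK is not specified in this review.

```python
import numpy as np

def topk_mask(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest.

    Assumption: selection is by absolute value over the whole tensor.
    """
    flat = np.abs(x).ravel()
    if k >= flat.size:
        return x.copy()
    if k <= 0:
        return np.zeros_like(x)
    # argpartition places the k largest magnitudes in the last k positions.
    idx = np.argpartition(flat, flat.size - k)[flat.size - k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    return np.where(mask.reshape(x.shape), x, 0.0)

def sparse_linear(a, w, k_w, k_a):
    """Hypothetical layer: sparsify activations a and weights w, then multiply."""
    w_sparse = topk_mask(w, k_w)
    a_sparse = topk_mask(a, k_a)
    return a_sparse @ w_sparse
```

Note that the sketch still performs a dense matrix product over mostly zero operands, which is consistent with the point above that realizing a speedup on existing hardware is difficult.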