The authors propose an interesting technique for quantization that rescales gradients during training. The idea can be built upon even if the results are not state of the art. The reviewers are concerned that the paper's comparisons with other quantization-aware training (QAT) approaches are not made appropriately. The authors are encouraged to address this to make their paper as strong as possible.