Fast training and model compression are important issues when applying machine learning techniques in practice. The 4-bit training method proposed in this paper is novel. The empirical experiments are comprehensive and the results are promising. A minor issue is that it is not very clear how hardware could support this method. Please add some discussion of this in the final version.