This paper proposes a new model that extends the Transformer. The key idea is to shrink the network to improve computational efficiency.

Strengths
• The proposed method is novel and technically sound.
• Experiments have been conducted, and the results are convincing.

Weaknesses
• The model is somewhat complex.