Two reviewers recommended accept and two recommended weak reject. The reviewers praised the theoretical motivation and analysis, as well as the simplicity of the proposed solution, which makes it easy to apply and widely applicable. Concerns raised include missing experimental details and the motivation for the method. While I agree with R1 that the rebuttal was not convincing, I believe that such a method is still useful (e.g., when training a single model and deploying it on different hardware) and that the experimental results are promising. Therefore, the paper is accepted.