The paper first shows that the standard softmax yields a biased gradient estimate under the long-tailed setup, and proposes a balanced softmax to accommodate the label-distribution shift between training and testing. Theoretically, the authors derive a generalization bound for multiclass softmax regression. They then introduce a balanced meta-softmax procedure, using a complementary meta sampler to estimate the optimal per-class sample rates and further improve long-tailed learning. Experiments demonstrate that the approach outperforms state-of-the-art (SOTA) long-tailed classification methods on both visual recognition and instance segmentation tasks.

The paper was reviewed by four reviewers, who identified both strengths and weaknesses. The strengths were that the idea is intuitive and simple to implement, the theoretical derivations in support of the method, and the good empirical results. The weaknesses include that the method combines several ideas, making it difficult to see which contributes most to its success; concerns that the experimental setup (in particular, whether sufficiently strong baselines were used) may undermine the significance of the SOTA claims; the similarity of the proposed softmax to others in the literature; the similarity of the meta sampler to others in the literature; the lack of discussion of these issues; and many other questions of technical detail.

The author rebuttal addressed some of these concerns but not all; e.g., even the most positive reviewer acknowledged open questions about the fairness of the experimental setup. In the end, the paper had three positive reviews of relatively low confidence and one very confident negative review, which raised most of the weaknesses above.
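For context, the balanced softmax described above can be sketched as follows. This is a minimal illustration under the common formulation of shifting each logit by the log of its class's training-set count before normalizing; the function and argument names are hypothetical, not the authors' exact implementation:

```python
import numpy as np

def balanced_softmax_log_probs(logits, class_counts):
    """Sketch of a balanced softmax (hypothetical helper).

    Shifts each logit by the log of its class's training-set count,
    compensating for the label-distribution shift between a long-tailed
    training set and a balanced test set. Returns log-probabilities.
    """
    shifted = logits + np.log(class_counts)          # add per-class log-prior
    shifted = shifted - shifted.max(axis=-1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
```

With equal logits, the head class (largest count) receives the highest probability, so training against this loss pushes the model to counteract the imbalance.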