The submission introduces a new loss function for hierarchical multi-label classification. The justification of the loss function is purely empirical, given in the form of results obtained on an illustrative synthetic example. Learning under this loss can be performed efficiently on GPUs. The introduced algorithm achieves state-of-the-art results. The reviewers agreed that the paper is clearly written, that the loss function is well-motivated and interesting, and that the results are worth publishing. Nevertheless, the paper would certainly benefit from a theoretical analysis of the loss function. It would be interesting to learn, for example, what the Bayes optimal decision is for this loss function, and to see an analysis of how well it behaves in optimization (convexity, smoothness). From this point of view the paper is unfortunately very shallow, but the general idea is thought-provoking.