Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper proposes to use multiple meta classes, each of which is defined as a partition of the original classes, to create a classifier that is more robust to label noise. The idea is that, while some of the original labels may be incorrect, they are more likely to be correct under the coarser meta classes. Furthermore, given enough overlapping meta classes, it is possible to infer the original class from meta class membership. In practice, the authors use meta classes that are binary partitions of the original class space, constructed by clustering on subsets of features from a classifier trained on the original noisy labels. This is a nice idea, and experiments show that it works well. Originality: the idea here is novel (as far as I know) and innovative, going beyond an incremental contribution. Related work is adequately discussed. Clarity: overall, the paper is quite clear and provides a sufficient level of detail. Some minor language errors remain, but the paper is very readable for the most part. Significance: interesting idea, advances the state of the art, likely to be interesting for others working in the area.
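To make the decoding step concrete, here is a minimal sketch of how an original class could be inferred from binary meta class predictions. It is an illustration under stated assumptions, not the authors' implementation: the function name `decode_classes`, the code matrix `codes`, and the independence assumption across partitions are all hypothetical choices made for this example.

```python
import numpy as np

def decode_classes(codes, meta_probs):
    """Infer original-class scores from binary meta class predictions.

    codes:      (M, K) binary array; codes[m, k] is the meta class (0 or 1)
                that original class k falls into under partition m.
    meta_probs: (M,) array; meta_probs[m] is the predicted probability that
                the input belongs to meta class 1 of partition m.
    Assumption (not from the paper): the M partitions are treated as
    independent, so a class score is the product of per-partition terms.
    """
    codes = np.asarray(codes, dtype=float)
    p = np.asarray(meta_probs, dtype=float)[:, None]
    # Probability of the meta class that class k belongs to, per partition.
    per_partition = codes * p + (1.0 - codes) * (1.0 - p)
    # Sum logs instead of multiplying to stay numerically stable.
    log_scores = np.log(np.clip(per_partition, 1e-12, None)).sum(axis=0)
    scores = np.exp(log_scores - log_scores.max())
    return scores / scores.sum()

# Toy setup: K = 4 original classes, M = 3 binary partitions.
# Each column is the meta class "signature" of one original class.
codes = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1],
                  [0, 1, 1, 0]])
meta_probs = np.array([0.9, 0.2, 0.8])  # hypothetical classifier outputs
posterior = decode_classes(codes, meta_probs)
predicted_class = int(np.argmax(posterior))
```

Because every column of `codes` is distinct, the meta class memberships identify the original class uniquely; here the predictions are closest to the signature (1, 0, 1), so class 2 receives the highest score.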
The combinatorial (meta- or super-class) idea is interesting: it is reasonable, and one would expect it to work well. In terms of related work, I suggest adding two related papers. One is ECOC (Solving Multiclass Learning Problems via Error-Correcting Output Codes, JAIR 1995), which is a classic combinatorial method for classification. The other is PENCIL (Probabilistic End-to-end Noise Correction for Learning with Noisy Labels, CVPR 2019), which is a novel noise handling method. With regard to the method, the proposed probabilistic way to decipher the class from the meta classes is simple. Does it come with any guarantee (especially under different hyper-parameters such as k' and M)? It is possible that the mathematical underpinning of ECOC could be useful here. For clustering, the current description is pretty terse. I suggest the author(s) use supplementary material to provide more details (this submission did not include any, though). Finally, why are the WebVision experiments run only on a small-scale, downsampled version of the dataset? In fact, this is the most important experiment (because it is real-world data). I expect to see how the CombCls method works in reality. I also suggest ablation studies over the hyper-parameters, especially K' and M. -- The author response addressed most of my questions. The large-scale experiment is still missing, which is understandable because there may not be enough time to run a large-scale WebVision experiment during the short rebuttal period. However, I suggest that the author(s) test this method on real-world large-scale problems.
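As an illustration of the kind of guarantee the ECOC literature offers, the sketch below computes the minimum Hamming distance d between class signatures; with hard (0/1) meta class decisions and nearest-signature decoding, up to floor((d - 1) / 2) wrong meta class decisions can be corrected. This is the standard ECOC argument, not a result stated in the paper, and the function names and the toy code matrix are assumptions made for this example.

```python
import numpy as np
from itertools import combinations

def min_hamming_distance(codes):
    """Smallest Hamming distance between any two class signatures.

    codes: (M, K) binary array; column k is the signature of class k
    across the M binary partitions.
    """
    codes = np.asarray(codes)
    num_classes = codes.shape[1]
    return min(
        int(np.sum(codes[:, i] != codes[:, j]))
        for i, j in combinations(range(num_classes), 2)
    )

def correctable_errors(codes):
    """Number of wrong binary decisions that nearest-signature decoding
    is guaranteed to tolerate: floor((d - 1) / 2)."""
    return (min_hamming_distance(codes) - 1) // 2

# Toy code matrix: K = 4 classes, M = 3 partitions (hypothetical).
small = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1],
                  [0, 1, 1, 0]])
# Doubling the number of partitions (M = 6) doubles every pairwise distance.
large = np.vstack([small, small])

d_small, t_small = min_hamming_distance(small), correctable_errors(small)
d_large, t_large = min_hamming_distance(large), correctable_errors(large)
```

In this toy case the three-partition code has d = 2 and tolerates no errors, while the six-partition code has d = 4 and tolerates one, which suggests why an ablation over M would be informative.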
Originality: The ideas in the paper are quite original. Related work has been cited well. Quality: The work is technically sound. The conclusions are supported by the experiments. Clarity: Please give more details of the reinforcement learning task used for learning the optimal collection of meta class sets. Some intuition on what makes a collection optimal should be provided. Significance: The improvements are modest for the WebVision dataset. More empirical results would be required to assess the significance. --- Post rebuttal response: The authors responded to my concerns adequately. After looking at the comments of the other reviewers and the authors' response, I'd like to change my score to 7.