Review for NeurIPS paper: Part-dependent Label Noise: Towards Instance-dependent Label Noise

NeurIPS 2020

Review 1

Summary and Contributions: This paper proposes a new method that approximates instance-dependent label noise with parts-dependent label noise. The method assumes that the instance-dependent transition matrix of an instance can be constructed as a combination of the transition matrices of the instance's parts. This is motivated by the conjecture that instances can be approximately reconstructed by a combination of parts. The transition matrices for the parts are learned from anchor points. Finally, the method is evaluated on several small datasets (Fashion-MNIST, SVHN, and CIFAR-10) to show its superiority over the state of the art (SOTA).
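
To make the assumption summarized above concrete, the following is a minimal sketch and not the authors' implementation. It assumes the reconstruction weights of an instance are non-negative and sum to one, and that each parts-dependent transition matrix is row-stochastic; under those assumptions the combined matrix is row-stochastic as well. The function name and all toy values are hypothetical.

```python
def combine_transition_matrices(weights, part_matrices):
    """Return sum_j weights[j] * part_matrices[j] for square matrices given as nested lists."""
    if len(weights) != len(part_matrices):
        raise ValueError("need exactly one weight per parts-dependent matrix")
    n = len(part_matrices[0])
    combined = [[0.0] * n for _ in range(n)]
    for w, matrix in zip(weights, part_matrices):
        for a in range(n):
            for b in range(n):
                combined[a][b] += w * matrix[a][b]
    return combined


# Hypothetical toy values: two parts, two classes.
weights = [0.25, 0.75]              # assumed reconstruction weights h(x), summing to 1
part_matrices = [
    [[0.9, 0.1], [0.2, 0.8]],       # assumed transition matrix of part 1
    [[0.7, 0.3], [0.4, 0.6]],       # assumed transition matrix of part 2
]
T_x = combine_transition_matrices(weights, part_matrices)
```

With these toy values the first row of T_x is approximately [0.75, 0.25], and each row still sums to one because the weights form a convex combination.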

Strengths: + a new and interesting method for approximating the instance-dependent label noise. + rigorous analysis and evaluation on small datasets to demonstrate the superiority of the proposed method in comparison to SOTA.

Weaknesses: 1- The main assumption behind part-dependent label noise does not seem realistic to me. The method learns the parts using the anchor points, and these parts are then used to represent all samples belonging to the same class. However, some images in a class (e.g., hard samples) may not be represented properly by a combination of the parts learned from the anchor points. Anchor points are usually easy samples, since the model has high confidence in their labels, and they may not contain all the information present in other samples of the same class. I think the method has only been evaluated on small datasets, which may not reveal this problem. 2- The paper mentions that “the data matrix could consist of deep representations extracted by a deep neural network trained on the noisy training data.” When the deep model is trained with noisy labels, the resulting features (the data matrix) may not be accurate or reliable enough to define a loss function like Eq. (1) on them. Therefore W (the matrix of parts) and h(xi) may be poorly estimated. 3- As a minor point: “The parameters for reconstructing the instance-dependent transition matrix are identical to those for reconstructing an instance.” This conjecture is used for estimating h(xi), which is an important parameter of the model. Although the paper gives some references in support of this conjecture, could you please be more specific about the references and elaborate on this? 4- Finally, it would be very nice to visualize the parts for some classes of the datasets used in the experiments, like the one shown in Figure 1. This would better illustrate the ability of the proposed method to reconstruct images from the learned parts.

Correctness: Yes, to some extent.

Clarity: Yes, the paper is well-written. The concepts are explained precisely.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: Please refer to the weaknesses. -------------------------------------------------------------- after rebuttal: I read the authors' feedback and the other reviewers' comments. I am happy that the authors addressed the main concerns I raised in my review, for example with the early stopping technique on a validation set for alleviating the effect of memorization in deep models. I hope that the authors visualize the parts for some classes of the datasets and also improve the clarity of the conjecture used in their work in the camera-ready version. Overall, I think this is a good piece of work on approximating instance-dependent label noise. Therefore, I change my score from marginal accept to clear accept.

Review 2

Summary and Contributions: This paper investigates the problem of multi-class learning with instance-dependent label noise. The proposed method is motivated by the observation from human cognition that annotators are more likely to annotate instances based on their parts than on the whole instances. Each instance can be approximately reconstructed by a combination of parts, so the paper first uses non-negative matrix factorization to obtain the reconstruction weights. It then exploits the anchor points to learn the parts-dependent transition matrices. Finally, the instance-dependent transition matrix of each instance is reconstructed from the parts-dependent transition matrices and the reconstruction weights. Overall, I think this is a good paper, because the problem of multi-class learning with instance-dependent label noise has rarely been studied and the paper proposes an interesting and decent method to solve it. I feel that the most interesting point is that, although we cannot directly obtain the instance-dependent transition matrix for every instance, we can still exploit the anchor points to directly obtain the instance-dependent transition matrix for each anchor point. In this way, the parts-dependent transition matrices can be learned from the transition matrices of the anchor points and the reconstruction weights.
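
The learning step described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure from the paper. It assumes that the reconstruction weights have already been computed (for example by non-negative matrix factorization), that the noisy class posterior at an anchor point of a class gives that class's row of the anchor's transition matrix, and that the rows are fitted by plain unconstrained least squares. The function name `fit_parts_rows` and all numbers are hypothetical, and the data are synthetic.

```python
import numpy as np


def fit_parts_rows(anchor_weights, anchor_noisy_posteriors):
    """Least-squares estimate of one row of every parts-dependent transition matrix.

    anchor_weights: (m, r) reconstruction weights of m anchor points of one class.
    anchor_noisy_posteriors: (m, C) noisy class posteriors at those anchor points.
    Returns an (r, C) array whose j-th row is that class's row in the j-th parts matrix.
    """
    rows, *_ = np.linalg.lstsq(anchor_weights, anchor_noisy_posteriors, rcond=None)
    return rows


# Synthetic check with made-up values: r = 3 parts, C = 4 classes, m = 20 anchor points.
rng = np.random.default_rng(0)
true_rows = rng.dirichlet(np.ones(4), size=3)        # assumed ground-truth rows
anchor_weights = rng.dirichlet(np.ones(3), size=20)  # non-negative, each row sums to 1
anchor_noisy_posteriors = anchor_weights @ true_rows
estimated_rows = fit_parts_rows(anchor_weights, anchor_noisy_posteriors)

# Row of the instance-dependent transition matrix for a new instance.
h_new = np.array([0.2, 0.5, 0.3])                    # hypothetical reconstruction weights
row_for_instance = h_new @ estimated_rows
```

In this noise-free synthetic setting the least-squares fit recovers the generating rows. With estimated posteriors from a network trained on noisy labels the fit would only be approximate, and a constrained solver may be needed to keep the rows valid probability vectors.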

Strengths: 1. The problem of multi-class learning with instance-dependent label noise was rarely studied. 2. This paper proposes an interesting and decent method to solve this problem. 3. Experiments demonstrate the effectiveness of the proposed method.

Weaknesses: 1. I feel that the presentation of the subsection "Learning the parts-dependent transition matrices" could be further improved, because it seems that the key point is not so clear. 2. For the experimental results, it would be better to show whether the performance gap is significant or not.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: Please see the weaknesses. ======= I have read the comments and the rebuttal. The authors said that they conducted significance tests, that all results were statistically significant, and that they will provide the experimental results in the final version. So I keep my rating (accept) unchanged.

Review 3

Summary and Contributions: Learning with instance-dependent label noise is a vital part of label noise learning. However, it’s hard to model such realistic noise by exploiting only noisy data. This paper assumes that instance-dependent label noise depends only on parts of instances, motivated by the observation that annotators annotate instances based on their parts. The authors further propose to approximate the instance-dependent transition matrix with a combination of parts-dependent transition matrices. Extensive experimental results on synthetic and real-world datasets verify the effectiveness of the proposed method. Overall, this paper makes a significant contribution to learning with instance-dependent label noise.

Strengths: 1. The proposed method is very novel and well motivated, which makes a lot of sense in learning with instance-dependent label noise. 2. The explanation of the proposed approach is sufficient and clear. Besides, the organization and logic of this paper make it easy to understand this idea. 3. The description of experimental settings is detailed. The experimental results on benchmark datasets are very convincing. Besides, the authors provided a detailed analysis of experimental results.

Weaknesses: 1. Although the idea makes sense, the proposed method consists of multiple stages according to the algorithm flow. It would be better to achieve end-to-end training. 2. In addition to the experimental results, the paper needs a more detailed intuitive or theoretical explanation of why the other state-of-the-art methods don’t work for learning from instance-dependent label noise.

Correctness: Yes, the claims and method are correct. The empirical methodology is correct.

Clarity: Yes, the paper is well written.

Relation to Prior Work: Yes, it is clearly discussed how this work differs from previous contributions.

Reproducibility: Yes

Additional Feedback: 1. Non-negative matrix factorization (NMF) plays an important role in learning the instance-dependent transition matrix. But apart from this paper, NMF and label noise learning are usually not mentioned together. It would be better to provide more related work on NMF or other parts-based learning techniques in the main paper or the supplementary materials, which would make the proposed method easier for readers to understand. 2. The authors didn’t use any clean data in the experiments on the synthetic datasets and Clothing1M. I would like to know how this method can utilize a small trusted dataset. 3. Some descriptions of the experimental settings need to be added. For instance, are data augmentation techniques, e.g., random crop and horizontal flip, used in the experiments? Please give some explanation or emphasis. 4. In this paper, a slack variable is introduced to revise all the instance-dependent transition matrices. I would like to know whether it is possible to introduce different slack variables for different instance-dependent transition matrices. 5. All the datasets used in the experiments are image datasets, i.e., MNIST, Fashion-MNIST, SVHN, CIFAR-10 and Clothing1M. The authors employ NMF to learn parts of image objects and combination parameters. Note that NMF can also be applied to text data to learn the parts of sentences, i.e., some key words. However, the authors don’t provide experimental results on any text dataset. It would be more convincing if the authors verified the effectiveness of the proposed method through experiments on text datasets.

Review 4

Summary and Contributions: The paper concerns learning with instance-dependent label noise. The authors approximate the instance-dependent transition matrix of an instance by a combination of the transition matrices of the parts of the instance. The transition matrices for the parts are learned by exploiting data points that belong to a specific class almost surely.

Strengths: The authors proposed an assumption for instance-dependent label noise: The noise depends only on parts of instances. The authors also proposed how to learn parts-dependent transition matrices.

Weaknesses: The descriptions of "the state-of-the-art approaches" for learning from the instance-dependent label noise are not sufficient. The authors state: "Empirical evaluations on synthetic and real-world datasets demonstrate our method is superior to the state-of-the-art approaches for learning from the instance-dependent label noise." What is the rationale of achieving the outperformance theoretically and methodologically?

Correctness: Looks correct.

Clarity: The writing can be improved.

Relation to Prior Work: Can be improved.

Reproducibility: Yes

Additional Feedback: After reading the feedback from the authors, I found that the following question is not sufficiently addressed. "What is the rationale of achieving the outperformance theoretically and methodologically?" There is no theoretical justification for the proposed method in this manuscript. There are lots of existing nonnegative-matrix-factorization related methods and variants. The discussion of "the state-of-the-art approaches" is not sufficient.