Review for NeurIPS paper: One-bit Supervision for Image Classification

NeurIPS 2020

One-bit Supervision for Image Classification

Review 1

Summary and Contributions: This paper proposes an iterative weakly-supervised active learning method that learns from one-bit labels indicating whether the model's prediction is correct. In each iteration, it selects part of the unlabeled set for querying and updates the model with the queried annotations. Experiments show that the proposed model performs better than the semi-supervised baseline when both are given the same number of bits of labeling information.

Strengths: 1. The paper is well-written and easy to read. The literature review is comprehensive and the motivation and method are clearly stated. 2. The one-bit supervision setting is novel to my knowledge. 3. The ablation studies help readers understand how the method works. The observations about the optimal partition between the two stages and between full supervision and one-bit supervision are both interesting.

Weaknesses: 1. (Main concern) The paper compares the proposed one-bit supervision with full supervision given the same number of information bits. However, the number of information bits is not necessarily proportional to the labeling time. For example, in practice, labeling 3 images with one-bit labeling could take more time than labeling one image with 8 classes. The paper would be more convincing if the labeling time of the different types of supervision were reported, as in [1]. 2. The proposed model still needs some fully-labeled samples to train the initial model. It would be interesting to show how the model would perform with only one-bit labels.
Minor: L18: belong -> belongs L27, 28: gives -> give L341: in a time -> at a time
[1] Bearman, Amy, et al. "What’s the point: Semantic segmentation with point supervision." European Conference on Computer Vision. Springer, Cham, 2016.
-----------Update------------
The additional information the authors provided has addressed my main concern. I'm raising my score.
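The bit accounting behind the main concern can be made concrete with a small sketch. It assumes that a full label in a C-class problem is counted as log2(C) bits and each yes/no answer as one bit, which is consistent with the 3-images-versus-8-classes example above; the function names are hypothetical and are not taken from the paper.

```python
import math


def full_label_bits(num_classes: int) -> float:
    """Bits carried by one full label, assuming log2(C) bits for C classes."""
    return math.log2(num_classes)


def one_bit_queries_for_same_budget(num_classes: int, num_full_labels: int) -> int:
    """Number of yes/no queries that fit in the bit budget of the given full labels."""
    return math.floor(num_full_labels * full_label_bits(num_classes))


# One full label over 8 classes is worth log2(8) = 3 bits, i.e. 3 one-bit queries.
queries = one_bit_queries_for_same_budget(num_classes=8, num_full_labels=1)
```

Equal bit budgets say nothing about equal annotation time: whether three yes/no answers are faster than one 8-way choice has to be measured, which is what the request for labeling-time figures is about.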

Correctness: yes.

Clarity: yes.

Relation to Prior Work: yes.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This paper proposes a novel weakly-supervised learning methodology for image classification, named one-bit supervision, which only requires the annotator to verify whether an image belongs to a specified class. To obtain more correct guesses, it starts with a semi-supervised method and uses a multi-stage training paradigm to spend the supervision quota. In addition, a label suppression method is proposed to make use of incorrect guesses. Experiments on three popular benchmarks show that the proposed method outperforms semi-supervised learning under the same number of bits of supervision.
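As a rough illustration of one querying stage described in this summary, the sketch below spends a quota of yes/no questions on model guesses. It is a simplified reading of the summary, not the authors' implementation; `guess`, `true_label`, and `run_one_bit_stage` are hypothetical names, and the annotator is simulated by comparing each guess with the ground truth.

```python
def run_one_bit_stage(unlabeled, guess, true_label, quota):
    """Ask one yes/no question for each of the first `quota` samples.

    Returns (positives, negatives, remaining):
      positives: sample -> confirmed class (the annotator answered "yes")
      negatives: sample -> rejected class (the annotator answered "no")
      remaining: samples that were not queried in this stage
    """
    positives, negatives, remaining = {}, {}, []
    for index, sample in enumerate(unlabeled):
        if index >= quota:
            remaining.append(sample)
            continue
        predicted = guess(sample)
        # Simulated annotator: the answer is "yes" only if the guess is correct.
        if predicted == true_label(sample):
            positives[sample] = predicted
        else:
            negatives[sample] = predicted
    return positives, negatives, remaining


# Toy run: a model that always guesses class 0, with a quota of two queries.
truth = {"a": 0, "b": 1, "c": 0}
pos, neg, rest = run_one_bit_stage(["a", "b", "c"], lambda s: 0, truth.get, quota=2)
```

In a multi-stage setup, the model would be retrained between stages on the confirmed labels (and on the rejected guesses as negative information) before the next part of the quota is spent on `rest`.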

Strengths: 1. The idea of annotating an image by answering a yes-or-no question is very interesting. 2. This new learning methodology seems easy to apply in the real world. 3. Sufficient experiments on three image classification benchmarks verify the effectiveness of the proposed method. 4. The proposed multi-stage training framework and the negative label suppression method effectively improve performance, as verified in the ablation study. 5. The paper is well written and the technical details are clearly explained.

Weaknesses: 1. The idea of obtaining extra annotations by verifying model predictions is related to the technique described in “We don’t need no bounding-boxes: Training object class detectors using only human verification”, which the authors do not discuss. 2. How do you handle samples that are selected twice in the second training stage? Please explain in detail. 3. The results on the ImageNet dataset do not reach SOTA performance. 4. There are no experiments in the paper that accurately verify whether the proposed method reduces the annotation cost.

Correctness: Yes, the claims and methods are correct.

Clarity: Yes, the paper is well written.

Relation to Prior Work: Yes, compared to previous approaches that need to obtain the correct label for each image, the proposed system annotates an image by answering a yes-or-no question for the guess made by a trained model.

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions: In the context of semi-supervised learning and active learning, the paper proposes the one-bit supervision learning strategy for image classification. The idea is to take a portion (in the extreme case, all or none) of the labeled data in a semi-supervised learning setting and have the model guess a class for each sample, receiving a yes/no answer, instead of learning directly from the ground truth labels (i.e., full-bit learning). Two issues are studied: how to make a good guess and how to make use of the negative samples (those guessed wrongly). The proposed solutions are multi-stage training, which partitions the portion into multiple parts to iteratively improve guessing accuracy, and negative label suppression, which artificially forces the score of the negative samples to zero. Evaluations on three benchmarks are reported to demonstrate the effectiveness of this approach.
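The negative label suppression mentioned in this summary can be pictured with the following minimal sketch. It only illustrates the idea of forcing the score of a rejected class to zero; the renormalization step and the function name `suppress_negative_label` are assumptions made for the example and may differ from what the paper actually does.

```python
def suppress_negative_label(probs, rejected_class):
    """Zero out the probability of a class the annotator rejected.

    `probs` is a list of class probabilities for one sample. The remaining
    mass is renormalized to sum to 1 (an assumption of this sketch).
    """
    suppressed = [0.0 if k == rejected_class else p for k, p in enumerate(probs)]
    total = sum(suppressed)
    if total == 0.0:
        raise ValueError("all probability mass was on the rejected class")
    return [p / total for p in suppressed]


# The model guessed class 0, the annotator said "no", so class 0 is suppressed.
target = suppress_negative_label([0.5, 0.3, 0.2], rejected_class=0)
```

The resulting distribution can then serve as a soft target that never assigns mass to the rejected class, so a wrong guess still contributes information to training.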

Strengths: 1. The work is certainly relevant to the NeurIPS community. 2. The proposed method appears to be technically correct and advances the SOTA. 3. The evaluations appear to be extensive and convincing. 4. The paper is well-written and the presentation is clear and easy to follow, except for a few typos and grammatical errors. 5. The proposed method is simple and easy to implement. 6. Though not directly stated in the paper, the work raises a broader research question. See my comments below.

Weaknesses: 1. The novelty of the proposed method is arguably limited. See my comments below. 2. There are minor issues in the presentation.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: I consider this work a new method in the context of semi-supervised learning and active learning. Indeed, these are the two topics the authors review as related work. The method is essentially yet another way to rearrange labeled and unlabeled samples in order to identify “active” samples that improve learning accuracy. Thus, it is not an eye-opening, truly novel approach. I would argue that this method is incrementally novel at best. On the other hand, the method is simple and easy to implement, and it appears to work, which is an advantage. This work, however, brings up a broader, more interesting question that is worth investigating: given a semi-supervised setting of a dataset and a semi-supervised learning method, is there any way to further raise the learning efficacy by manipulating and rearranging the samples in the dataset (without changing the original setting) through a new mechanism (such as the one-bit supervision mechanism introduced in this paper)? The presentation in general is good and easy to follow. There are a few typos and grammatical errors. Table 4 appears to be incomplete according to the discussion in the text.
The above was my original review. After the authors uploaded their rebuttal, I read it carefully. The authors addressed all of the concerns stated in my original review, though I am still not convinced that this work is a truly innovative contribution. So I stay with my original review and overall rating. I also looked at the other reviews and I think all the reviews have more or less converged. By the way, in the authors' rebuttal, it appears that they did not cut and paste from my original review. They had a typo in copying my original statement: more interesting, not mode interesting.

Review 4

Summary and Contributions: After carefully reading the authors' rebuttal and the other reviews, my opinion of the paper has not changed. The authors addressed my concerns, but I am still not fully convinced that this is a truly innovative contribution. Nonetheless, I believe it is worthy of discussion by the community. Hence, my overall rating is unchanged.
----
The paper is about a novel way to exploit supervision in an active learning setting when training classifiers, i.e., with a given fully annotated training set and another set of images that can be queried for partial labels. Given that annotating datasets is costly, the idea is to reduce the burden on annotators by shifting from the question "which class does this image belong to?" to "does this image belong to this specific class x?". Asking whether an image belongs to a class is considered less costly for an annotator, but it reduces the number of bits of supervision available for training a classifier. The classifier is updated multiple times over the whole training process. The paper contains the formulation of this semi-supervised problem, considerations about when and how many times the classifier should be updated during the process, and how to exploit the one-bit supervision in the case of a negative guess. Experiments show the importance of using both the positive and negative guesses with the well-known Mean-Teacher method, which reaches state-of-the-art performance on three public datasets. Ablation studies clarify basic choices: how many stages to use, whether the images to be guessed should be random, easy, or hard, and the size of the initial fully annotated dataset. The discussion of past and future work provides an outline of possible directions following from this contribution.

Strengths: + The paper is easy to read and well presented. I appreciated the discussion of past and future work in relation to the contribution. It is definitely relevant to the NeurIPS community. + The contribution is novel among active learning approaches, while also related to semi-supervised learning. + The idea is simple and the execution straightforward, which is a plus.

Weaknesses: - The claim that one-bit supervision is simpler and cheaper to obtain than full supervision is not explored; it is taken as a hypothesis from the introduction onward. However, intuitively, obtaining one-bit supervision may be as expensive as full supervision. For instance, consider the expertise of an annotator: if he/she immediately recognizes the class of an image, it is straightforward to say yes or no, but it is also straightforward to say which class it is, so the two have the same cost. Beyond [29][40], which show that it is hard for a human to memorize all classes in the case of ImageNet, it would be important to show in which scenarios one-bit supervision is actually useful. A realistic study of annotating a dataset with this technique would be interesting. - The paper is posed more as a well-executed proof of concept with limited scope, with the rest left to the community. The experiments test the Mean-Teacher method only, which can be deemed enough to show the potential of the idea.

Correctness: The method appears to be generally and methodologically correct. - In Sec. 3.3, while intuitively the first-stage quota seems the most important (for (i)), the results in Table 3 show that the accuracy is comparable in all cases (for instance, on CIFAR100, 73.36% vs. 74.10% vs. 73.76% and 73.33%), as similarly stated for (ii), where 73.76% is comparable to 74.72% when changing the number of stages. Hence, I do not agree that the experiments with more stages show comparable results while those in Table 3 do not.

Clarity: The paper is easy to read and well presented.

Relation to Prior Work: The paper contains a thorough discussion of past and future work.

Reproducibility: Yes

Additional Feedback: - For the sake of precision, in Figure 1 the blue triangles are concentrated in a separate area with respect to the dark circles. This may give the impression that the training set is completely separate from the other sets, while in reality they have similar distributions. Hence, they should be shuffled together. - In Table 4, should the labels per class be 10%, 30%, 50%?