Review for NeurIPS paper: Unsupervised Semantic Aggregation and Deformable Template Matching for Semi-Supervised Learning

NeurIPS 2020

Review 1

Summary and Contributions: In this paper, the authors propose a semi-supervised learning framework based on unsupervised semantic aggregation and deformable template matching. Specifically, unsupervised semantic aggregation generates semantic labels for unlabeled data by minimizing a triplet mutual information loss, and then a few annotated samples and feature-level deformable template matching are used to assign proxy labels to unlabeled data for entropy minimization. I have read the rebuttal and will keep my score.
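The triplet mutual information idea summarized above can be illustrated with a minimal sketch. It assumes, as in IIC, that each view (the original image and its two transforms) yields a soft cluster assignment, that mutual information (MI) is estimated from the symmetrized joint distribution of two views' assignments over a batch, and that the triplet loss is the negative sum of MI over the three view pairs, so minimizing the loss maximizes MI. The exact pairing and weighting in the paper may differ; `triplet_mi_loss` and the other names are hypothetical.

```python
import math

def joint_distribution(p, q):
    """Symmetrized joint P(z, z') over C clusters from two batches of soft assignments.
    p, q: lists of N rows, each row a probability vector over C clusters."""
    n, c = len(p), len(p[0])
    joint = [[0.0] * c for _ in range(c)]
    for pi, qi in zip(p, q):
        for a in range(c):
            for b in range(c):
                joint[a][b] += pi[a] * qi[b] / n
    # Symmetrize (as in IIC) because the order of the two views is arbitrary
    return [[(joint[a][b] + joint[b][a]) / 2 for b in range(c)] for a in range(c)]

def mutual_information(p, q, eps=1e-12):
    """MI between the cluster assignments of two views, estimated over the batch."""
    joint = joint_distribution(p, q)
    c = len(joint)
    row = [sum(joint[a]) for a in range(c)]
    col = [sum(joint[a][b] for a in range(c)) for b in range(c)]
    mi = 0.0
    for a in range(c):
        for b in range(c):
            pab = joint[a][b]
            if pab > eps:  # skip empty cells; 0 * log 0 is taken as 0
                mi += pab * (math.log(pab) - math.log(row[a]) - math.log(col[b]))
    return mi

def triplet_mi_loss(p_x, p_t1, p_t2):
    """Hypothetical triplet loss: negative MI summed over the three view pairs."""
    return -(mutual_information(p_x, p_t1)
             + mutual_information(p_x, p_t2)
             + mutual_information(p_t1, p_t2))
```

Confident, balanced, and consistent assignments across views give MI = log C per pair, whereas uniform (uninformative) assignments give zero, which is why maximizing MI discourages both inconsistent and degenerate clusterings.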

Strengths: 1. The original image and its two transformed versions are combined in a triplet mutual information loss for feature extraction in unsupervised learning. The idea seems novel and technically sound. 2. The paper is clear and logical. The theoretical grounding is sound, and extensive experiments are conducted to validate the proposed method. 3. The work is relevant to the NeurIPS community.

Weaknesses: 1. The authors discuss previous related works. However, how this work differs from them is not clearly discussed. 2. In the experiments section, the authors list the experimental results and describe them. However, discussion and analysis of the results are lacking; e.g., in the comparison of mutual information losses, why is T-MI more stable than IIC? In the comparison of proxy label generators, what characteristics make DTM outperform FixMatch? In other words, the reasons behind the results and their rationale should be discussed and analyzed.

Correctness: Yes.

Clarity: Yes

Relation to Prior Work: More discussion is expected.

Reproducibility: Yes

Additional Feedback: 1. See the Weaknesses section. 2. Why should the selection of K follow the rule in Eq. (6)? How do different values of K affect the performance of the proposed method? 3. In Section 4 (testing details), "Fllowing" -> "Following".

Review 2

Summary and Contributions: This paper addresses the problem of classification with only a few annotated samples by injecting part of the supervised information into unsupervised learning. Specifically, the authors combine unsupervised learning and semi-supervised learning (SSL) to propose an Unsupervised Semantic Aggregation and Deformable Template Matching (USADTM) framework for SSL. In the proposed method, unsupervised semantic aggregation based on Triplet Mutual Information (T-MI) minimization is used to generate semantic labels for unlabeled data. The semantic labels are then aligned to the actual classes under the supervision of the labeled data. The proposed deformable template matching method generates pseudo labels more effectively than confidence-based methods. Comprehensive experiments are conducted to demonstrate the effectiveness of the proposed methods.
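To make the contrast with confidence-based pseudo-labelling concrete, here is a simplified sketch, not the authors' DTM. It assumes each class template is the L2-normalized mean feature of that class's labeled samples and gives each unlabeled sample the label of the most cosine-similar template; the "deformable" refinement of the templates is omitted. The FixMatch-style baseline keeps an argmax label only when the predicted probability reaches a threshold (0.95 here, an assumed value). All function names are hypothetical.

```python
import math

def normalize(v):
    """L2-normalize a vector; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def class_templates(features, labels, num_classes):
    """Template per class = L2-normalized mean of that class's normalized labeled features."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    for f, y in zip(features, labels):
        f = normalize(f)
        for d in range(dim):
            sums[y][d] += f[d]
    return [normalize(s) for s in sums]

def proxy_labels_by_template(unlabeled, templates):
    """Every unlabeled feature gets the label of its most cosine-similar template."""
    labels = []
    for f in unlabeled:
        f = normalize(f)
        sims = [sum(a * b for a, b in zip(f, t)) for t in templates]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels

def proxy_labels_by_confidence(probs, threshold=0.95):
    """FixMatch-style baseline: argmax label if max probability >= threshold, else None."""
    out = []
    for p in probs:
        best = max(range(len(p)), key=p.__getitem__)
        out.append(best if p[best] >= threshold else None)
    return out
```

The design difference is visible in the return values: template matching labels every unlabeled sample, whereas confidence thresholding discards low-confidence samples (returned as `None`), which may leave few pseudo labels early in training.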

Strengths: 1. Generating proxy labels via deformable template matching is novel compared with existing semi-supervised methods. 2. On the whole, the paper is well written, and the proposed semi-supervised framework is novel and easy to follow. 3. The proposed Triplet Mutual Information (T-MI) minimization function is used to evaluate unsupervised learning effectively, which could address the difficulty of evaluating unsupervised learning. Furthermore, the proposed T-MI loss function performs more robustly and stably than the original single-pair mutual information functions. 4. The experimental results are comprehensive and convincing, and they demonstrate the effectiveness of the proposed method.

Weaknesses: However, I still have some concerns about the paper, as follows: 1. Do the two CNNs in Fig. 2 share the same parameters? If so, it would be better to annotate this in the figure. 2. In Lines 169-176, the authors mention that there is a deviation when limited annotated data (e.g., four samples per class) are used to represent the class center. Thus, in each epoch the authors set up a memory bank of length K for each class, where the samples of each class with the top-K cosine similarity to the class center are recorded and added to the labeled set in the next epoch. Is the choice of Eq. (6) based on experiments or on theoretical analysis? Moreover, is the proposed method robust to the selected unlabeled samples? Will the proposed method select many wrong samples? 3. How are the memory banks updated? 4. What are the differences and relations between the proposed method and contrastive learning, e.g., Momentum Contrast for Unsupervised Visual Representation Learning?
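One possible reading of the memory-bank mechanism described in point 2 can be sketched as follows. Since the update rule is exactly what point 3 asks about, this is an assumption and not the authors' procedure: after each epoch, each class's bank is rebuilt from the K unlabeled samples currently assigned to that class that have the highest cosine similarity to its center, and the next epoch's center is the mean of the labeled features plus the banked ones. `update_memory_bank` and `refined_centers` are hypothetical names.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (zero norms treated as 1)."""
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def update_memory_bank(unlabeled, proxy_labels, centers, k):
    """Rebuild each class's bank from its K assigned samples most similar to the class center."""
    candidates = {c: [] for c in range(len(centers))}
    for feat, label in zip(unlabeled, proxy_labels):
        candidates[label].append((cosine(feat, centers[label]), feat))
    # nlargest with a key compares only similarity scores, never the feature lists
    return {c: [feat for _, feat in heapq.nlargest(k, items, key=lambda t: t[0])]
            for c, items in candidates.items()}

def refined_centers(labeled, labels, bank, num_classes):
    """Center = mean of the class's labeled features plus its banked features.
    Assumes every class has at least one labeled sample, as in the SSL setting."""
    centers = []
    for c in range(num_classes):
        members = [f for f, y in zip(labeled, labels) if y == c] + bank.get(c, [])
        dim = len(members[0])
        centers.append([sum(f[d] for f in members) / len(members) for d in range(dim)])
    return centers
```

Under this reading, a wrongly banked sample (the concern raised in point 2) shifts the center by roughly 1/(n_labeled + K) of its offset, so robustness depends on how K compares to the number of labeled samples per class.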

Correctness: Yes, the claims and method are correct. Yes, the empirical methodology is correct.

Clarity: Yes, this paper is well written and easy to follow.

Relation to Prior Work: Yes, the difference between this work and the previous works is well discussed in the paper.

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions: This paper proposes an Unsupervised Semantic Aggregation and Deformable Template Matching (USADTM) framework for semi-supervised learning (SSL), which combines unsupervised learning and semi-supervised learning to improve performance while reducing the cost of data annotation. USA produces semantic labels for unlabeled data by optimizing a Triplet Mutual Information loss, and DTM generates pseudo labels for unlabeled samples by leveraging a few labeled samples. Experimental comparisons with eight state-of-the-art methods on four datasets demonstrate the validity of the proposed approach. Overall, I think the paper is slightly above the borderline.

Strengths: 1. This paper is novel, original, and well structured. It clearly references existing work on supervised and semi-supervised learning and gives an excellent overview of the area. 2. The proposed triplet mutual information loss achieves semantic label clustering for unlabeled data and performs better in unsupervised semantic aggregation than the single-pair MI loss, according to the comparison experiment in Section 5.2. 3. The proposed deformable template matching method for generating proxy labels offers a new perspective compared with current works. According to Section 5.3, it achieves comparable or even better results than other methods. 4. The detailed design and analysis of the ablation experiments reveal the effectiveness of each component proposed in the paper. In particular, the improvement brought by the fake labels is quite apparent.

Weaknesses: 1. It is not fully clear how the top-K most similar samples for each class are selected to help generate more reasonable class centers. Elaborating on this mechanism would help the reader. 2. The authors only give the choice of K for 10 and 100 classes. How should K be chosen for other numbers of classes? 3. In Fig. 2, the , and fc need more explanation. 4. On page 6, line 242, MNIST needs a citation.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback:

Review 4

Summary and Contributions: The authors combine unsupervised learning and semi-supervised learning to propose an unsupervised semantic aggregation and deformable template matching framework for semi-supervised learning, which strives to improve the model's performance while reducing the cost of data annotation.

Strengths: (1) The authors claim to exploit a triplet mutual information loss to achieve semantic label clustering for unlabeled data in semi-supervised learning. (2) The authors claim to propose a deformable template matching method for generating pseudo labels.

Weaknesses: 1. Extending the Triplet Mutual Information of [1] and its code [2] seems trivial, so the contribution of the proposed method is not clear. Please explain the difference between your work and [1] with respect to Triplet Mutual Information. [1] Wu, Jianlong, et al. "Deep comprehensive correlation mining for image clustering." Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019. [2] https://github.com/Cory-M/DCCM 2. Are these parameters the same for the different tasks in Table 3? Is it possible to do a 2D search instead of fixing one parameter while searching for the other? 3. For the comparison, how were the parameters of the other methods tuned? 4. Deformable template matching is an existing technique. Please explain the difference between your work and [3] and [4], respectively. [3] Lee, Hyungtae, et al. "DTM: Deformable template matching." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016. [4] Xu, Yuhao, et al. "Partial descriptor update and isolated point avoidance based template update for high frame rate and ultra-low delay deformation matching." 2018 24th International Conference on Pattern Recognition (ICPR), IEEE, 2018.
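The difference between the one-parameter-at-a-time search and the 2D search suggested in point 2 can be sketched as follows. `evaluate` stands for a hypothetical function returning, e.g., validation accuracy for a pair of hyperparameter values; the grids are illustrative and do not correspond to the actual parameters in Table 3.

```python
from itertools import product

def coordinate_search(evaluate, grid_a, grid_b, fixed_b):
    """Search a with b fixed, then search b with the best a fixed."""
    best_a = max(grid_a, key=lambda a: evaluate(a, fixed_b))
    best_b = max(grid_b, key=lambda b: evaluate(best_a, b))
    return best_a, best_b

def grid_search_2d(evaluate, grid_a, grid_b):
    """Evaluate every (a, b) pair jointly and return the best-scoring pair."""
    return max(product(grid_a, grid_b), key=lambda ab: evaluate(*ab))
```

When the two parameters interact, the coordinate-wise search can settle on a pair that the joint search would reject; the price of the joint search is len(grid_a) * len(grid_b) evaluations instead of len(grid_a) + len(grid_b).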

Correctness: Just.

Clarity: Just

Relation to Prior Work: Yes.

Reproducibility: No

Additional Feedback: