NeurIPS 2020

Universal Domain Adaptation through Self Supervision


Review 1

Summary and Contributions: This paper proposes a model named DANCE (Domain Adaptive Neighborhood Clustering via Entropy optimization) to deal with the problem of universal domain adaptation. The contributions can be listed as follows: (i) it proposes a domain adaptation framework that can be used for the universal domain adaptation problem; (ii) it designs two novel loss functions, neighborhood clustering and entropy separation, for such shift-agnostic adaptation problems; (iii) it learns discriminative features of 'unknown' target samples without supervision.

Strengths: This paper designs neighborhood clustering and entropy separation losses to better deal with the universal domain adaptation problem, which is a novel attempt. Experimental results show that DANCE indeed outperforms other methods, especially methods focusing on CDA, ODA, PDA, and OPDA.

Weaknesses: The proposed methods are mainly based on empirical observation. However, DA is a machine learning problem rather than a simple computer vision problem, so some theoretical proof or explanation would be welcome. I think it would be better to submit this paper to a conference focusing on applications rather than NeurIPS.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback:


Review 2

Summary and Contributions: This work proposes a method called Domain Adaptive Neighborhood Clustering via Entropy optimization (DANCE) for universal domain adaptation, which can handle arbitrary category shift. DANCE introduces a novel neighborhood clustering technique that moves each target sample either toward a known class prototype in the source domain or toward its neighbor in the target domain. In addition, an entropy-based feature alignment and rejection mechanism is proposed to align target features with the source or reject them as unknown categories based on their entropy.
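As I understand it, the neighborhood clustering objective amounts to entropy minimization over a similarity distribution. Below is a minimal PyTorch-style sketch under that reading, assuming L2-normalized features, a temperature, and a memory bank holding both source prototypes and target features; all names and defaults are illustrative, not taken from the authors' code:

```python
import torch
import torch.nn.functional as F

def neighborhood_clustering_loss(feat, memory_bank, temperature=0.05):
    """Entropy of the similarity distribution between a batch of target
    features and a memory bank of source prototypes + target features.
    Minimizing it sharpens the distribution, pulling each target sample
    toward either a known-class prototype or a target neighbor."""
    feat = F.normalize(feat, dim=1)
    bank = F.normalize(memory_bank, dim=1)
    sim = feat @ bank.t() / temperature   # (batch_size, bank_size)
    p = F.softmax(sim, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()
```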

Strengths: This work proposes a simple and effective idea for universal domain adaptation, which is practical. The idea of clustering each target sample either to a known class prototype in the source domain or to its neighbor in the target domain is reasonable. The experimental comparisons are extensive, including experiments on multiple datasets, comparison with different types of methods, an ablation study, etc. The average performance on closed-set DA, partial DA, and open-set DA is better than existing methods by a large margin. The paper is written clearly.

Weaknesses: I have only one concern: for a target sample belonging to a known category, a prototype in the source and a neighbor in the target may both be close to the sample, or the neighbor in the target may even be closer than the prototype. How does this affect the performance?

Correctness: Yes, all claims and methods are clear and correct.

Clarity: This work is written well and clearly.

Relation to Prior Work: Yes, clearly.

Reproducibility: Yes

Additional Feedback:
============== Post-rebuttal comments ==============
The authors clearly answered my main concern but did not explain whether it affects the performance. Considering that the idea of this work is interesting and effective, I'd like to keep my score.


Review 3

Summary and Contributions: In this paper, the authors propose a method for what they call "universal" domain adaptation (closed, open, partial, and a mix of open and partial DA). The method, called DANCE, includes two entropy-based constraints on top of domain-specific BN: (i) neighbourhood clustering and (ii) entropy separation. The authors show extensive experimental results on multiple DA settings and datasets.

Strengths:
+ The idea of having a single model to deal with different DA scenarios (open, closed, partial, and open/partial) without any prior knowledge about the category distribution is interesting, challenging, and not much explored yet.
+ The paper has an extensive empirical evaluation, with results in different DA settings and on different domain-shift scenarios.
+ The proposed model achieves good results in many of the tasks evaluated.

Weaknesses:
- The paper has basically two main novel contributions: NC and ES. I found that both losses are designed a bit ad hoc, and more intuition on why they are designed as they are and why they work is necessary. For instance, it is not entirely clear to me why Eq. 6 would converge to clusters where each target point is assigned to a known class prototype OR to its neighbour in the target. The choice of the ES loss also seems ad hoc, with the \rho and m hyperparameters and not much analysis of why that loss was chosen and why it converges to a reasonable solution (a sketch of the loss as I understand it follows this list).
- Although I appreciate the authors' effort to validate the proposed approach, I find the experimental section difficult to follow. It is far too dense and contains far too many datasets/DA scenarios. It becomes very difficult to really see why/how the method is better than others, especially because in most cases the comparison between DANCE and other approaches is not really apples-to-apples. The experimental section would perhaps be more readable if the authors considered a smaller subset of experiments and focused more on understanding why each component of the proposed approach is useful (there is only one small table providing this information) rather than blindly comparing with other methods.
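As I read it, the ES loss pushes each sample's prediction entropy away from a boundary value \rho, gated by the margin m. A minimal sketch under that reading (variable names are mine, not the authors'):

```python
import torch
import torch.nn.functional as F

def entropy_separation_loss(logits, rho, m):
    """Push each sample's prediction entropy away from the boundary rho,
    but only for samples already farther than the margin m from rho.
    Minimizing -|H(p) - rho| drives entropy toward 0 (known class)
    or toward log K (unknown class)."""
    p = F.softmax(logits, dim=1)
    ent = -(p * torch.log(p + 1e-8)).sum(dim=1)  # per-sample entropy H(p)
    dist = torch.abs(ent - rho)
    mask = (dist > m).float()  # low-confidence samples contribute nothing
    return -(dist * mask).mean()
```

Even granting this reading, an analysis of why such a separation converges to a reasonable known/unknown split would strengthen the paper.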

Correctness: The paper seems to be in general correct, although I believe more justification (and empirical evidence) showing why/how the method works would be very helpful (see Weaknesses).

Clarity: The writing of the paper could be improved for clarity, especially Section 3 and the figures. The experimental section is also difficult to follow given the large amount of information in the relatively small number of pages.

Relation to Prior Work: Yes.

Reproducibility: No

Additional Feedback:
============== Post-rebuttal comments ==============
The authors answered my concerns, and I am raising my score to 6. The proposed method seems to be useful in different settings, but I still think the experimental section could be clearer. I also recommend that the authors include the comments/explanations provided in the rebuttal in the revised version of the manuscript.


Review 4

Summary and Contributions: This paper presents a concept of Domain Adaptive Neighborhood Clustering via Entropy optimization (DANCE) to achieve universal domain adaptation. Specifically, this paper proposes two novel self-supervision losses: neighborhood clustering and entropy separation. DANCE performs well under all settings, and the whole framework is simple and clean. In addition, DANCE can extract discriminative feature representations for “unknown” class examples without any supervision on the target domain.

Strengths:
1. This paper focuses on a more general and uniform adaptation setting and proposes a uniform solution with two specifically designed losses.
2. Clear paper writing and clear algorithm illustration.
3. Sufficient experimental results support the authors' claims.
4. Satisfying performance on multiple domain adaptation tasks.

Weaknesses:
1. Although the authors integrate many components, such as the memory bank updating and the confidence threshold, the main ideas of this paper, Neighborhood Clustering (NC) and entropy separation (ES), seem to be very straightforward. NC is not new [1].
[1] Li, S., Chen, D., Liu, B., et al. Memory-Based Neighbourhood Embedding for Visual Recognition. In Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 6102-6111.
2. Will the memory banks F and V bring in a lot of additional memory cost? If so, it would largely reduce the practicability of the proposed algorithm. This needs more explanation and discussion (a rough estimate is sketched after this list).
3. Missing reference IDs in Tables 2-6.
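To make point 2 concrete, here is a back-of-the-envelope estimate of one bank's footprint; the feature dimension and dataset size below are illustrative assumptions, not numbers from the paper:

```python
def memory_bank_mb(num_target_samples, feat_dim, bytes_per_float=4):
    """Rough footprint in MB of one feature memory bank stored as float32."""
    return num_target_samples * feat_dim * bytes_per_float / 2**20

# e.g. 50k target images with 2048-d ResNet features: ~390 MB per bank
print(memory_bank_mb(50_000, 2048))
```

Even so, whether this is prohibitive depends on the target dataset size, so an explicit discussion by the authors would still be valuable.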

Correctness: Yes.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: It would be helpful to add some discussion of the effects of employing Domain-Specific Batch Normalization (a minimal sketch of what I mean appears at the end of this review). Please also use a higher-resolution version of the left part of Figure 2.
---------------------------------------------------------
The authors' feedback addressed my concerns well. I will raise my score to 7.
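For reference, a minimal sketch of domain-specific batch normalization as it is commonly implemented (illustrative, not the authors' code): all weights are shared across domains except that each domain gets its own BatchNorm layer.

```python
import torch.nn as nn

class DomainSpecificBN2d(nn.Module):
    """One BatchNorm2d per domain: each domain is normalized with its own
    statistics and affine parameters while all other layers stay shared."""
    def __init__(self, num_features, num_domains=2):
        super().__init__()
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(num_features) for _ in range(num_domains)
        )

    def forward(self, x, domain_idx):
        return self.bns[domain_idx](x)
```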