Summary and Contributions: This paper presents an approach for training generalizable image classification and segmentation models for medical imaging. The authors introduce a method based on feature linear dependency modeling and distribution alignment.
Strengths:
- Impactful work
- New method
- Theoretical analysis
Weaknesses:
- Unrealistic clinical scenario
- Lack of the most obvious baseline
Correctness: The claims are correct
Clarity: The paper will benefit from proofreading by a language expert
Relation to Prior Work: The prior work needs to be discussed better. Methods such as mixup and virtual mixup are not discussed but should be.
Reproducibility: Yes
Additional Feedback:
- Typos need fixing, e.g., line 49 has a "to to" typo. I also recommend proofreading the paper; I found some sentences confusing to read.
- Line 140: SVD already includes the word "decomposition".
- Table 1 should report standard errors instead of standard deviations.
- The proposed method assumes that data from multiple domains are available. However, this situation is quite rare in practice. This is related to my last point.
- What concerns me the most is that the authors did not try simpler regularization techniques, such as mixup. Since the authors claim a linear dependency between the domains, it is essential to test mixup and compare it to their method (a minimal sketch of the baseline I have in mind follows this list).
- What happens when data from only 1 or 2 domains are used for training? This could highlight a more realistic scenario.
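To make the mixup comparison concrete, here is a minimal sketch of the kind of baseline I mean; the function name and the alpha value are my own illustrative choices, not taken from the paper.

import torch
import torch.nn.functional as F
from torch.distributions import Beta

def mixup_loss(model, x, y, alpha=0.2):
    # Input mixup: a convex combination of two samples and of their label losses.
    lam = Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mix = lam * x + (1.0 - lam) * x[perm]
    logits = model(x_mix)
    return lam * F.cross_entropy(logits, y) + (1.0 - lam) * F.cross_entropy(logits, y[perm])

Training with this loss on the pooled source domains would directly test whether the claimed linear dependency can already be exploited by a much simpler augmentation scheme.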
Summary and Contributions: This paper targets domain generalization for medical data. It proposes to marry the advantages of data augmentation and domain alignment to tackle the domain generalization problem for medical imaging classification. It assumes that there exists a linear dependency in a latent space among the various domains. Both class-imbalanced skin lesion classification and segmentation (pixel-wise classification) are evaluated.
Strengths: It is technically sound. The connection between data augmentation based domain generalization and domain alignment is investigated. The method part is clearly written and easy to understand. Good performance on the medical domain generalization task.
Weaknesses: The linear combination of multiple source domains has been investigated in [a,b], which should be discussed.
[a] Multi-source Domain Adaptation for Face Recognition
[b] Multi-Source Domain Adaptation: A Causal View
Using KL divergence or a GAN to align multiple source domains is a quite common operation. The method seems like a simple combination of [37] and KL-divergence based alignment. Can other divergence measures be applied in this framework? (A rough sketch of one alternative follows this section.)
The abstract claims that medical images are usually heterogeneous, but the tested datasets are homogeneous. The domain gaps are quite small, since all of the domains are images and can use the same backbone. Many domain generalization tasks in the general setting satisfy this condition. Therefore, I think this paper is limited on the application side.
It is not clear why the proposed method can outperform the recent domain generalization methods [7,21].
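For instance, a maximum mean discrepancy (MMD) term could in principle replace the KL-based alignment. This is my own rough sketch with a single, arbitrarily chosen RBF bandwidth, not something proposed in the paper.

import torch

def rbf_mmd2(z_a, z_b, sigma=1.0):
    # Squared MMD between two batches of latent features under one RBF kernel.
    def k(x, y):
        return torch.exp(-torch.cdist(x, y).pow(2) / (2.0 * sigma ** 2))
    return k(z_a, z_a).mean() + k(z_b, z_b).mean() - 2.0 * k(z_a, z_b).mean()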
Correctness: It seems technically sound. Actually, both data augmentation and domain alignment are well studied for domain generalization.
Clarity: I am satisfied with the clarity of the method parts.
Relation to Prior Work: The related work section simply lists previous works without further discussion of their differences from the submission. Moreover, the aforementioned [a] and [b] should be discussed.
Reproducibility: Yes
Additional Feedback:
Summary and Contributions: The authors propose a way to tackle the domain generalization problem that marries the advantages of data augmentation and domain alignment. In this paper, the authors train a deep neural network with a novel rank regularization term on the latent feature space, setting the rank of the latent features to the number of categories, together with a pre-defined prior distribution.
Strengths: In this paper, the authors point out that regular data augmentation can be conducted through linear transformations; therefore, there exists a linear dependency in the latent feature space. Furthermore, this paper regularizes the rank of the latent feature space. In order to extract shareable information among multiple domains, the authors propose to conduct variational encoding, adopting a KL divergence to match the latent features from the source domains to a Gaussian distribution.
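As far as I understand the variational encoding step, the alignment term is the standard closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior; the sketch below is my own paraphrase of that term, not the authors' code.

import torch

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), averaged over the batch.
    return (-0.5 * (1.0 + log_var - mu.pow(2) - log_var.exp()).sum(dim=1)).mean()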
Weaknesses: This paper misses details needed to reproduce the experiments, such as the layer settings and hyperparameters.
Correctness: Correct
Clarity: The paper is well structured with theoretical analysis. There are some typos that need to be fixed.
Relation to Prior Work: Yes. The authors discuss the improvements and motivation for each proposed component.
Reproducibility: No
Additional Feedback:
Summary and Contributions: This paper targets domain generalization for medical image classification. The topic is quite valuable. The authors try to solve domain generalization from two aspects: a) linear dependency regularization (low-rank constraint); b) domain alignment to a prior distribution.
Strengths:
1. The topic of this paper is very meaningful.
2. I think the idea of domain alignment to a prior distribution in the latent space is very interesting and also feasible.
3. The theoretical and experimental analyses validate the effectiveness of the proposed method.
Weaknesses:
1. I'm not convinced by the low-rank constraint, especially setting the rank to a fixed number (C), because I think it could potentially suppress the representation ability of the model. Also, how to select the rank is another problem.
2. More details about the experiments for the ablation study would help us understand the advantage of the proposed method, especially the one about the rank (for example, how you conduct the comparison experiment with a traditional low-rank constraint; see the sketch after this list for what I mean).
3. How do you select lambda_1 and lambda_2? Will you release the code later?
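To clarify point 2: by a "traditional low-rank constraint" I mean something like a soft nuclear-norm penalty on the batch feature matrix, rather than fixing the rank to C. The sketch below is my own illustration, not code from the paper.

import torch

def nuclear_norm_penalty(features):
    # Sum of singular values of the (batch_size x feature_dim) feature matrix, a standard soft low-rank surrogate.
    return torch.linalg.svdvals(features).sum()

Comparing this penalty (weighted into the training loss) against the proposed hard rank-C construction would show how much of the gain comes from fixing the rank rather than merely encouraging low rank.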
Correctness: Yes.
Clarity: Yes.
Relation to Prior Work: The paper clearly describes the differences between the proposed method and the related works.
Reproducibility: No
Additional Feedback:
1. More details would be better. For example, where exactly do you apply the variational encoding and the low-rank constraint in ResNet18?
2. More evidence showing the effect of the low-rank constraint could also better position this paper.
After reading the other reviews and the authors' feedback, I'd like to keep my rating. The authors have addressed the issues I raised about the rank, while I'm still not sure whether this work is limited in the clinic or not (R1's concern: unrealistic clinical scenario).