NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 5526
Title: Learning Robust Global Representations by Penalizing Local Predictive Power

Reviewer 1

The main argument, that predictions should not be too focused on local patches, was a little counter-intuitive. There are many counterexamples to the points the authors make, e.g., a local patch around a person's face may be enough to classify the image as containing a person. This was a little distracting while reading the paper, but the examples presented were sufficient to make their points, and the extensive results alleviate most of my concerns. Otherwise, I was happy with the paper and felt it provides an interesting approach to domain adaptation.

Reviewer 2

This paper proposes a new method for learning a robust network by penalizing its local predictive power and forcing it to focus on global features. To penalize local predictive power, as a regularizer, a new classifier is built upon the features of the early layers, and adversarial training is used so that these early-layer features become unable to predict the label correctly. However, I still have some concerns about the method and the experiments.

1. The baseline methods compared under the domain generalization setting are not state-of-the-art. Recent methods such as JiGen [1], Feature-Critic [2], and Epi-FCR [3] should be compared.

2. As the experimental results show, the proposed method and its variants perform differently on different datasets: on MNIST, PAR_M seems to be the best; on CIFAR10, PAR and PAR_M perform better than PAR_B and PAR_H; but on PACS, PAR_H performs best. How should one select which variant to use in realistic applications?

3. On page 4, the method is described as being tested on CIFAR10 under the domain adaptation setting, but Section 4.2 seems to concern domain generalization. Besides, it is not clear which ResNet is used for the experiments on the CIFAR10 dataset.

4. The claim that "the proposed network is forced to discard predictive signals such as color and texture" is not well validated. What is the prediction accuracy of the patch-wise classifier?

[1] Carlucci F M, D'Innocente A, Bucci S, et al. Domain Generalization by Solving Jigsaw Puzzles. CVPR, 2019.
[2] Li Y, Yang Y, Zhou W, et al. Feature-Critic Networks for Heterogeneous Domain Generalization. arXiv preprint arXiv:1901.11448, 2019.
[3] Li D, Zhang J, Yang Y, et al. Episodic Training for Domain Generalization. arXiv preprint arXiv:1902.00113, 2019.

---------------------

After reading the rebuttal, most of my concerns are resolved, except for the missing recent state-of-the-art baselines. I am leaning towards accepting the paper.
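The regularization scheme described above (an auxiliary classifier on early-layer features, trained adversarially so those features lose their local predictive power) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the `GradReverse` and `PARNet` names, and the use of a gradient-reversal layer (the paper may instead alternate updates between the patch classifier and the backbone) are all assumptions made here for concreteness.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Gradient reversal: identity on the forward pass, negated (scaled)
    # gradient on the backward pass, so the shared early layers are pushed
    # to *hurt* the patch classifier while it tries to fit the label.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class PARNet(nn.Module):
    # Hypothetical backbone: early conv features feed both the main
    # (global) classifier and an adversarial 1x1-conv patch classifier.
    def __init__(self, num_classes=10, lambd=0.1):
        super().__init__()
        self.lambd = lambd
        self.early = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.rest = nn.Sequential(
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes))
        # A 1x1 conv makes a prediction per spatial location, i.e. per
        # local receptive-field patch of the early features.
        self.patch_clf = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        f = self.early(x)
        main_logits = self.rest(f)
        # Reversed gradients: minimizing the patch loss below updates the
        # patch classifier normally but drives `early` to discard the
        # locally predictive signal.
        patch_logits = self.patch_clf(GradReverse.apply(f, self.lambd))
        return main_logits, patch_logits

# One illustrative training step on random data.
net = PARNet()
x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
main_logits, patch_logits = net(x)
ce = nn.CrossEntropyLoss()
# Patch targets: broadcast the image-level label to every spatial location.
patch_y = y[:, None, None].expand(-1, *patch_logits.shape[2:])
loss = ce(main_logits, y) + ce(patch_logits, patch_y)
loss.backward()
```

Under this reading, the reviewer's question in point 4 amounts to asking how well `patch_clf` performs at convergence: a patch accuracy near chance would support the claim that local signal has been discarded, while a high patch accuracy would undermine it.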

Reviewer 3

The paper posits that classifiers required to (in some sense) discard local signal (i.e., patches of an image correlated with the label within a data collection), basing predictions instead on global concepts (i.e., concepts that can only be derived by combining information intelligently across regions), may better mimic the robustness that humans demonstrate in visual recognition. I believe this work, together with the referenced work of Wang et al. (2019), provides valuable insight into domain adaptation. The writing is clear, and the extensive experimental results make it a solid piece of work. The new dataset provided in the manuscript further benefits cross-dataset recognition via UDA in the computer vision area.