NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 8237
Title: Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness

Reviewer 1

This paper addresses an interesting problem and proposes a solution that is shown to have empirical advantages. However, the text of the paper is poorly executed. The notation is not clearly defined or explained: in line 75, $x^*$ and $\hat{\theta}$ are not defined, yet these variables are used throughout the rest of the paper; I had to read [18] to understand this notation. In lines 74-80 the definition of a prior network is unclear. In lines 196-199 the intuitive explanation for why prior networks are more robust to adversarial attacks is also unclear. This diminishes the quality of the paper as a standalone piece of work.

The main contribution of this work is the improved training criterion: whereas previous work trained prior networks under the forward KL divergence, this paper proposes the reverse KL divergence instead, which yields empirical benefits in training. It is also shown empirically that these networks have better out-of-distribution detection performance and, in some cases, are more robust to adversarial attacks. However, on complex datasets like CIFAR-100 the improvement is only modest, so it would be nice to see the performance of these networks on more datasets (like ImageNet).

________________

Update: In light of the author response I tend to keep my overall score (6).
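For readers unfamiliar with the setup: both training criteria are KL divergences between Dirichlet distributions, which have a closed form, and the two directions are genuinely different objectives. A small sketch (my own illustration, not the authors' code; the concentration values are arbitrary):

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) )."""
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    a0, b0 = alpha.sum(), beta.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(b0) + gammaln(beta).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())

# Sharp "target" Dirichlet concentrated on class 0 vs. a flat prediction.
target = np.array([100.0, 1.0, 1.0])
pred = np.array([1.0, 1.0, 1.0])

forward = dirichlet_kl(target, pred)  # forward KL: KL(target || prediction)
reverse = dirichlet_kl(pred, target)  # reverse KL: KL(prediction || target)
print(forward, reverse)               # the two directions differ markedly
```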

Reviewer 2

As described above, the paper makes an important observation regarding the implications of training with a forward vs. reverse KL loss in prior networks. The analysis is compelling and nicely explains prior observations of the difficulty of training prior networks on datasets with many classes. It's nice to see that with the reverse KL loss this is now possible. My main concerns are with the empirical evaluation, and whether the paper provides convincing evidence that prior networks trained with reverse KL are SOTA for the two discussed tasks.

For OOD detection: All the experiments are currently run on re-implementations. It's a standard enough task -- why not compare to previously reported numbers? Relatively minor: the network performances are noticeably sub-par for 2019. Even setting aside tricks and data augmentation, 8.0% error for a DNN baseline on CIFAR-10 is twice the 4.0% of a wide ResNet with no tricks.

For adversarial example detection: Again, it would be nice to compare to previously reported numbers rather than only to re-implementations. Relatedly, an L_inf bound of 30 pixels is a strange choice; eps=8/255 is really standard on CIFAR-10, and seems the natural choice to allow comparison with prior work. Relatively minor: in addition to AUROC, it would be nice to report the joint success rate at a fixed recall (say, 80%, 95%, and 99%), to allow comparison to adversarial accuracy rates without detection. Against strong whitebox attacks, the joint success rate is extremely high (appears >90%). Additionally, MC-Dropout appears to outperform the proposed prior networks approach. In general, demonstrating that finding undetected adversarial examples "takes a greater amount of computational effort" is not considered a significant result, as this is also achievable purely through masking gradients, without improving the robustness of the model.
It's nice to see that the paper implements an adaptive attack for the model under consideration and reports metrics against the strongest attack found.

Minor comments: For Figure 2, it would be nice to include the data distribution for convenience, so that readers don't need to refer to ref. 16. Why not use a stronger attack than FGSM for adversarial training? In the second line of Eq. 10, I believe there's a missing normalization constant; this doesn't affect the correctness of the argument. I couldn't find the code referenced in the paper or supplement, though the reproducibility checklist said it was included -- my apologies if I missed it. I think it's great to share this, though: especially for the adversarial robustness community, it's really helpful in ensuring proposed defenses are truly robust.

________________

Update: Thanks to the authors for providing additional insights. The original paper and responses still do not provide convincing empirical evidence that prior networks with the reverse KL loss are improvements over existing approaches for adversarial robustness. The OOD evaluation also still has weaknesses. I'm glad that you're making preparations to run additional experiments, but it's somewhat disappointing that it wasn't possible to re-run experiments under standard settings (e.g., on CIFAR-10 no work uses eps=30, and eps=8 is standard across the field), which makes it very difficult to compare this work to prior work. Similarly, attack success at fixed recall would be easier to compare to, e.g., Madry adversarial training / TRADES / other adversarial robustness work, though this is less significant. Increasing the required number of attack iterations is typically not considered a significant result within the adversarial examples community, even without additional training cost; see, e.g., Athalye et al., ICML 2018, or Carlini et al., 2019 on adversarial evaluation.

Joint success rate: My mistake, thank you for the clarification. I was looking at the "success rate" column, where MC-Dropout outperforms prior networks (for a large number of iterations, the regime of interest). RKL-PN indeed outperforms significantly under JSR. This point does make me more positive, though not sufficiently to raise my score.

For OOD detection: It's fine not to compare to bespoke post-processing techniques that use domain-specific knowledge, but that doesn't seem like a good justification for not comparing to previously reported numbers at all. (I don't understand why the rebuttal includes classification accuracy numbers rather than OOD detection numbers; the question of interest is whether the technique can outperform existing OOD detection when using standard architectures.)

Again, I want to emphasize that I think this is good work, and could be impactful within the uncertainty/OOD detection and adversarial robustness communities. I don't think the current evaluation is sufficient, though, and the rebuttal does not provide experiments to address this, so I believe the paper would be better served by a stronger empirical evaluation.
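For concreteness, the joint-success-rate metric I have in mind can be computed as follows. This is a sketch under one possible convention (threshold chosen so a fixed fraction of clean inputs is accepted as benign; the paper may define it differently), and all names are illustrative:

```python
import numpy as np

def joint_success_rate(clean_scores, adv_scores, adv_fools, accept_rate=0.95):
    """Fraction of adversarial inputs that both fool the classifier AND evade
    the detector, with the uncertainty threshold set so that `accept_rate`
    of clean inputs falls at or below it (i.e. is accepted as benign)."""
    threshold = np.quantile(clean_scores, accept_rate)
    evades = np.asarray(adv_scores) <= threshold
    return float(np.mean(np.asarray(adv_fools) & evades))

# Toy data: clean-input uncertainty scores and four attack attempts.
clean = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
adv_scores = np.array([0.5, 0.95, 0.5, 0.95])    # detector scores on adversarial inputs
adv_fools = np.array([True, True, False, True])  # did the attack change the label?

print(joint_success_rate(clean, adv_scores, adv_fools, accept_rate=0.8))  # 0.25
```

Under this convention the metric is directly comparable to (robust) adversarial accuracy without detection, which is what makes it useful alongside AUROC.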

Reviewer 3

The authors present a novel algorithm with theoretical analysis and empirical results. We have a few comments and suggestions for the work:

The comparison of forward vs. reverse KL divergence as the objective criterion resembles the choice between the mean-seeking and mode-seeking forms of the objective in variational inference, respectively (in this case applied to a Dirichlet distribution). We recommend that the authors refer and make connections to this related literature.

It would be great if the authors could expand upon the distinction between in-domain and out-of-domain training data in lines 105-106. How are these datasets created, and is the purpose of separating the data to improve generalization? How is the optimization performed in practice?

In the algorithm, the authors propose to set the in-domain \beta parameters to a large value of 1e2 and the out-of-domain parameters to small values of 0. How sensitive are the results to these specific choices? The authors also note that the losses were equally weighted when using the forward KL divergence but had a large relative weighting \gamma when using the reverse loss. What criterion was used to choose the \gamma parameter?

Lastly, we had a few minor suggestions for the text: using the conventional indicator variable instead of \mathcal{I} may be clearer, and defining all notation (e.g., \pi) in the main text would improve readability.
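To make the mean-seeking vs. mode-seeking connection concrete: when fitting a single narrow Gaussian to a bimodal target, the forward KL (mass-covering) centres on the overall mean, while the reverse KL (zero-forcing) locks onto one mode. A small numerical sketch (our illustration, not code from the paper; all values are arbitrary):

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 201)

def normal(mu, sigma=0.5):
    # Discretised Gaussian on the grid, normalised to sum to 1.
    p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
    return p / p.sum()

# Bimodal target with modes at -2 and +2.
p = 0.5 * normal(-2.0) + 0.5 * normal(2.0)

def kl(a, b):
    return float(np.sum(a * (np.log(a) - np.log(b))))

# Fit the mean of a single narrow Gaussian q under each direction of the KL.
mus = np.linspace(-3.0, 3.0, 121)
mu_forward = mus[np.argmin([kl(p, normal(m)) for m in mus])]  # KL(p||q): mean-seeking
mu_reverse = mus[np.argmin([kl(normal(m), p) for m in mus])]  # KL(q||p): mode-seeking
print(mu_forward, mu_reverse)  # forward lands near 0, reverse near one of the modes
```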