The reviews were mixed. On the one hand, the manuscript is well organized, and the reviewers appreciated the probabilistic approach to robustness and the fine-tuning idea. On the other hand, concerns were raised both in the reviews and during the discussion. In the end, the meta-reviewer (after an independent examination of the manuscript) concluded that the merits outweigh the potential issues. We strongly encourage the authors to revise the draft by taking the following comments into account (more details are in the reviews):

(a) The causal aspect of the manuscript appears to be more decorative than necessary. Indeed, upon an independent reading of the manuscript, the meta-reviewer agrees that one can essentially remove all causal notions: applying the do-calculus to the simple model the authors adopted is no different from ordinary conditioning. Moreover, the causal model is never used to perform a genuine intervention or counterfactual inference. As Reviewer 1 pointed out, the proposed approach is essentially a VAE with factorized latent variables, and the authors should have compared it to the many existing VAE alternatives. Whenever possible, please choose the simplest formulation of your work and refrain from borrowing terminology merely to make things look fancier without adding substance. If there is a true need for causal reasoning, it needs to be better articulated and demonstrated.

(b) The experiments were well performed in the sense of verifying the authors' approach, but they fall short in comparing against existing attempts at improving robustness. To name just a few (more can easily be found):
- Parseval networks
- L_2-nonexpansive neural networks
- Efficient Defenses Against Adversarial Attacks
- Deep defense: Training DNNs with improved adversarial robustness
Comparison against at least some of these existing approaches should be included.
A related concern is that the proposed algorithm may appear more robust (than a straightforward DNN) simply because its clean accuracy is too low. As mentioned in the appendix, the proposed algorithm achieves less than 50% accuracy on CIFAR10, which is probably no better than classic algorithms such as logistic regression or SVMs. It is plausible that as accuracy increases, a model (such as a deep net) becomes increasingly fragile against adversarial examples. The authors need to rule this possibility out (the experiments on MNIST are not sufficiently convincing, as MNIST is known to be quite easy). Put another way, if a practitioner adopted the authors' model, an adversary would not need to do anything at all: the accuracy on CIFAR10 is already below 50%.

(c) During prediction (eq (6)), the algorithm needs to sample m and z for each possible label. This is essentially a Bayesian approach to imputing the unobserved variables. How sensitive is the final result with respect to this Monte Carlo sampling? It is well known that the Bayesian approach offers some robustness by averaging over a number of models (see MacKay's or Bishop's book). Could the authors explain their approach through this well-known principle? This does not come for free, though: one would need more training data to train the latent variable model, which might explain why the authors were not able to achieve decent accuracy on CIFAR10. The trade-off between clean accuracy and robustness in a latent variable model needs to be better illustrated.
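To make the sensitivity question in (c) concrete, here is a minimal sketch of the kind of check we have in mind. It does not reproduce the authors' model: it assumes a hypothetical toy generative classifier with a uniform label prior, a single Gaussian latent z (the second latent m is omitted for brevity), and a linear-Gaussian decoder. In the spirit of the per-label sampling described for eq (6), p(x | y) is estimated for each candidate label by averaging p(x | y, z_k) over K prior samples, and the fraction of inputs whose predicted label changes across independent sampling runs serves as a simple sensitivity measure. All function names and parameters (`predict`, `label_flip_rate`, `num_samples`, etc.) are illustrative.

```python
import numpy as np


def log_gaussian(x, mean, sigma):
    """Log density of an isotropic Gaussian N(mean, sigma^2 I), summed over the last axis."""
    d = x.shape[-1]
    sq = np.sum((x - mean) ** 2, axis=-1)
    return -0.5 * sq / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)


def logmeanexp(a, axis):
    """Numerically stable log of the mean of exp(a) along `axis`."""
    amax = np.max(a, axis=axis, keepdims=True)
    out = amax + np.log(np.mean(np.exp(a - amax), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def predict(x, mus, W, sigma, num_samples, rng):
    """Monte Carlo estimate of argmax_y p(y | x) under a HYPOTHETICAL toy model.

    Assumed model: y ~ Uniform, z ~ N(0, I), x | y, z ~ N(mus[y] + W z, sigma^2 I).
    For each candidate label y, p(x | y) is estimated by averaging p(x | y, z_k)
    over num_samples fresh prior draws z_k.
    """
    n_classes = mus.shape[0]
    latent_dim = W.shape[1]
    log_px_given_y = np.empty((x.shape[0], n_classes))
    for y in range(n_classes):
        z = rng.standard_normal((num_samples, latent_dim))  # (K, latent_dim)
        means = mus[y] + z @ W.T                           # (K, data_dim)
        # log p(x_i | y, z_k) for every input i and sample k -> shape (N, K)
        ll = log_gaussian(x[:, None, :], means[None, :, :], sigma)
        log_px_given_y[:, y] = logmeanexp(ll, axis=1)
    # With a uniform prior p(y), argmax_y p(y | x) equals argmax_y p(x | y).
    return np.argmax(log_px_given_y, axis=1)


def label_flip_rate(x, mus, W, sigma, num_samples, n_repeats=5, seed=0):
    """Fraction of inputs whose predicted label differs across independent MC runs."""
    preds = np.stack([
        predict(x, mus, W, sigma, num_samples, np.random.default_rng(seed + r))
        for r in range(n_repeats)
    ])
    return float(np.mean(np.any(preds != preds[0], axis=0)))


if __name__ == "__main__":
    rng = np.random.default_rng(123)
    n_classes, data_dim, latent_dim, n = 3, 8, 2, 200
    mus = rng.standard_normal((n_classes, data_dim))
    W = rng.standard_normal((data_dim, latent_dim))
    sigma = 0.5
    # Synthetic inputs drawn from the same toy model.
    y_true = rng.integers(0, n_classes, size=n)
    z_true = rng.standard_normal((n, latent_dim))
    x = mus[y_true] + z_true @ W.T + sigma * rng.standard_normal((n, data_dim))
    for k in (1, 10, 100, 1000):
        print(k, label_flip_rate(x, mus, W, sigma, num_samples=k))
```

Reporting such a flip rate (together with the resulting clean and adversarial accuracy) as a function of K on the actual model would answer the sensitivity question directly; one would typically expect it to shrink as K grows, since the Monte Carlo estimate of p(x | y) becomes less noisy.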