This paper proposes using a causal inference framework for weakly supervised semantic segmentation. It corrects mistakes in pseudomasks by adjusting for confounding effects. By relying on causal inference, as opposed to a discriminative model, the goal is to avoid relying on spurious correlations in the training data that might fail to generalize. The reviewers agree that using backdoor adjustments for semantic segmentation is a novel use of the technique, and that the experimental results are impressive. One suggestion for improvement in the camera ready is to more clearly state the modeling assumptions of the causal framework that is used, and to elaborate on what their implications are for this problem.