This paper proposes a new classification algorithm that represents both the input data and the target label as probability measures in the 2-Wasserstein space. One network is trained to push the raw input forward to the predicted measure, while another network enforces diffusion invariance. A theoretical analysis is provided of the relationship/equivalence between the diffusion operator and the 2-Wasserstein distance. Experimental results on 2D image and point cloud classification suggest that the method is effective under severe random perturbation. All reviewers find the approach interesting. One reviewer, and I to some extent, remain unclear about the motivation behind Wasserstein uncertainty and data with multiple locations. The rebuttal eased this concern somewhat, but it would be helpful to clarify the motivation further. The experiments are also thin, and the reviewers (R2 and R4) still have concerns after the rebuttal. I understand that the paper’s major contribution is theoretical, but it is important to show that the problem addressed is real by testing on real perturbations. Overall, I see the potential of the method and I encourage the authors to enhance the experiments in the final version, where an additional page will be allowed.