NeurIPS 2020

Handling Missing Data with Graph Representation Learning


Meta Review

Dear authors,

The reviewers discussed your paper and carefully considered your rebuttal. All agree that the main contribution is the framework for handling missing values with bipartite graphs. This is an interesting idea, both for imputing missing values and for making predictions in the presence of missing values. They also appreciated that you added experimental comparisons to two reference methods (missMDA and MIWAE) in your response, as well as experiments on two additional high-dimensional data sets. Nevertheless, although they emphasized that GNNs are used here as a toolbox rather than as the focus of the study, you need to be specific about important aspects of their application (such as architectural novelty and scalability), as noted by two reviewers. The reviewers are pleased to support acceptance of the paper, but they strongly encourage you to take the issues related to their discussion of GNNs seriously and to address this technique thoroughly in the camera-ready version. Since I agree with them, I have decided to accept your paper, but you must be very specific about scalability issues (and not just say that GNNs in general have been applied to large graphs); it is acceptable if scalability is also a limitation in your case. In addition, in the introduction you need to position your work clearly with respect to Graph Convolutional Matrix Completion and Inductive Matrix Completion Based on Graph Neural Networks, as they also consider bipartite graphs in the context of missing values. Finally, here are a few comments that you may try to integrate in the final version.
Random forests that handle missing values are available in the grf R package, which uses the "missing incorporated in attributes" (MIA) splitting strategy recommended for supervised learning with missing values (see https://arxiv.org/abs/1902.06931, a reference that may be useful); MIA is also available in the partykit R package. You could add a comparison in which you first impute with mice (or another method) and then predict with a non-parametric method such as a random forest rather than a linear model. Finally, you should discuss the mechanism that generates the missing values (perhaps in the conclusion), as your simulations only consider the MCAR scenario and not MAR and MNAR.
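The impute-then-predict baseline suggested above can be sketched as follows. This is a minimal illustration, not the authors' method: it uses scikit-learn's IterativeImputer as a stand-in for R's mice (both perform chained-equations imputation) and a random forest as the non-parametric predictor; the synthetic data, missingness rate, and hyperparameters are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
# Non-linear signal, so a forest should have an edge over a linear model.
y = X[:, 0] * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=n)

# MCAR mask: each entry is missing independently with probability 0.2,
# regardless of any observed or unobserved values.
mask = rng.random(X.shape) < 0.2
X_miss = X.copy()
X_miss[mask] = np.nan

X_tr, X_te, y_tr, y_te = train_test_split(X_miss, y, random_state=0)

# Step 1: chained-equations imputation (mice-like), fit on training data only.
imputer = IterativeImputer(random_state=0)
X_tr_imp = imputer.fit_transform(X_tr)
X_te_imp = imputer.transform(X_te)

# Step 2: non-parametric prediction with a random forest.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_tr_imp, y_tr)
print(round(forest.score(X_te_imp, y_te), 3))
```

The same two-step pipeline would be repeated with a linear model in step 2 to obtain the linear-versus-forest comparison the review asks for.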
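For the last point, the three missingness mechanisms can be made concrete with a small sketch. This is purely illustrative numpy code with arbitrary thresholds and rates, not part of the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))

# MCAR (missing completely at random): missingness in column 1 is
# independent of all values in the data.
mcar = rng.random(len(X)) < 0.3

# MAR (missing at random): missingness in column 1 depends only on the
# fully observed column 0.
mar = X[:, 0] > 0.5

# MNAR (missing not at random): missingness in column 1 depends on the
# unobserved value of column 1 itself.
mnar = X[:, 1] > 0.5

# Apply, e.g., the MAR mask to column 1.
X_mar = X.copy()
X_mar[mar, 1] = np.nan
```

Under MCAR the observed entries remain a representative sample; under MAR and especially MNAR, imputation and evaluation become harder, which is why the review asks the authors to discuss mechanisms beyond MCAR.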