Paper ID: | 7338 |
---|---|

Title: | Diffusion Improves Graph Learning |

The authors consider graph problems where the homophily bias helps the predictive power. They show how pre-processing graphs using graph diffusion convolution can generally improve the predictive performance of downstream methods. The paper reads well, the core thesis is well described, its empirical analysis is thorough. The approach is tested on several methods and shows a consistent capacity of improving each one of them. The authors analyze empirically the changes in the eigenvalues after the proposed graph transformation and offer as insight the conclusion that the procedure affects both the highest and the lowest eigenvalues and can therefore mitigate a) the influence of long range interactions and b) of noise on local interactions. The clarity of the insight contribute to increase the work's significance: particularly insightful is Section 6 where key questions are analyzed.

This paper introduces graph diffusion convolution (GDC), a pre-processing pipeline for graphs useful in node classification experiments on homophilic network datasets (e.g., citation networks). The preprocessing pipeline consists of two steps: 1) replacing the original adjacency matrix with a diffusion matrix (obtained as a polynomial function of the original adjacency matrix), and 2) sparsification by, e.g., thresholding the values on the edges. When using the newly obtained adjacency matrix in typical graph learning frameworks, results on node classification improve in many cases. The paper is extremely well written and scores very high in terms of clarity and quality of exposition. Related work is covered in great detail. In addition to the two cited early works on models related to GNNs (Sperduti & Starita and Baskin et al.) it would be good to refer to the original two GNN papers by Gori et al. “A new model for learning in graph domains” (IJCNN 2005) and Scarselli et al. “The graph neural network model” (IEEE TNN 2009) in the related work section — which are both, surprisingly, not cited in this work. The ideas presented here are somewhat novel, but recent work has covered the idea of using diffusion-based approaches for some of these particular homophilic datasets used in this paper in great detail and show similar results (Klicpera et al., ICLR 2019 [cited as 29], Wu et al., ICML 2019 [cited as 65]), hence novelty and relevance is limited. The theoretical analysis outlined in the beginning of this paper in terms of behavior of eigenvalues under diffusion and sparsification is interesting, relevant and of high quality. The experimental protocol used in this paper is outstanding and node classification performance on these (limited) datasets is analyzed at great detail and with reasonable baselines. The experiments, however, are very narrow in scope (see below). My main concern is the limited diversity of datasets and tasks considered in this project. There is a wide range of graph-structured datasets coming from different domains to which GCNs/GNNs are regularly applied, which go beyond the usual homophily/heterophily classification. Consider, for example, road networks, knowledge graphs, protein-interaction networks or graphs used in software analysis/verification. GDC is portrayed as a one-fits-all approach in this paper (with a disclaimer by the authors that they only consider networks with clear homophily). It would be necessary to experimentally explore some of its limitations, by moving beyond simple and comparatively small co-author/co-purchase/citation networks, and explore other tasks such as link prediction or graph classification, in order to make this paper a candidate for an accept. At this point, this paper is borderline. Comments/questions; Q1: It is interesting that the re-implementation of DGI (Veličković et al., ICLR 2019) performs quite poorly in all of the classification experiments (Appendix Tables 2-7), whereas Veličković et al. made strong claims about outperforming even (semi-)supervised approaches (such as GCN) on some of the datasets also considered here. How can this difference be explained? Q2: What happens with higher label rates? All the datasets considered in this paper assume label rates of around 5 percent or lower. What happens in the case of, e.g., 80%? Will the low-pass filtering property of GDC hurt classification performance (compared to using a vanilla MPNN)? Q3: Line 30 (page 1) states that “[spectral] methods are routinely outperformed by MPNNs in graph-related tasks” and cites Graph Attention Networks to substantiate this statement. This statement is however never made in that particular paper and there appears to be nothing in this particular paper that supports this statement. Q4: The colors/shades of grey are not distinguishable for the two methods in Figure 7 when viewed/printed in greyscale. All other figures are OK in greyscale. Q5: How is link prediction performance affected by this preprocessing pipeline? ---- Update after rebuttal: My concerns and questions have been adequately addressed and I am willing to increase my score to an "accept". I think that this is a solid paper and I am looking forward to seeing the discussed experimental analysis on PPI, D&D and with different label rates in the final version of the paper. I heavily encourage the authors to be clear about model limitations and scope of applicability of this approach while preparing the updated manuscript -- this is not meant to devalue the contribution of this paper, but to ensure a healthy progress of the field.

I would like to thank the authors for their response to the reviewer comments. It seems that the general opinion about this paper is positive, so I am looking forward to its camera-ready version. ---- PREVIOUS COMMENTS ---- The paper proposes a method for graph enhancement based on graph diffusions. First, a diffusion operator is applied on the original graph, which transforms the edge densities. This is followed by an edge sparsification step, where edges with low edge weights are removed. Some links have been pointed out between graph diffusion and filtering and the effect of graph diffusion on the graph eigenvalues has been studied. The proposed graph transformation approach is then shown to provide improvements in performance compared to the original graphs when coupled with various semi-supervised and unsupervised graph-based learning algorithms. The methodological contribution seems to be humble, however, the paper is well-written and clear and the experimental evaluation is extensive. Only a minor comment: In the analysis of the effect of sparsification on the eigenvalues, the authors comment that they cannot analyze the change in the eigenvalues analytically. I guess graph sparsification could be formulated as a perturbation on the matrix S. If edges with weights smaller than epsilon are removed, one can bound the magnitude of the perturbation. Then using standard results from matrix perturbation theory (e.g. Weyl’s inequality) it should be possible to bound the change in the eigenvalues.