Paper ID: | 3235 |
---|---|

Title: | DFNets: Spectral CNNs for Graphs with Feedback-Looped Filters |

The paper presents a spectral convolutional layer using feedback-looped filters and show how it can be used efficiently by means of scaled normalization and cut-off frequency. It is clearly written and easy to follow. Introduction and walk-through on spectral convolutions is good and motivates the work. It would be nice to see how spectral methods compare to non-spectral ones, and if there is some interesting aspect that should be considered. The method section could take more space to elaborate on 3.2 as it is key for the method as in this current form I am not sure it is sufficiently detailed and Alternatively details could be provided in the experimental setup section, as long as they are sufficient to grasp details of the proposed method. Section 3.2 contains some details on initialization that should go in experiments, unless they are needed to explain something about the method. If so then please elaborate and explain better. No ablation study has been performed with respect to the architectural choice of dense net. What is the difference in performance? Hope much would other methods, such as GCN, gain from just this architecture? Section 4.5 should define better what are criteria for success, are dense clusters better than sparse ones for example? Figure 4 and 5 are not sufficiently explained to say anything relevant about the methods in my opinion. Please elaborate on this. In introduction Shuman, cite [30], was first to propose convolution on graphs via spectral decomposition. Overall I like the paper and the approach. Some polishing is still required and clarity should be improved.

In this paper, authors study the graph convolution networks. Especially, the authors propose a new filter to approximate the true Fourier transformation, in the same spirit of Chebyshev filters and others. The proposed filter is based on the ARMA filter of Eq. (6). This equation is computationally heavy, so the proposed feedback-looped filter approximates the equation as in Eq.(7). Then the actual implementation of Graph Neural Network is formulated as in Eq.(12). Experimental performance is measured with Cora/Citeseer/Pubmed scholarship networks and NELL knowledge graph. Accuracies in the semi-supervised node classification surpass the several standard GCN models. Since I am not familiar with signal processing literature, I have difficulty in understanding the technical contribution. There are a few barriers I encountered. (1) Why the rational polynomial coefficients are preferable for convolution filter development? Or, why the RPC is less sensitive in an underlying graph structure, compared to Chebyshev? (2) I read the ref. [16, 17], but cannot understand why the Eq.(7) serves as a (valid?) approximation for Eq.(6). Thus I cannot judge whether this is a technically reasonable, or interesting, approximation of the ARMA (Eq.(6)). In Eq. (12), a regularization term \mu is introduced for the layer construction. So, strictly speaking, the spectral convolution layer is not an exact implementation of the feedback-lopped filter, right? Then theoretical discussions in Sec 3.4 is not directly applicable for spectral convolution layers. Is this correct? Also, I expect an ablation study for the additional \mu terms to verify the effect of this regularizer on several experimental performances. I also have a concern about the experimental design. There is no description of how the authors tune the hyperparameters. In Figure 2, a pair of (p,q) = (5, 3) is the best for the "test" dataset, and the authors used this pair of (p, q) for the main result, table 4. Does this mean the authors tuned hyperparameters to maximize the test score? Or, the best (p,q) for the validation dataset is the same as the (p.q) for the test dataset? Please clarify how the hyperparameters are tuned during the training. + Experimental scores seem good. - It is difficult (at least for me) to fully understand the derivation of the feedback-looped filter. - Theoretical discussion is not for the actual convolutional layer (Eq.12), but its ideal(?) formulation (Eq.7) - The effect of \mu regularizer is not assessed. - Unclear hyperparameter optimization procedure. I really expect the author feedback to fully understand the contribution of the paper. ### After author feedback ### I'm happy to see the detailed author feedback. It definitely helps me understand the value of the submission more correctly. So I raised my review score upward. However, the manuscript is still somewhat difficult for some groups of readers. Further explanations and reference pointers will enlarge the potential audience.

Pros: 1, The proposed rational polynomial spectral filter is novel and interesting. 2, I like the summarization and the comparison of different spectral filters in terms of functional forms, complexity measures, etc. 3, The paper is clearly written. Cons: 1, I appreciate the authors' efforts in pushing forward the graph convolution by designing new spectral filters. However, from the current writing, I feel like there is a gap between the claimed new technique and practical benefits brought by the technique. For example, when you introduce and motivate the Cayley filters and the rational polynomial spectral filters in general, could you be more specific and provide more intuition on why “narrow frequency bands” is an issue practically, how these filters resolve this issue, and why improved localization would lead to better empirical performance. 2, I think an in-depth discussion and experimental comparison with the recently proposed multi-scale spectral graph convolution [23] is necessary since: (1) they use the neural networks as the learnable spectral filters which have better data-adaptive capacity; (2) they can cheaply compute the spectral convolution with very a large localization range. 3, In terms of experiments, I strongly discourage authors to only report performances on the small citation networks using the fix spit. As discussed in several recent works, e.g., [1], the ranking of different models dramatically changes when the split changes. I suggest authors either choose other larger datasets or report the performances over multiple random splits and/or decreasing the proportion of labeled nodes. 4, The experimental comparison is unfair to other methods. It is not surprising that adding dense connections improves the performance for many models. However, most of your baselines do not exploit this trick. I would suggest removing the dense connections and just compare the newly proposed spectral convolutional networks with the other methods. You need this experiment as an ablation study to clarify the core contribution. I did not see any empirical analysis on the claimed improved localization contribution. I suggest adding an ablation study to compare against other spectral filters, which have less controlled localization and same settings otherwise, e.g., Chebyshev filter. 5, What is the definition of learning complexity? [1] Shchur, Oleksandr, et al. "Pitfalls of graph neural network evaluation." arXiv preprint arXiv:1811.05868 (2018) ================================================================ Authors' rebuttal addressed most of my concerns on experimental comparison. One suggestion is that it would be more elegant and efficient to improve the coefficient optimization via amortized inference since it is costly to perform such an optimization per layer. I have changed the score accordingly.