__ Summary and Contributions__: The paper proposes a novel graph neural network approach, which employs features extracted from random walk kernels to obtain graph representations in an end-to-end trainable architecture. The basic idea is to compare the graph with some “hidden graphs”, whose structure depends on the task of interest and is learned during training. The similarity is constructed with a differentiable function computing random walks of length p between the graph of interest and a number of hidden trainable graphs. The learned kernel values are used as input for the network model.
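The similarity described above can be illustrated with a classical p-step random walk kernel on the direct product graph. The sketch below is a minimal, non-trainable illustration with assumed function and variable names, using plain 0/1 adjacency matrices rather than the paper's learned hidden graphs:

```python
import numpy as np

def random_walk_kernel(A1, A2, p):
    """Counts common walks of length 1..p between two graphs by summing
    entries of powers of the direct product graph's adjacency matrix."""
    Ax = np.kron(A1, A2)          # adjacency of the direct product graph
    Ap = np.eye(Ax.shape[0])
    total = 0.0
    for _ in range(p):
        Ap = Ap @ Ax              # walks of length k in the product graph
        total += Ap.sum()         # each entry counts pairs of matching walks
    return total

# Toy example: a triangle versus a 3-node path.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(random_walk_kernel(triangle, path, p=2))  # 24 length-1 + 72 length-2 walks
```

In the paper's setting one of the two adjacency matrices would be the continuous, trainable matrix of a hidden graph, making the kernel value differentiable in its entries.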

__ Strengths__: - To obtain graph representations that are invariant to node permutations, concepts from graph kernels, in particular random walk kernels, are exploited.
- Efficient way to compute the kernel and matrix power based on sparse multiplication. However, these ideas are generally implicit in the message passing framework.
- An extension of the classical random walk kernel is proposed to account for continuous node attributes, by computing their similarities across the nodes of the two graphs.
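One common way to fold continuous attributes into a walk kernel is to reweight walks in the product graph by the cross-graph similarity of the node features at each step. The sketch below uses assumed names and a dot-product similarity for illustration; it is not necessarily the paper's exact formulation:

```python
import numpy as np

def attributed_rw_kernel(A1, X1, A2, X2, p):
    """Walk kernel for graphs with continuous node attributes: walks in the
    product graph are reweighted by cross-graph node-feature similarity."""
    S = X2 @ X1.T                 # (n2, n1) feature similarity (dot product here)
    Z = S.copy()
    total = S.sum()               # length-0 walks, weighted by similarity
    for _ in range(p):
        Z = (A2 @ Z @ A1.T) * S   # extend walks by one edge in each graph,
        total += Z.sum()          # then reweight by endpoint similarity
    return total
```

With identical all-ones attributes the similarity matrix is uniform and this reduces to plain walk counting.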

__ Weaknesses__: - In the synthetic experiments the interpretation and similarity of the retrieved motifs is not clearly quantified. Possibly, an evaluation procedure to establish the level of interpretability based on an explicit counting metric would be beneficial.
- I’d recommend comparing RWNN to the random walk kernel and in particular to the k-step random walk kernel [36], given the close resemblance with the proposed method.
- The results are competitive but not outstanding. Possibly, using large graph datasets would show the benefit of this hybrid architecture compared to either kernel-based or network-based methods.

__ Correctness__: As far as I can see, the method and experimental setting are correct.

__ Clarity__: I find the paper clearly written and easy to read.

__ Relation to Prior Work__: The differences and similarities to existing approaches seem to be clearly defined and are summarised in a specific section.

__ Reproducibility__: Yes

__ Additional Feedback__: Thank you for your answer. I updated my score accordingly.

__ Summary and Contributions__: The paper proposes a new kind of graph processing layer based on the application of a differentiable version of the Random Walk kernel computed between the graph and prototype graphs that are learned via backpropagation.

__ Strengths__: -The proposed idea is interesting and different from many proposals in the literature
-The paper addresses the interesting issue of merging graph kernels and graph neural networks
-This should not be a strength of a paper in principle, but the experimental evaluation procedure is correct, and this is not so common these days.

__ Weaknesses__: -In my understanding, the learned (weighted) graphs can be in principle fully connected, so not necessarily interpretable.

__ Correctness__: The claims seem correct and the experimental evaluation is correct.

__ Clarity__: yes

__ Relation to Prior Work__: yes

__ Reproducibility__: Yes

__ Additional Feedback__: -The learned graphs are weighted (positive weights). They can thus be fully connected in principle. If I understood correctly, this fact may negatively impact the interpretability of the learned model, which you stress quite a bit in the writing and also in the broader impact statement.
In the graphs reported in the supplementary material, did you use a threshold value to represent edges?
I suggest discussing this drawback in the paper, and possibly analysing cases in which the learned graphs are interpretable and when they are not.
Minor remarks:
-l 47: graph kernels also map the input graphs into vector spaces. The main difference is the dimensionality of the space. I suggest elaborating more on this point.
-l 58-67: "better or comparable to the state of the art". On NCI1 this is not true. I think the contribution of the paper is quite interesting; there is no need to oversell the results. In this paper, it is the novel approach to graph processing in neural networks that is interesting.
-l87: graph neural networks were introduced before Scarselli in:
A. Sperduti, A. Starita, Supervised neural networks for the classification of structures. IEEE Trans. Neural Networks 8, 714–735 (1997).
and basically at the same time in:
A. Micheli, Neural network for graphs: A contextual constructive approach. IEEE Trans. Neural Networks 20, 498–511 (2009).
----
I confirm my score after the rebuttal and discussion phase.

__ Summary and Contributions__: The authors propose a novel neural network model for graph data based on the random walk kernel.

__ Strengths__: The approach is interesting and novel. The method is sound.

__ Weaknesses__: Comparisons with competitive explainability methods are lacking.

__ Correctness__: The methodology is correct.

__ Clarity__: The paper is well written.

__ Relation to Prior Work__: Comparisons with competitive explainability methods are lacking.

__ Reproducibility__: Yes

__ Additional Feedback__: The authors propose a novel neural network model for graph data based on the random walk kernel. The proposed approach is interesting to me.
My concerns lie in the utility of the proposed method. The proposed method does not necessarily outperform the existing methods in many tasks. Although its interpretability is preferable, other graph kernels also have interpretability, and there are interpretation methods for graph neural networks such as GNNExplainer [Ying et al. 2019]. To show the effectiveness of the interpretability of the proposed method, it is necessary to compare the proposed method with such interpretation methods, discuss the differences, and analyze the hidden graphs in the real-world datasets extensively. Overall, more adequate experiments and discussions for confirming the utility of the proposed method are needed.
References:
Rex Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, Jure Leskovec. GNNExplainer: Generating Explanations for Graph Neural Networks. NeurIPS 2019.
Federico Baldassarre, Hossein Azizpour. Explainability Techniques for Graph Convolutional Networks. ICML 2019 workshop.
Hao Yuan, Jiliang Tang, Xia Hu, Shuiwang Ji. XGNN: Towards Model-Level Explanations of Graph Neural Networks. KDD 2020.
=======
After rebuttal: Thank you for the response. That convinced me about the novelty of the proposed approach. Hence I raised my score. I'd still like to see the interpretability of the proposed method in real-world datasets if possible.

__ Summary and Contributions__: The authors provide an alternative to Graph Neural Networks (GNNs) or Message Passing Neural Nets (MPNNs) for the graph-level classification task. Specifically, they use a learnable graph kernel. The kernel uses "hidden graphs" (updated through training): their adjacency matrices are continuous, restricted to be positive and symmetric. This gives a competitive and interpretable model for graph classification.

__ Strengths__: * Show a way of learning kernels on graphs combined with a neural network. The learned kernels are interpretable, as opposed to multi-layer Graph Neural Networks (GNNs)
* The first presentation looks like O((n1 n2) ^ 2) to compute a forward pass, but they describe an O(n1 n2) algorithm for doing the computation, with n1 and n2 being the sizes of the two graphs compared by the kernel.
* They mention interpretability, but they could add some chemistry context (as their paper uses such datasets) on why interpretability could be useful for chemists, or for other fields they want to highlight.
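The complexity point above can be illustrated with the standard Kronecker-product identity (A1 ⊗ A2) vec(X) = vec(A2 X A1ᵀ), which avoids ever materializing the (n1 n2) × (n1 n2) product-graph matrix. This is a dense sketch under assumed names, not the authors' implementation; a sparse implementation would be cheaper still:

```python
import numpy as np

def walk_sum_naive(A1, A2, p):
    """Sums walks of length 1..p by materializing the Kronecker product:
    the product-graph matrix alone has (n1 * n2) ** 2 entries."""
    Ax = np.kron(A1, A2)
    v = np.ones(Ax.shape[0])
    total = 0.0
    for _ in range(p):
        v = Ax @ v
        total += v.sum()
    return total

def walk_sum_fast(A1, A2, p):
    """Same quantity via (A1 kron A2) vec(X) = vec(A2 @ X @ A1.T), so only
    an (n2, n1) matrix is ever stored or multiplied."""
    X = np.ones((A2.shape[0], A1.shape[0]))
    total = 0.0
    for _ in range(p):
        X = A2 @ X @ A1.T
        total += X.sum()
    return total
```

Both functions return the same walk counts, but the fast version never forms the product graph.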

__ Weaknesses__: * The claim that MPNNs treat nodes as "sets", ignoring edge information, is too strong: it is easy to show cases where the authors' statement is not true. Please tone down this argument, as it will shock readers. If you want to keep it, provide theorems and proofs detailing the classes of graphs and MPNNs that are, as you say, ignorant of the edges. The claim can be corrected by limiting it to the graph-pooling step (the "readout", as they call it) that goes from node to graph representations.
* Why not have experiments with different sizes of hidden graphs, as shown by the figure?

__ Correctness__: The paper seems correct to me, both the math and the text [except the too-strong argument noted in the weaknesses]. I didn't spot many spelling / grammar mistakes, though the last sentence of the conclusion ...

__ Clarity__: The paper is well written and easy to follow. They take the time to explain concepts from scratch making the paper stand on its own for the conference audience.

__ Relation to Prior Work__: Related work is described well. Would it make sense to compare against https://arxiv.org/abs/1904.09671 or at least mention it in the write-up if it is easy to highlight any similarities/differences?

__ Reproducibility__: Yes

__ Additional Feedback__: