__ Summary and Contributions__: The paper propose a Bayesian approach for building a model-agnostic explainer for graph neural networks.
The essential contribution of the paper is to explore an alternative method to the additive feature attribution explainer family.
This alternative is a probabilistic graphical model explainer which doesn't rely on the assumption of linear independence of explained features.

__ Strengths__: The claims are well supported by a strong mathematical background.
The variables selection step of the pipeline addressed cleverly the task of reducing the number of variables and choosing the important ones by a using a very interesting approaches based on the Markov-blanket properties.
The results are interesting - showing some advancements.

__ Weaknesses__: I'm questioning the choice of graph perturbation, why this choice? What are alternatives ?

__ Correctness__: The claims and method are correct and theoretically well grounded.
No obvious errors are observable in the methodology.

__ Clarity__: The contributions of this works and the issues of previous works are clearly identified.
The text is clear and logically organized in order to provide the appropriated explanations of almost all potential questions.

__ Relation to Prior Work__: Related work is relevant and appropriate.

__ Reproducibility__: Yes

__ Additional Feedback__: - At equation (2) M should be defined.
- The concept of Barabasi-Albert graph should be explained
- The notion of structure notion should be defined when used the first time.
- A minimal explanation of Bayesian Networks should be included in the preliminaries with minimal and specific references
- Could you give a formal definition of the Markov-Blanket?
- What is definition of perfect map ?
- Could be good idea to give the meaning of symbol used (in order make the reading easier) those used for conditional independence , definition, likelihood.
- Define the BIC score (Bayesian information criterium) and give an overall signification to grasp is meaning. (However, I have appreciated to read what were the reasons for choosing)
- Why saying BIC score should reflect ... ?It's possible to demonstrate it asymptotically ?
- SHAP, define it. (shapley additive explanation)
- Why GNN must integrate network features in a non-linear manner ?
- It's not clear what is the blue values in the table 1 (GNNExplainer)
- What is superpixel graph ? Minimal explanation in order understand the figure 5a
** Acknowledge of authors feedback

__ Summary and Contributions__: This work aims at generating explanations for graph neural networks based on Bayesian network and proposes a Probabilistic Graphical Model (PGM) model-agnostic explainer for GNNs, which provides statistical information on the contributions of graph’s components in terms of conditional probabilities. It proposes a new framework according to PGM so as to illustrate the dependencies of underlying graph components with regard to the target variable. In order to provide an efficient and tractable algorithm, this work introduces several simplified tricks and provides theoretical analysis on the statistical information that can be encoded. Furthermore, experiments on both synthetic and real-world datasets demonstrate that PGM-Explainer can provide more accurate and intuitive explanations for GNN predictions.
As for contributions, this work adopts Bayesian networks as GNN interpretable models, aiming at explaining GNN decisions in a new interpretable manner. This manuscript also demonstrates its explanation power empirically and reveal its potential applications on various domains.

__ Strengths__: Different from existing gradient based or mutual information based explainers, this work provides PGM-Explainer, a dependency based explanation model and verify its power.
(Section 4) Experimental results show that PGM-Explainer consistently outperforms other methods included in comparison on synthetic and real-world data. PGM-Explainer also can capture more explainable relationships/dependences.

__ Weaknesses__: (Section 4) The complexity of model and its training were not analyzed. Comparison of the running time among 3 different explainers can help. Without theoretical and empirical analysis, it is unclear how applicable PGM-Explainer is to wider range of applications.
(Figure 5 around Line 311) It is expected to provide more explanation about the results as well as more comprehensive results (e.g., from different number of selected nodes), which can be added in the appendix.

__ Correctness__: Yes.

__ Clarity__: The arrangement of the paper is in good order. It first introduces the necessity of explanation models, the overall framework for explainers and then proposes PGM-Explainers in a consistent way.
Several illustrative examples give intuitive introduction, which is easy for understanding. See Figure 1 for the explanation of a GNN’s prediction and Figure 5 for graph classification explanations generated by different explainers.
There are numerous complex notations and thus hinders readers from straightforward understanding. It takes long time to remember the meaning of notations and sometimes leads to confusion and ambiguity.

__ Relation to Prior Work__: This paper provides detailed description of different explainers, providing a clear view on the differences among gradient based, mutual information based and dependency based explanation models. It conducts comparisons on a wide range of datasets including synthetic data and real-world data, ranging from node classification tasks to graph classification tasks. However, this work doesn’t make comparison with popular attention based GNNs.

__ Reproducibility__: Yes

__ Additional Feedback__: It is better to present some intuitive and simple description before detailedmathematical description.

__ Summary and Contributions__: This paper proposes a novel interpretation method, PGM-explainer, to provide an explanation of a Graph Neural Network (GNN) prediction in the form of a probabilistic graph model (PGM). With PGM-explainer, they provide not only highly contributing nodes in an input graph but also the dependencies among the nodes using conditional probabilities.
PGM-explainer comprises three stages; data generation for perturbing input graph, variable selection for improving computational efficiency and compactness, and structure learning for building PGM (Bayesian network). They gave the theoretical proof that generated PGM always includes the Markov-blanket of the target prediction with a Bayesian network. They experimented on synthetic and real-world datasets and showed superior accuracy (or precision) and end-user faithfulness compared with the existing methods.
** post-rebuttal update ***
I have read the author rebuttal. I think that the authors tried their best to address the concerns raised by the reviewers and appreciate their efforts.

__ Strengths__: PGM-explainer is a novel interpretation method which leverages mathematically grounded Bayesian network. Computation could have been very burdensome, but they effectively reduced the computational complexity by introducing various theorem which is thoroughly proven. Since the proposed method does not rely on the linear independence assumption, it provides deeper explanations including dependency among features. As a result, the proposed method can cover more flexible distributions of graph structures and variables compared to existing methods. The conditional probabilities in Figure 1 seem very helpful for deeply understanding the GNNs’ prediction.

__ Weaknesses__: The reported accuracies and precisions only prove that the highly contributed nodes appear in the explanation. There is no verification of the induced dependencies. Some of their arguments are not grounded concretely. If they cannot provide the justification, it would be better to smooth the expressions (e.g. the argument that the mutual information of GNNExplainer is not appropriate for many nodes is not concrete (line 302-303).) In addition, the description of experimental settings seems not enough to understand an implementation. It should provide more details, for example, an initial embeddings and model depth (although same as the setting from previous works). Moreover, the paper should report more ablation study results with various experimental settings to verify the underlying intuitions and theoretical analyses.

__ Correctness__: I did not find any questionable points in the manuscript.

__ Clarity__: This paper is written easily and clearly considering the complex theorems and methodologies.

__ Relation to Prior Work__: Highly related to the prior work.

__ Reproducibility__: Yes

__ Additional Feedback__: In order to address the concerns in the weaknesses,
Please verify the validity of the dependencies in an explanation.
Please provide the justification line 302-303, which argues that the mutual information used in GNNExplainer is not suitable for the large number of nodes.
Please give the explanation about why the Laplacian-based GCN ([5] Kipf and Welling, 2017) was picked to train MNIST SuperPixel-Graph dataset (among various types of existing GNNs).

__ Summary and Contributions__: This paper proposes a model-agnostic explanation algorithm, called PGM-Explainer, for graph neural networks by identifying important graph components for a given target. PGM-Explainer can illustrate the dependency among the explained features by searching an optimal structure for the filtered important neighbors. Evaluations on the synthetic and real-world datasets show the effectiveness of PGM-Explainer.

__ Strengths__: 1. The writing is good and easy to read. Examples are given before the authors illustrate the details of the proposed algorithm.
2. The tackled problem is of significance. GNN explanation is an important and promising topic with the burst of GNN models these days. Existing explainable GNN algorithms are all interpretable models that can make predictions as well as provide the evidence for the final predictions. However, PGM-Explainer is a model-agnostic algorithm to explain the predictions from black-box GNN predictors. Thus, the black-box GNN predictors can be a simple and efficient model with high prediction accuracy.
3. The proposed algorithm is reasonable. The data generation step is to generate a set of input neighbors and output prediction pairs for explanation. The variable selection is to select important variables for the final explanation. The last step is to learn the relationship between the selected important variables.

__ Weaknesses__: 1. What is the black-box GNN model used in the experiments? What is the prediction accuracy of the GNN model?
2. The paper lacks the time complexity analysis for the proposed algorithm, especially for the first step of the entire algorithm. How many trials are used to generate perturbed feature series?

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: No

__ Additional Feedback__: The author's rebuttal has addressed by concern.