__ Summary and Contributions__: 1. the paper proposes a neural network on factor graphs for MAP inference;
2. the paper proves that the max-product algorithm is a special case of it (though there can be exponential number of rank-1 tensors);
3. evaluation on synthetic data, LDPC decoding, and human motion prediction.

__ Strengths__: I think learnable inference algorithm for PGMs is definitely interesting and novel topic to research. This paper is novel by bridging graph neural network and max-product algorithms. Impressive empirical results are presented. The results in this paper can be further explored by the probabilistic graphical model community, to improve current inference algorithms.

__ Weaknesses__: 1. I am not entirely sure how useful is the theoretical result in Sec. 3.2. First of all, the decomposition may result in exponential rank-1 tensors, does it mean the size of the network should be exponential? Secondly, I think the point of this work is a more powerful MAP inference algorithm than the max-product algorithm (as the empirical result shows). However, the theory does not explain why the proposed algorithm can be more powerful than max-product.
2. the synthetic data experiment might be unfair, since the neural network is trained with exact MAP solutions, and it can possibly over-fit the solution. Furthermore, only belief propagation algorithms are compared. How does variational inference work for these tasks?

__ Correctness__: I have not check the theory thoroughly. According to my understanding, the claims and method seem reasonable. The empirical methodology generally sounds, but may have some flaws, as pointed out above.

__ Clarity__: The writing needs to be improved.
1. there are many typos (missing spaces and citations);
2. Q(t | Phi), M([g, f] | Theta), Phi, and Theta in Algorithm 1 are undefined;
3. There are many lemmas and propositions in Sec. 3.2 as the sketch of the proof. However, I cannot find the main theorem itself in the main text.

__ Relation to Prior Work__: It clearly discussed the difference from MPNN. I am not aware if there are any other works on learnable MAP inference algorithms though.

__ Reproducibility__: Yes

__ Additional Feedback__: Post rebuttal:
I think my concerns about the theory and experiments are addressed in the rebuttal. Therefore I increase my score from 5 to 6.

__ Summary and Contributions__: The paper generalizes graph neural networks to factor graphs

__ Strengths__: Relevant work to NeurIPS community, established connection with belief propagation

__ Weaknesses__: Detailed motivation on the choice of the architecture, aggregation function, and some empirical details are lacking (see below)

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: - Max-product is known not to decode well LDPC codes, that’s probably why the authors compared to schemes that use sum-product. But then why is max used in the architecture? Does the scheme work well for the sum rule instead of max?
- The training time is not reported (50 epochs – how much time is that?), only inference time. Doesn’t this automatically represent a computational disadvantage of graph neural networks compared to message-passing methods?
- The number of potential input sequences to LDPC code is huge, how representative is the test with 1000 instances?
----------------
Post-rebuttal:
I am satisfied with the additional clarifications and evidence presented by the authors during the rebuttal. It would be beneficial to include these additional explanations in the final version of the paper. I think that overall this paper is a good study, and I am keeping my original score 7 (accept).

__ Summary and Contributions__: This paper proposes a message passing neural network that is theoretically guaranteed to express certain higher-order dependencies among the random variables. To be specific, the proposed parameterization generalizes the max-product belief propagation (BP) run on some graphical model with higher-order interactions. In the experiment, the proposed approach outperforms over several baselines.

__ Strengths__: This paper proposes a new member of the message passing neural network (MPNN) with analyzable expressive power. The authors establish some theoretical analysis on the power of the proposed architecture (by showing it to generalize the max-product BP). Experiments were done across several tasks.

__ Weaknesses__: The proposed architecture is not particularly novel and experiments can be improved. While the theoretical analysis is quite interesting, it is not significant enough to bypass the aforementioned issues (e.g., the analysis mainly relies on the Lemma 1 proposed by Kohli et al.).
While the proposed factor graph neural network (FGNN) is guaranteed to express a family of higher-order interactions, in the end, FGNN is a member of MPNN applied to heterogeneous graph with two types of vertices (random variable and factor).
I also think the considered experiments are limited since they only consider the case where (1) training and evaluation are done on the same graph and (2) factors are easily expressed as a representation of fixed dimension. In other words, the considered experiments are not very convincing for showing that the proposed FGNN works across general graphical models. Especially, I think it is critical to consider training FGNN on factor graphs where (2) is not satisfied, e.g., represented by (strictly positive) higher-order tensors with elements sampled from a uniform distribution. This would be useful for validating that FGNN is indeed able to model the higher-order interactions in general graphical models with higher-order interactions, as analyzed in Section 4.

__ Correctness__: The proposed method and analysis are correct to my knowledge.

__ Clarity__: I think the architecture of FGNN should be explained in more detail, e.g., explaining more about the right side of Figure 3 in the main text.

__ Relation to Prior Work__: The authors should discuss and reference the existing work using graph neural network to perform inference in a graphical model (defined on pair-wise interactions), i.e., Yoon et al. 2018.
Yoon et al. 2018, Inference in Probabilistic Graphical Models by Graph Neural Networks

__ Reproducibility__: Yes

__ Additional Feedback__: I think the paper should more clearly describe how the FGNN works on heterogeneous graphs with fixed-size representation for both the random variable and the factors in Section 3.1. Explicitly stating (in the main paper) on how the input representation is constructed from the higher-order factor graph would also help.
I also suggest the authors to either publish the code used for the experiment, or provide a more detailed description of the experimental settings and the baselines considered.
============
I appreciate the authors' detailed and thoughtful rebuttal. However, I still think the experiment on higher-order factor graphs is necessary (since this is related to main theoretical result of the paper). I do not think the human motion prediction task is a good example in this case because it is more commonly expressed as a set of variables with pairwise interactions. Hence, I am keeping my current score.