I agree with the authors that R1's concerns are not relevant to the acceptance decision and have removed their review from consideration. R2 raised the concern that there are insufficient benchmarks to judge the value of the work; the author rebuttal countered that the baselines identified by R2 do not attack the same use case as the proposed algorithm. I concur with this assessment. R2 also pointed out that the method was evaluated on 1-D problems only; the authors rebutted that the method was demonstrated on both 1-D and 2-D problems, and gave an example of a Bayesian inverse problem that motivates this method even in low dimensions. R4 recommended accept because of the novelty of the proposed multipole graph neural operator. Taking the reviews and author rebuttals into consideration (excluding R1), while I see value in this work, my conclusion is that the potential impact of the method is not clear. What is encompassed in the class of problems for which this heuristic will work: will it be limited to Bayesian solutions of 1-D inverse problems, as in the example offered by the authors in the rebuttal? Unfortunately, NeurIPS is a selective conference, and both novelty and impact must be clear. The former is evident, but because the latter is unclear, I recommend rejection.

SAC metareview: The paper received 3 reviews, which were not reliable enough, so we solicited two additional reviews, which appear in edited form below. This triggered additional discussions. We decided this was a borderline accept. We trust the authors will take into account the several remarks we were able to gather and **update** their camera-ready.

Reviewer A: I think the experiments are lacking in variety; more concretely, the proposed architecture is only compared to others on a 1-D problem. It also feels a bit odd that they did evaluate on a 2-D equation but didn't report comparisons with other methods there (I didn't have access to the appendix).
Having said that, I think this paper is relevant and important in the area of deep learning for learning PDE solutions from parameters. This work combines essentially 3 ideas:
- (1) using GNNs for learning PDEs from data; relevant work w.r.t. this is cited, to the best of my knowledge.
- (2) expressing the action of the integral operator with graph neural network message passing with a particular kernel function, which allows seamless multi-resolution. Some interesting works by Welling's group feel very relevant here: Gauge Equivariant Mesh CNNs: Anisotropic Convolutions on Geometric Graphs (de Haan, Weiler et al.) and Gauge Equivariant Convolutional Networks and the Icosahedral CNN (Cohen et al.). There is a non-zero probability that I misunderstood something in either this work or the two Welling works and they are not relevant. However, if they are, they also make me think that the application of their ideas is not as trivial for non-Euclidean meshes.
- (3) hierarchical message passing; some work is cited, but two relevant works on physics-based problems aren't: Flexible Neural Representation for Physics Prediction (Mrowca et al.) and Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids (Li et al.). Note that both works deal with particle-based simulations of objects, not PDEs in continuous spaces.
- Even with the non-cited work, I still feel this work is novel and non-trivial. Moreover, the combination of these three ideas is of particular importance, and thus this paper still provides an important stepping stone for the community.

The paper is mostly sound and well explained. As I mentioned in related work, some of Welling's work shows that putting a graph on a mesh should be done with care, because traditional GNNs impose that all edges are created equal, and this is too restrictive for most meshes. Instead, GNNs on meshes should satisfy a set of equations depending on the local tangent space, as well as satisfy some global symmetries.
I suspect (but can't say for certain) that this may cause trouble for the presented approach on non-Euclidean manifolds (say, a sphere).

The contribution is important and a good one. It is more relevant to a generic audience than most papers, IMO. Moreover, both the GNN and the Deep Learning for Science communities are growing.

Overall thoughts: I think the combination of ideas and the application is relevant, important, and pretty well implemented. My main concern is that the experiments were only done on a 1-D equation, and the 2-D equation only reported times and did not compare with alternatives. Moreover, the proposed approach should be able to run on non-Euclidean manifolds, and I have doubts (though I'm not certain) that some issues may arise there, as described under technical soundness. If there were convincing comparisons in a 2-D setting I would argue for a soft accept, and if there were satisfying experiments on a non-Euclidean manifold with good explanations of how to adapt things there I would argue for a strong accept.

Reviewer A update, after being given access to the appendix: Thanks for sending the appendix. Looking at it carefully, I agree they tested 2-D Darcy flow against baselines; they probably didn't show these results in the main text for the reason I suspected: multiple baselines beat their model. On a related note, thinking about these values more, the premise of increasing mesh size for its own sake is erroneous, IMO: if a lower mesh size gives a lower error, then just go with it and you'll also have a cheaper cost. This is relevant for the GCN baseline, which they say works well at s=16 but which they don't show, as well as for the FCN baseline, which gets great results (better than the proposed method) at low resolution. MGKN should also aim at getting better performance, not just equally good performance, at higher resolutions.
Unless I misunderstood something, for each baseline and each mesh size s they should have reported the minimum error over all s' <= s (technically, the test error after cross-validating to find the best s'). This reading would also experimentally contradict the need for denser meshes, which is the main premise of the paper. I do think denser meshes are valuable in principle, which is why I like the idea behind this paper; I'm just doubtful the experiments are showing this need.

Reviewer B: The method of the paper is definitely novel and, I think, very much deserves to be published. The authors are correct in their reply to reviewer 2 when they say that the multigrid method and the FMM are essentially unrelated (except very superficially, by the presence of grids of varying coarseness). [...] For the same reason, the authors' method and MgNet are essentially unrelated. The authors' use of the term "V-cycle" might be partially to blame for this confusion, since that term is used frequently in the multigrid community but is rarely used in the FMM community (though its meaning in that context is clear). [...]

Weaknesses:
1. I agree with reviewer 2 that, for hyperbolic problems, the theory behind the method is much less clear. On the other hand, the method's applicability to elliptic PDEs whose solutions depend in a nonlinear way on a parameter (equation (14), for example) is compelling.
2. Figure 1 is misleading, since it shows a nearest-neighbor graph (with O(N) interactions). The lowest level should have every node connected to every other node (O(N^2) interactions), the next level should have a smaller subset of those nodes, all once again connected to one another, and so on. Their method uses the lowest level to compute nearby interactions, the next level to compute more well-separated interactions, the level above that to compute even more well-separated interactions, etc.
They use NNs to construct the kernels on each level, as well as the kernels of the "transition" matrices which map between different levels.
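To make the hierarchy I am describing concrete, here is a toy sketch (my own illustration, not the authors' code): fixed random matrices stand in for the NN-parameterized kernels, the finest level is restricted to nearby interactions via a band mask while coarser levels stay dense, and "transition" matrices move features between levels in a V-cycle-style pass. The level sizes and band width are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_kernel(n_out, n_in):
    # stand-in for a kernel that would be parameterized by a neural network
    return rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)

N = 16                       # nodes at the finest level
sizes = [N, N // 2, N // 4]  # node counts per level (finest -> coarsest)

# per-level interaction kernels K[l] and inter-level transition matrices
K = [make_kernel(s, s) for s in sizes]
down = [make_kernel(sizes[l + 1], sizes[l]) for l in range(len(sizes) - 1)]
up = [make_kernel(sizes[l], sizes[l + 1]) for l in range(len(sizes) - 1)]

# restrict the finest kernel to nearby pairs only (band mask); coarser
# levels keep dense all-to-all kernels, so well-separated interactions
# are handled at coarse resolution
near = np.abs(np.subtract.outer(np.arange(N), np.arange(N))) <= 2
K[0] = K[0] * near

def multipole_apply(v):
    """One V-cycle-style pass: restrict down, interact per level, prolong up."""
    feats = [v]
    for l in range(len(sizes) - 1):            # downward pass (restriction)
        feats.append(down[l] @ feats[-1])
    out = K[-1] @ feats[-1]                    # coarsest-level interactions
    for l in reversed(range(len(sizes) - 1)):  # upward pass (prolongation)
        out = K[l] @ feats[l] + up[l] @ out
    return out

y = multipole_apply(rng.standard_normal(N))
print(y.shape)  # (16,)
```

In the actual method the kernels and transitions would be learned message-passing functions over graph edges rather than dense matrices; this sketch only shows the level structure and the near/far split that Figure 1 should depict.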