This paper discusses implicit limitations of GNNs, namely their bias towards homophilous predictions. It presents three GNN architectural guidelines for combating this bias, which can lead to improved predictions, particularly on networks exhibiting heterophilous structure (i.e., non-homophilous labels). The design choices are motivated theoretically and intuitively and then combined into a single model that provides better predictions on networks with heterophilous structure, as demonstrated by experiments on synthetic and real-world data.

The paper provides a number of interesting insights into why certain GNN architectural choices can help predictions when network homophily is low. Although not mentioned in the paper, an idea similar to higher-order neighborhoods (Section 3.1.2) has recently been deployed within more classical graph-based semi-supervised algorithms (i.e., label propagation or label smoothing) for predicting gender on Facebook networks, where gender is not strongly homophilous (see Altenburger and Ugander, Nature Human Behaviour 2018 and Chin et al., WWW 2019; full references below). I believe that these ideas provide further motivation for the design choices in this paper, and including them will strengthen some of the intuition. Gender prediction is another task that the authors might consider.

As pointed out by other reviewers, some of the methodological ideas already appear in the literature in a few places (see also my comments above). However, there is still important novelty in this paper, which was clarified in the authors’ response and subsequent discussion. Specifically, the central problem of making predictions in networks with heterophilous label structure is the motivator for the components, which differs from the motivation in the original papers. Furthermore, the paper gives clear theoretical and intuitive justification for why these approaches should help address the central problem.
Because of this, I believe that the paper actually generates insight into why some existing ideas may work as well as they do while offering a new way to think about the evaluation of GNNs. In addition, the setup of the experiments and evaluation (on both synthetic and real datasets) is novel.

There is one dissenting opinion in the set of four reviewers. Reviewer #1’s concerns boil down to the following: (a) a lack of motivation for the methods; (b) the proposed methods sometimes do not perform as well as other ones; and (c) some missing benchmarks. These are all non-issues in my view. For (a), there is clear motivation in the paper, and this was listed as a strength by the three other reviewers. Issue (b) is a misunderstanding. The authors intentionally present some datasets with high levels of homophily where the proposed methods are not supposed to be better than existing approaches; this appeared in the author response (lines 26--32). Finally, the supposedly missing benchmarks in issue (c) (Cora, CiteSeer, and PubMed with standard train/validation/test splits) are actually in the paper. I think confusion arose because the paper includes one dataset (“Cora Full”, which has different training/validation/test splits) for which the authors did not have the prediction performance of two baseline methods (GAT* and GEOM-GCN*), as the corresponding paper did not run experiments on that dataset.

I commend the authors for addressing an interesting and important problem with intuition, theory, and experiments. The paper also opens the door to further research addressing the role of homophily or heterophily when making structured predictions. Some recent research has been circling these ideas (see below for a few more references), so I also think the research is timely. Overall, I would be very pleased to see this paper appear at NeurIPS.

For the camera-ready version, here are several recommendations in addition to the smaller points appearing in the reviews.

1. Give proper credit for some of the individual components of the model (which have appeared elsewhere) and focus the paper on the problem (going beyond homophily).

2. Include a better description of the predictions. The non-homophilous attributes are not mentioned in the main text, even though they are described in the appendix.

3. Include some acknowledgment, discussion, and (possibly) appropriate comparison against other graph-based semi-supervised learning methods from the data mining community that deal with homophily / heterophily:
-- Gatterbauer. Semi-Supervised Learning with Heterophily. 2014.
-- Peel. Graph-based semi-supervised learning for relational networks. SDM, 2017.
-- Altenburger and Ugander. Monophily in social networks introduces similarity among friends-of-friends. Nature Human Behaviour, 2018.
-- Chin et al. Decoupled smoothing on graphs. WWW, 2019.
-- Jia and Benson. Residual Correlation in Graph Neural Network Regression. KDD, 2020.