__ Summary and Contributions__: This paper proposes a contrastive learning algorithm for learning graph representations in an unsupervised manner. It extends SimCLR [1] to learn graph representations that can be used for different graph classification tasks in semi-supervised, unsupervised, or transfer learning scenarios.
To do so, the authors propose several graph augmentation techniques required by the contrastive learning algorithm, and analyse their effects on different types of datasets.
The four types of data augmentation explored in the paper are node dropping, edge perturbation, attribute masking, and subgraph. In their empirical study, the authors explore the effect of these augmentations on different kinds of graph-structured data, such as social networks and biochemical molecules, showing that different techniques work better in each domain, depending on the nature of the structure the graph represents.
This pre-training technique shows promising results across different datasets and tasks.
[1] Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709.
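To make the four augmentations listed in the summary concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The `Graph` container and all function names are hypothetical. Masked attributes are zeroed, which is only one possible masking choice. Edge perturbation removes a fraction of edges and adds the same number of random non-edges. The subgraph view is grown by random neighbourhood expansion from a seed node.

```python
import random
from dataclasses import dataclass


@dataclass
class Graph:
    """Hypothetical minimal container: undirected edges stored as (u, v) with u < v."""
    num_nodes: int
    edges: set
    features: list  # features[i] is the attribute vector of node i


def _induced(graph, keep):
    """Subgraph induced by the node ids in `keep`, re-indexed to 0..k-1."""
    keep = sorted(keep)
    new_id = {old: new for new, old in enumerate(keep)}
    edges = {(new_id[u], new_id[v]) for u, v in graph.edges
             if u in new_id and v in new_id}
    return Graph(len(keep), edges, [list(graph.features[i]) for i in keep])


def node_dropping(graph, ratio, rng):
    """Drop a random `ratio` of nodes together with their incident edges."""
    dropped = set(rng.sample(range(graph.num_nodes), int(graph.num_nodes * ratio)))
    return _induced(graph, [i for i in range(graph.num_nodes) if i not in dropped])


def edge_perturbation(graph, ratio, rng):
    """Remove a random `ratio` of edges and add as many random new edges."""
    edges = sorted(graph.edges)
    n_change = int(len(edges) * ratio)
    kept = set(rng.sample(edges, len(edges) - n_change))
    # O(n^2) enumeration of non-edges: acceptable for a sketch, not for large graphs.
    non_edges = [(u, v) for u in range(graph.num_nodes)
                 for v in range(u + 1, graph.num_nodes) if (u, v) not in graph.edges]
    added = set(rng.sample(non_edges, min(n_change, len(non_edges))))
    return Graph(graph.num_nodes, kept | added, [list(f) for f in graph.features])


def attribute_masking(graph, ratio, rng):
    """Replace the attribute vectors of a random `ratio` of nodes with zeros."""
    masked = set(rng.sample(range(graph.num_nodes), int(graph.num_nodes * ratio)))
    feats = [[0.0] * len(f) if i in masked else list(f)
             for i, f in enumerate(graph.features)]
    return Graph(graph.num_nodes, set(graph.edges), feats)


def subgraph(graph, ratio, rng):
    """Grow a connected node set from a random seed node by adding random
    neighbours until it covers (1 - ratio) of the nodes or stops growing."""
    target = graph.num_nodes - int(graph.num_nodes * ratio)
    adj = {i: set() for i in range(graph.num_nodes)}
    for u, v in graph.edges:
        adj[u].add(v)
        adj[v].add(u)
    seed = rng.randrange(graph.num_nodes)
    chosen, frontier = {seed}, set(adj[seed])
    while len(chosen) < target and frontier:
        nxt = rng.choice(sorted(frontier))
        chosen.add(nxt)
        frontier = (frontier | adj[nxt]) - chosen
    return _induced(graph, chosen)


# Toy usage: a 4-node path graph with 2-dimensional node attributes.
rng = random.Random(0)
g = Graph(4, {(0, 1), (1, 2), (2, 3)},
          [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
view = node_dropping(g, 0.25, rng)
print(view.num_nodes)  # 3
```

Each function returns a new `Graph` and leaves the input untouched, so two independently sampled views of the same graph can be fed to the encoder as a positive pair.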

__ Strengths__: The main strength of this paper is the novelty of the proposed contrastive learning technique and its detailed experimental evaluation.
Even though the proposed technique is adapted from a similar framework that learns representations of visual data through contrastive learning, the shift from images to graphs is not trivial, and most of the framework's components need to be adapted to graph-structured data.
The empirical evaluation consists of a thorough analysis of the effect of different data augmentation techniques in the contrastive learning framework. The authors show that, as expected, data augmentation is crucial for the proposed contrastive learning technique, but also that the appropriate augmentations depend on the data's domain, since the nature of the graph's structure determines what each augmentation means.

__ Weaknesses__: The technique presented in the paper would have been more complete if it had also been evaluated on unsupervised node representation learning. Node- and edge-level tasks are highly relevant to graph-structured data, and evaluating them would have increased the contribution and novelty of the paper.
Additionally, some details are not entirely clear from the paper. For example, the datasets used are not fully described, and the choice of encoder, readout, and projection functions is not explained in enough detail.

__ Correctness__: Yes

__ Clarity__: Yes, the paper is generally well written.

__ Relation to Prior Work__: The paper discusses how the presented technique relates to previous work, especially SimCLR and InfoGraph, but the discussion could be longer and better structured. A comparison between this work and other unsupervised graph representation learning methods, such as Deep Graph Infomax, would also strengthen this section.

__ Reproducibility__: Yes

__ Additional Feedback__: The extension of this technique to node-level representation learning wouldn't be too different from the proposed technique, and it would strengthen the results of the paper if it showed improvement on node-level tasks as well.
Regarding the Broader Impact statement, it reads like a conclusion or summary of the paper's contributions to the graph representation learning field, instead of serving the intended purpose of this section.
====Post-rebuttal update=====
The authors' response was satisfactory.

__ Summary and Contributions__: In this paper, the authors propose GraphCL, a novel contrastive pre-training framework for graph representation learning. GraphCL first generates graph samples by applying four kinds of data augmentations to graphs, then applies a contrastive loss to maximize agreement between the embeddings of the same graph under different augmentations. Through empirical experiments, the authors comprehensively study the four proposed graph augmentations, namely node dropping, edge perturbation, attribute masking, and subgraphs, and make some useful observations. Furthermore, experiments show that the pre-training technique proposed in GraphCL consistently benefits GNN performance and makes GNNs more robust to adversarial attacks.
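The "maximize agreement" step can be sketched as a normalized temperature-scaled cross-entropy (NT-Xent) loss in the style of SimCLR. This is a minimal illustration under assumptions: cosine similarity, a temperature `tau`, and a softmax over the cross-view similarities of a minibatch, where the other graphs in the batch act as negatives. Implementations differ in exactly which terms enter the denominator (for example, whether the positive pair is included), so this should not be read as GraphCL's exact objective. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def nt_xent(z1, z2, tau=0.5):
    """z1[n] and z2[n] embed two augmented views of graph n in a minibatch.

    For each n, the positive is (z1[n], z2[n]); the negatives are z2[m] for
    m != n. Returns the mean of -log softmax of the positive similarity.
    """
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n
```

The loss is low when each graph's two views are closer to each other than to views of other graphs in the batch. A smaller `tau` sharpens the softmax and penalizes hard negatives more strongly.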

__ Strengths__: + Data augmentation on graphs is relatively under-explored in previous work. This paper empirically studies four kinds of graph augmentation techniques, making a novel contribution to this field.
+ The authors conduct extensive experiments on both the properties and influence of graph data augmentations and the performance of pre-trained GNNs, and make several useful observations.
+ The paper is clearly written and the idea is easy to follow.

__ Weaknesses__: - The proposed GraphCL framework resembles contrastive frameworks in the vision domain, such as SimCLR. The authors study the role of data augmentations on graphs. Although extensive comparisons have been made, the observations regarding different augmentations on certain kinds of datasets are rather shallow, which makes this paper read more like a survey.
- No theoretical analysis is provided for the proposed GraphCL framework. Why does GraphCL work by optimizing the contrastive objective? How does the loss differ from previous attempts, such as DeepWalk-like objectives? How do different data augmentations on graphs affect the mutual information between views?
- GraphCL only considers pre-training the GNN at the graph level. However, node-level tasks are also very important. How does GraphCL differ from previous node-level work, such as DGI?

__ Correctness__: Commonly-used protocols are applied in empirical evaluation. Findings are based on experimental results.

__ Clarity__: The idea is easy to follow and the paper is clearly written.

__ Relation to Prior Work__: A considerable body of prior work on graphs has leveraged contrastive learning:
- Graph Representation Learning via Graphical Mutual Information Maximization, WWW 2020
- Unsupervised Attributed Multiplex Network Embedding, AAAI 2020
- Heterogeneous Deep Graph Infomax, arXiv preprint
Moreover, the authors claim that none of the existing work has studied contrastive methods from the perspective of enforcing perturbation invariance (Line 96 -- Line 98). However, I personally believe that DGI and InfoGraph, which construct negative (i.e., "corrupted") graphs through node-level permutation, do enforce perturbation invariance, as they still aim to maximize the mutual information (MI) between inputs and embeddings. The main difference seems to be that prior works use different ways of constructing negative graphs.
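To make this distinction concrete, here is a minimal sketch of the row-wise feature shuffling that DGI uses as its corruption function. The adjacency structure is left intact and only the node features are permuted, and the result serves as a negative sample. GraphCL's augmentations, by contrast, produce positive views. The list-of-lists feature format and the function name are hypothetical, and this is an illustration rather than DGI's reference implementation.

```python
import random


def dgi_corruption(features, rng):
    """Row-shuffle node features: node i receives the feature vector of node perm[i].

    The adjacency structure is not touched, so the corrupted graph keeps its
    topology but its features no longer line up with the original nodes.
    """
    perm = list(range(len(features)))
    rng.shuffle(perm)
    return [list(features[p]) for p in perm]
```

The multiset of feature vectors is unchanged; only their assignment to nodes is permuted. Contrasting embeddings of the original and the corrupted graph therefore targets the pairing between structure and features, rather than invariance to the perturbation itself.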

__ Reproducibility__: Yes

__ Additional Feedback__: Please see my detailed comments. I would like to particularly hear from the authors regarding the following points.
- I hope the authors can provide some insights into the motivation and methodology of the contrastive methods through a deeper analysis of the experimental results.
- I would expect the authors to give a theoretical analysis of how different data augmentations on graphs affect the mutual information between views.
====Post-rebuttal comments=====
I appreciate the authors' response to my concerns. However, I am still inclined to keep my score, as I would like to see an in-depth discussion of the observations made with different augmentations on certain kinds of datasets.

__ Summary and Contributions__: The paper proposes a method for pre-training graph neural networks. Its learning framework follows SimCLR, a SOTA method for pre-training CNNs. For data augmentation, the authors design four methods to perturb graphs without affecting their semantic labels. Extensive experiments demonstrate the effectiveness of the method. The paper is of good quality. My main concern is its novelty.

__ Strengths__: - GNN pre-training is a valuable topic to explore.
- The authors transfer the contrastive learning framework to GNN pre-training and demonstrate its effectiveness through extensive experiments.

__ Weaknesses__: - The contrastive learning framework is the same as SimCLR.
- Graph augmentation methods, such as DropNode, DropEdge, and FeatureMask, have been adopted in previous GNN work, such as [1,2].
[1] DropEdge: Towards Deep Graph Convolutional Networks on Node Classification.
[2] Strategies for Pre-training Graph Neural Networks.

__ Correctness__: To the best of my knowledge, the method is correct.

__ Clarity__: The paper is well written and clearly organized.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The authors propose a graph contrastive learning (GraphCL) framework to learn perturbation-invariant unsupervised representations of graph data. The method produces graph representations with similar or better generalizability, transferability, and robustness compared to state-of-the-art methods. The authors also propose several graph data augmentations.

__ Strengths__: 1) The paper is well-organized with motivation and approaches clearly stated and illustrated.
2) The experimental results verify the state-of-the-art performance of the proposed framework in both generalizability and robustness.
3) The proposed GraphCL framework is novel and worth further exploring.

__ Weaknesses__: 1) Missing citations and prior work:
[1] The MoCo paper is missing:
Kaiming He, et al., Momentum Contrast for Unsupervised Visual Representation Learning, CVPR 2020.
[2] Herzig, et al., Learning Canonical Representations for Scene Graph to Image Generation, ECCV 2020.
The authors of [2] recently showed that, by using a canonicalization process for graphs, information propagates through the graph better than with deeper networks, which allows them to generate complex visual scenes. This could be relevant as a new data augmentation (see next bullet).
2) The data augmentations are very straightforward. I wonder whether more sophisticated augmentations could be used, such as capturing invariance to logical equivalences (see [2]), permutation invariance over the nodes, and more.
3) Although the graph domain is new, the graph contrastive learning framework seems to be borrowed from past works (SimCLR and MoCo).

__ Correctness__: Yes

__ Clarity__: Overall this paper is well written and the technical details are easy to follow.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__: ================ POST-REBUTTAL UPDATE: ========================
After reading the authors' feedback and other reviewers' opinions, I would like to thank the authors for their rebuttal.
The rebuttal addresses my concerns. I believe this paper offers an interesting novel contribution in a promising direction worth investigating, and thus I would like to see the paper accepted. I raised my score to 7.
========================================================================