The paper makes two incremental contributions: using online cluster assignments in self-supervised learning, and using multiple crops at different resolutions for data augmentation. When these contributions are combined, decent gains in classification accuracy are obtained. The reviewers raise many issues with the current manuscript, including the discussion of the momentum encoder, the discussion of existing clustering-based approaches, and the potential misuse of the term "clustering". I ask the authors to incorporate all of these comments in the final version, but I believe the contributions, even though incremental in nature, can benefit the fast-growing field of self-supervised learning.