__ Summary and Contributions__: The paper studies a binary version of the matrix completion problem, where we are additionally given a graph that provides "side information". Specifically, there is an ideal ratings matrix (users, items), and there is a graph between the users. The ideal matrix is assumed to have a _very stylized_ structure: users are divided into precisely two clusters of equal size, each cluster is divided into three groups of equal size, and within each group all the users have _exactly the same_ rating vector. (There's an additional linear algebraic dependence which I will ignore here.)
The side information graph is assumed to be a block model whose parameters correspond to edge probability within groups, within cluster but across groups, and across clusters (and these are in decreasing order).
In this setting, given a partially observed and independently corrupted ranking matrix, the paper gives an iterative refinement algorithm that can _perfectly_ reconstruct the ideal matrix with a good probability. They give conditions under which the error probability goes to 0.

__ Strengths__: - The algorithm is fairly clean, and the fact that perfect recovery is possible is nice.
- Taking advantage of side information in a provable way is a nice feature that is often difficult to prove.

__ Weaknesses__: - The setting is a bit too stylized, and I found it very unconvincing. Perhaps the authors can generalize at least to a "k distinct vectors" and "t clusters" model instead of 2 vectors and 2 clusters.
- The boolean operations are also unrealistic. Is there an intuitive rationale for one group of people having rating vector x, another y, and a third having x \XOR y ?
- While there are technical improvements, the substantial improvement over [39, 40] seems unconvincing.
- Experiments are primarily on synthetic data. The only real dataset appears to be one where the graph is real but the ratings are synthetic. Some explanation on this would be helpful.
Overall, I think that a slightly worse result (in terms of imperfect recovery), but allowing a richer model (e.g., more groups/clusters, not necessarily perfect alignment between the graph and the matrix) would be a welcome addition to the paper.

__ Correctness__: Claims and theorems seem correct to me.

__ Clarity__: Paper is quite well written.

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: Do you have any experiments by running the algorithms discussed to real ratings data? (Having hypothetical ratings but real graph seems like a validation of the graph clustering subroutines rather than ratings..)

__ Summary and Contributions__: This paper proposes a matrix completion algorithm for binary matrix with graph side information. While this problem has been studied before, the paper assumes there is group structure inside the cluster. The paper gives the sample complexity for such kind of problem, depending on the graph structure and the rate of observation. The paper also proposes an algorithm for solving such kind of algorithm, with a theoretical guarantee. Empirical results on synthetic and semi-synthetic data are used.

__ Strengths__: The paper proposes a new structure of graph side information. Rigorous theoretical bound are provided for using such new structure of graph side information. They also propose an algorithm which are shown to be both theoretically and empirically tractable.

__ Weaknesses__: The new structure needs a motivation. The problem of binary matrix completion with graph side information is already been studied. The novelty of the current paper is that in addition to the classical clustering structure, it assumes one-layer of clustering: cluster within cluster. How can such a new structure agree with the real application? I see from the experimental part no experiments on real data is given. Does this mean this problem is theoretically interesting but has few practical meanings?
It is not clear why using such a performance measurement in the paper. In Eq.1, \psi is used to estimate the matrix. So, the worse-case means there is no ground truth for the target matrix? Or there is no constraint on the ground truth for the target matrix M? Such a part needs more explanation to make the metric reasonable.
More discussions are needed on how the current theoretical guarantee on sample complexity compared to previous ones. Otherwise, we can not sure the new structure information in the graph do help completion.
--------------------------------------------------
Not all questions have been successfully addressed (such as experimental on real data and a motivation for the problem regarding "binary" matrix completion). But I still think this paper can be accepted. I will keep my score.

__ Correctness__: Correct

__ Clarity__: It is generally clear. Some more explanations are need. See "Weaknessses"

__ Relation to Prior Work__: More discussions are need. See "Weaknessses"

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: The authors consider a matrix completion problem with graph side information, where the graph forms a hierarchy. They consider the stochastic block model for analysis. Under a specific observation model and structure on the graph, the authors characterize the recovery probability of the underlying low rank matrix. They also present an algorithm that is guaranteed to obtain the right solutions under the same assumptions. The authors experimentally verify that the proposed method achieves superior results compared to a few baselines.

__ Strengths__: Well written paper, and the theoretical results are well explained with intuition. Both an information theoretic bound as well as an algorithm that achieves the correct answer when the bounds are satisfied are presented. I'm not completely aware of some of the related works, so it's hard to tell how truly novel the theoretical results are. The experimental results are encouraging, but i'd prefer some runtime plots too.

__ Weaknesses__: The problem setting seems contrived. Why 2 groups within each cluster specifically? can this be generalized?
the polynomial time dependence in (n) for the method is potentially prohibitive. Are there ways this can be addressed?
Algorithmic runtime is not discussed. Especially, in comparison to other methods that use graph structured information and are highly scalable, and in light of the poly(n) complexity, this is something that should be addressed.

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Couple of references can be added in the context of using graph side information:
"Collaborative filtering with graph information: Consistency and scalable methods." Advances in neural information processing systems. 2015.
"Kernelized probabilistic matrix factorization: Exploiting graphs and side information." Proceedings of the 2012 SIAM international Conference on Data mining. Society for Industrial and Applied Mathematics, 2012.

__ Reproducibility__: Yes

__ Additional Feedback__: Line 64: not sure i follow the distinction from prior work: if clustering is a consequence of matrix completion, does it not mean matrix completion is a necessary condition? how is that more relaxed?
Line 90: the problem setting seems contrived. Why 2 basis vectors specifically? can this be generalized? What happens instead, if we have 'k' bases and hence (k+1) groups? what if there are more clusters?
eqn 1: considering the observations can be noisy, are you considering exact matrix recovery in the limit? if so, it will be good to call out.
remark 1: This is interesting for another reason. The sample complexity for matrix recovery is O(m logm). So it seems that when the graph can be perfectly clustered, there is no advantage in using the graph? isn't this counterintuitive? I'd assume at the very least, you can solve 'K' separate matrix completion problems when the graph can be separated into K clusters.
line 220: why not map +1 to +1 and 0 to -1 ? Also, why not just assume you get to see Z instead of Y in the problem formulation?
Line 302: can you provide details about the subgraph? what was m, n? how many samples?

__ Summary and Contributions__: This paper considers a binary matrix completion problem with side information and proposes a hierarchical clustering method with refinements to achieve efficient and accurate matrix completion. The information-theoretic limit on the number of observed matrix entries is provided. Empirical analyses on synthetic and semi-real datasets showed that the proposed method can outperform several baseline methods.

__ Strengths__: The theoretical contribution of this paper is to provide the optimal sample complexity bound for exact recovery of the binary matrix completion problem. The proposed method is not limited to recommendation problem but is also applicable to clustering problems with side information.

__ Weaknesses__: The problem setting of this paper seems to be impractical. The authors assumed that the users are divided into two clusters and each cluster only contains three groups. This may not be a very practical setting for real recommender systems.
The experiments are conducted on synthetic datasets. Although some experiments are conducted on the Poliblog Dataset, the ratings are synthetic. It will be more interesting to see how the performance of the proposed method differ from state-of-the-art recommendation algorithms on real recommendation tasks.
-------------------------
I have read the authors' rebuttal and will keep my original scores for this paper.

__ Correctness__: The theoretical and empirical results seem correct to me.

__ Clarity__: The paper is well written.

__ Relation to Prior Work__: Differences from prior works are clearly discussed in this paper.

__ Reproducibility__: Yes

__ Additional Feedback__: