Paper ID: 386
Title: A Nonconvex Optimization Framework for Low Rank Matrix Estimation
Current Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper addresses convergence of alternating algorithms in non-convex formulations of matrix completion and matrix sensing. Global convergence with geometric rate is established under certain conditions.

The paper is very ambitious. It addresses 3 algorithms (alternating exact minimisation, alternating gradient descent, gradient descent) for 2 problems (completion and sensing). Because it is impossible to address all 6 possible cases in full, many parts are either sketched or relegated to the supplementary materials. As a result, the paper feels a bit scattered and doesn't read very well.

I would prefer that the authors address one special case rigorously and then explain how the results apply to other cases with minor modifications.

The experiments are trivial - they should be either discarded or greatly improved (preferred).

** General comments
- The role of initialisation in the proposed convergence results is not clear. For example, it doesn't seem to intervene in the proposed Theorems (though this is announced in line 38). I suppose that initialisation is critical to obtain the global convergence?

- After reading Section 3.1, it feels like the theoretical study is done, and that the equation on line 223 provides the necessary results. What is it that Theorems 3.4 and 3.6 specifically address in the context of the discussion of Section 3.1?
- More generally, the paper doesn't sufficiently emphasise the difficulties that are being solved and doesn't provide enough intuition about the solutions. For example, Section 4 is not very inspiring and does not sufficiently explain how the proof works at the general level. A key contribution seems to be the use of the projected oracle divergence as defined in Eq. (3.5), and I wish it were more motivated.
- How realistic/stringent are the conditions of Theorems 3.4 and 3.6 in practice?
- Please explain the difference between alternating gradient descent and gradient descent - examining Algorithm 1 doesn't tell much.
- The issue of identifiability and the chosen forms of U and V could be discussed more, as well as the relation between the respective convergences of U, V and M = UV.

** Specific comments
- I don't see why the decomposition of M* in line 309 should depend on t?
- What's the meaning of Assumption 3.5? I don't see how it prevents sparsity. The use of star indices is confusing (it clashes with the use of star to denote the ground truth).
- What's Omega in line 241, Eq. 3.10, etc.?
- The sentence in lines 122-123 seems useless.
- What's the renormalisation issue in line 188?
- Weird phrasing in lines 228-229.
- The caption of Fig. 1 essentially repeats the text of Section 5.

** Typos
- entires
- board
- takeS (line 75)
- existS (line 276)
- noting that (line 125)
- that our (line 253)
Q2: Please summarize your review in 1-2 sentences
The paper is very ambitious. It addresses 3 algorithms (alternating exact minimisation, alternating gradient descent, gradient descent) for 2 problems (completion and sensing). Because it is impossible to address all 6 possible cases in full, many parts are either sketched or relegated to the supplementary materials. As a result, the paper feels a bit scattered and doesn't read very well.

Submitted by Assigned_Reviewer_2

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper studies nonconvex optimization algorithms for estimation of low rank matrices, including alternating minimization and gradient-type methods. It shows geometric convergence to the global optimum for these algorithms, and then exact recovery of the true low rank matrices under standard conditions.

In terms of matrix completion (not matrix sensing), their result is not better than the result of Hastie et al. (2014) (http://arxiv.org/pdf/1410.2596v1.pdf), which they didn't cite in their paper. Please see Theorem 3, Theorem 4 plus the following statements, and Theorem 6 in Hastie et al.

You will see that Hastie et al. prove the same results as Theorem 3.6 of the given paper under milder conditions. However, since Hastie et al. didn't handle the matrix sensing problem, we might find the novelty of the given paper there.
Q2: Please summarize your review in 1-2 sentences
Significant overlap with "Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares" by Hastie, Mazumder, Lee, Zadeh.

http://arxiv.org/pdf/1410.2596v1.pdf

Submitted by Assigned_Reviewer_3

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper extends the alternating minimization algorithm (reference [12]) to solve the low rank matrix estimation problem with a general loss function. The assumptions on the general loss functions and the target matrix are standard. Though the extension is straightforward, the theoretical result for such general loss functions is meaningful. Please use matrix completion examples in the experiments, as matrix completion is more interesting and matrix sensing with the RIP condition has been well investigated.
Q2: Please summarize your review in 1-2 sentences
The paper extends the alternating minimization algorithm to solve the low rank matrix estimation problem with a general loss function. Though the extension is straightforward, the theoretical result obtained in this paper is meaningful.

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper studies low rank matrices via non-convex optimization. It looks at a variety of algorithms (alternating minimization, gradient descent, etc.) and models (Gaussian, matrix completion). My worry, however, is that compared to the existing literature (Jain et al., Lafferty et al., Balakrishnan et al., Candes et al., Montanari et al.) it does not improve things. Also, the results are worse than convex schemes in terms of the dependence of the sample complexity on the rank r.

A few minor comments based on a quick read:

- page 1 line 53

The claim for matrix completion is not really correct, as there is a dependence on $\epsilon$ in the number of measurements; e.g., this result is worse than the result of Keshavan, Montanari and Oh.

- page 3

Is the QR factorization necessary for the algorithm to work, or just for the proofs to work? Some discussion of the complexity of the QR method and its advantages would help.

- Page 4, Section 3.1 and Remark 3.2: "2-norm of difference between gradients and to quantify the difference between the finite sample and population settings"

Conditions similar to the projected oracle divergence have been proposed in the context of phase retrieval for analyzing non-convex objectives, which is not mentioned here.

- Page 5, Theorem 3.4.

k^3 n log n: the results of Zheng and Lafferty already seem sharper, but perhaps this appeared after the NIPS deadline.

- Page 5, Remark 3.2

Viewing non-convex optimization as a sort of approximate gradient scheme, and in particular the recommendation that the ell_2 norm of the gradient be used as a means of analyzing gradient descent, was proposed prior to [1]; e.g., see the work of Candes et al. on phase retrieval, Sections 2.3 and 7.9.

- Page 6 line 284

The dependence on the condition number may be worse than in [7].

Q2: Please summarize your review in 1-2 sentences
This paper looks at the well-studied problem of matrix sensing and completion using non-convex schemes. There are a few papers that already address this problem. This paper looks at a variety of methods: gradient descent, alternating minimization, etc. Looking at the paper quickly, it seems mostly sound. My worry, however, is that compared to the existing literature (Jain et al., Lafferty et al., Balakrishnan et al., Candes et al., Montanari et al.) the results are not sharper (e.g. the sample complexity is not improved w.r.t. the existing literature and is sometimes even worse). Also, the assumption that the condition number is bounded is rather strong.

Submitted by Assigned_Reviewer_5

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)


This paper looks at non-convex optimization approaches common to matrix sensing and completion, which have been found to be more scalable and accurate than the convex versions. The authors unify three types of non-convex approaches, provide theoretical results that demonstrate fast convergence and sample complexity guarantees for all these methods, and perform preliminary experiments on synthetic data.

I found the paper to be very well-written and clear, and it was in fact a pleasure to read. I find the results to be quite strong, and am overall quite satisfied with the paper. The proofs are quite difficult to read, and the appendix is unreasonably lengthy (I could not go through it in detail), so the authors should consider an alternate presentation. For example, I would present one application in detail (say matrix completion), and leave all description of matrix sensing to the appendix.

- There is no question that this is a very important problem, with active research both on theoretical and practical side. The strong theoretical results are thus likely to make a considerable impact on the area.

- The authors removed the space reserved for author names from the draft, which is against the prescribed format. I don't think this is a major concern, but I definitely don't appreciate it.

- Although not introduced here, I find oracle divergence an intuitive idea, and like its application to this setting.

- A more direct comparison with convex versions would have been good. How do the theoretical convergence results compare to convex approaches? How do the convex approaches perform in their benchmarks?

- Given that the focus of this paper is theoretical, and I understand the space was limited as it is, I really appreciate the few experiments that the authors included to demonstrate the linear convergence and dependency on number of samples.

Q2: Please summarize your review in 1-2 sentences


I think this is a very strong, well-written paper. The task being considered is a very important one, and the theoretical results are quite strong, providing new insights along with generalizing existing results. Length is an issue though, so I hope the authors will be able to express the proofs more succinctly.


Submitted by Assigned_Reviewer_6

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Minor comments
-------------------
- line 202: for any \eta, do you get this update? If not, say "for some \eta".
- Figure 1: the legend appears to be a duplicate of the text.

- line 415: "both algorithms fail BECAUSE (...) is below the minimum requirement": this is vague and misleading -- the experiments provided do not allow one to draw this conclusion, I believe, as all the bounds and results have leading constants != 1, and looking at three values of d is not enough.
Q2: Please summarize your review in 1-2 sentences
The paper performs an ERM-like analysis for some alternate optimization schemes for low-rank matrix estimation. Results are probabilistic, and correspond to common generative models (matrix sensing with random design, iid support matrix completion).

The main contribution is to show how state-of-the-art results for these two models can be derived in a unified framework.

Author Feedback
Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 5000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their valuable suggestions.

Reviewer 1:

1) We will focus on one special case and then explain how the results apply to other cases with modifications.

2) We will add more numerical examples under various settings to verify the tightness of the sample complexity.

3) The initialization is critical to the global convergence. We will emphasize its importance.

4) Sec.3.1 only presents the high-level idea of our analysis. Theorems 3.4 and 3.6 further verify several technical conditions used in Sec.3.1 (bi-convexity, bi-smoothness, upper bounding the projected oracle divergence), and establish the rigorous convergence rates for specific examples.

5) As emphasized in Sec.1, the difficulty of the convergence analysis for nonconvex matrix factorization has been widely recognized in the existing literature. There was no significant breakthrough until the recent discovery in [12]. Section 4 contains the detailed proof. The intuition behind our proof and the motivation of the projected oracle divergence are presented in Sec.3.1.

6) The technical conditions in both theorems are almost necessary from the information-theoretic perspective, but not stringent in practice. They can be guaranteed by a sufficiently large sample size. Moreover, these conditions are very common, and have appeared in all existing literature on nonconvex matrix factorization, e.g. [7-12].

7) The difference between gradient descent and alternating gradient descent: at the t-th iteration, gradient descent uses the gradients w.r.t. V and U at U=U^t and V=V^t to decrease the objective, whereas alternating gradient descent first uses the partial gradient w.r.t. V at U=U^t and V=V^t, and then uses the partial gradient w.r.t. U at U=U^t and V=V^{t+1} to decrease the objective.
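For concreteness, a minimal sketch of the two update orders (an illustration only, not the paper's code; grad_U and grad_V are the partial-gradient maps of a generic objective f(U, V), and eta is the step size -- all names are illustrative):

    def gradient_descent_step(U, V, grad_U, grad_V, eta):
        # Joint update: both partial gradients are evaluated at the same iterate (U^t, V^t).
        gU = grad_U(U, V)
        gV = grad_V(U, V)
        return U - eta * gU, V - eta * gV

    def alternating_gradient_descent_step(U, V, grad_U, grad_V, eta):
        # Alternating update: V is updated first using (U^t, V^t),
        # then U is updated using the fresh iterate (U^t, V^{t+1}).
        V_new = V - eta * grad_V(U, V)
        U_new = U - eta * grad_U(U, V_new)
        return U_new, V_new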

8) As explained in Remark 4.2, the matrix factorization is not unique (up to rotation). To resolve the ambiguity, we show that there always exists a decomposition of M^* such that the solutions U^t and V^t at each iteration satisfy the desired optimization error bound. This is why we consider a "reference" decomposition of M^* for each t. It is worth noting that this is only an intermediate step in our analysis, since the final goal is to analyze the reconstruction error of M^*.
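For concreteness, the rotation ambiguity can be written as M^* = U^* V^* = (U^* R)(R^T V^*) for any k x k orthogonal matrix R (a standard identity stated here only for illustration, not a quote from the paper); the "reference" decomposition simply fixes one such representative at each iteration t.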

9) We will change the claim to "Under Assumption 3.5, each entry of M^* is expected to provide a similar amount of information". Similar claims can be found in [2,12,13,22].

10) a_n = Omega(b_n) means there exist constants C > 0 and N such that a_n >= C*b_n for all n > N.

11) The renormalization issue: We want to preserve the orthonormality of U^t and V^t. Thus we use QR to renormalize U^t and V^t to orthonormal matrices
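As a minimal numpy sketch of this renormalization step (illustrative only; it assumes both factors are stored as tall matrices with k columns, and how the discarded triangular factors are handled depends on the specific algorithm variant):

    import numpy as np

    def qr_renormalize(U_t, V_t):
        # Thin QR factorizations: Q_U and Q_V have orthonormal columns spanning
        # the column spaces of U^t and V^t; the upper-triangular factors are
        # discarded in this sketch.
        Q_U, _ = np.linalg.qr(U_t)
        Q_V, _ = np.linalg.qr(V_t)
        return Q_U, Q_V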

Reviewer 2:

1) Our extensions are not only on the loss function, but more importantly on the types of algorithms. Beyond alternating minimization, we establish the convergence rate for gradient-type methods, which has not been established before. Our theory builds on the notion of projected oracle divergence instead of the perturbed power method in [12], and is thus not straightforward

2) We will use matrix completion for the numerical example, and move matrix sensing to the appendix

Reviewer 3: We will add more numerical examples under various settings to verify the tightness of the sample complexity, and compare the nonconvex methods with existing convex methods

Reviewer 4: We will add Hastie 2014 to the references. It is worth noting that Hastie 2014 establishes sublinear convergence to local optima, which coincide with the global optima under additional conditions. In contrast, we establish linear convergence to the global optima directly.

Reviewer 5:

1) Our paper aims to provide a general and simple framework for analyzing nonconvex matrix factorization. The sample complexity is slightly sacrificed to gain maximum generality in our analysis. In fact, the sample complexity and the dependency on condition numbers can be further improved if our analysis is specifically geared towards certain algorithms.

2) We will change the claim in Sec.1 to "For matrix completion, the algorithm can recover the true matrix up to an \eps accuracy depending on the sample size". However, it is worth noting that the dependence on \eps arises from the analysis of the convergence rate, while Keshavan, Montanari and Oh only establish asymptotic convergence.

3) QR is not necessary for alternating exact minimization, in either theory or practice. For the other algorithms, we currently need QR for their convergence analysis, though these algorithms actually work well without QR in practice. We will also add additional discussion.

4) Thanks for pointing out the relevant paper on analyzing nonconvex optimization in phase retrieval. We will add the reference and additional discussion.

Reviewer 6: We will fix the minor issues and add more numerical experiments under different settings (varying d,k,m,n) to further verify the tightness of the sample complexity.