Sun, Dec 8 through Sat, Dec 14, 2019, at the Vancouver Convention Center
This paper presents an interesting finding that is backed up by rigorous analysis and reasonably persuasive examples. Although the idea is quite simple (vanilla gradient descent with a warm enough initialization), the execution is precise and the writing quality is very high. I find the theoretical contributions clear and elegant, while the data analysis is somewhat lacking. In particular, the absence of convincing real-data analysis makes it unclear how realistic the assumptions are and how strong the theoretical statements are in practice. However, the assumptions are rather standard, and it is unsurprising that many random restarts, etc., are necessary in difficult nonconvex settings. Finally, it is not entirely clear which results extend to non-symmetric tensors, though I assume many of the choices in this regard were made for ease of exposition.
In this paper, a fast nonconvex algorithm for symmetric tensor completion is developed and analyzed, with theoretical guarantees on local convergence and linear-time computational complexity. The performance of the proposed algorithm is evaluated through numerical tests on synthetic data, which show that the method has some merit in handling noisy data. Overall, the paper is well written and easy to read. The motivations are clear, and the technical content appears to be correct. However, some issues need to be addressed to further improve the paper.
- The experimental results are very limited. The main drawback of this paper is that there is no comparison with existing methods, which limits the contribution.
- It is unclear whether the proposed method is really useful in practice. More experiments on real data are needed.
- Beyond the theoretical analysis, it would be better to demonstrate the efficiency of the proposed algorithm with experimental results.
- There is a discrepancy between the title, which promises a tensor completion method, and the proposed algorithm, which handles only symmetric tensors and hence neglects the decomposition differences between symmetric and general tensors.
This paper studies the low-rank tensor completion problem. The authors propose a two-stage nonconvex algorithm, where the first stage is initialization and the second stage is gradient descent. They prove that the sample complexity matches that of existing works, while the algorithm enjoys linear convergence. Overall, the paper is readable and of good quality, but I doubt whether it is suitable as a conference paper. One reason is that many key intuitions and background discussions, especially for the initialization step, are deferred to the supplementary material, so the story is incomplete and lacks clear explanations. Beyond this, I have some technical concerns.
1. My most pressing concern is assumption A3. Can A3 be relaxed? It seems too strong and is not usually required by other related methods. Is it because of A3 that no regularization is needed in GD? Under A3, the least-squares-like problem has no design.
2. Q1 and Q2 in Sec 1.2 are not well posed. It would be better if the authors clarified their contributions more clearly. To me, the main challenge is the sample complexity, which is not improved. It is not surprising that a two-stage algorithm achieves a linear rate with a certain near-optimal statistical rate, since in principle the linear rate comes from applying GD to a squared objective. This nonconvex objective appears quite often in low-rank matrix completion and robust estimation, and has been studied before.
3. There is no clear dependence on kappa in main Theorem 1.4. Please clarify how the result depends on kappa. What power of kappa enters the step size?
4. Why does GD need no projection step to keep the incoherence condition holding along the iterations? Also, why is there an upper bound on the number of iterations? This is not common for two-stage algorithms when applied to other problems, e.g., robust estimation.
5. A separate result for the initialization step is needed. If this initialization were used for other methods, would their performance improve?
6. For the experiments, the authors should provide comparisons with other advanced methods, especially [33, 63], to demonstrate the superiority of the proposed approach.
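For concreteness, the two-stage scheme these reviews discuss (an initialization stage followed by plain gradient descent on a squared loss over observed entries of a symmetric rank-r factorization) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's actual algorithm: the function name, step size, iteration count, and the warm start standing in for the paper's initialization stage are all choices made here for exposition.

```python
import numpy as np

def gd_symmetric_completion(T_obs, omega, U0, eta=0.02, iters=500):
    """Second stage only: plain GD on the squared loss over observed entries.

    Minimizes f(U) = (1/2m) * sum_{(i,j,k) in omega} (pred_ijk - T_ijk)^2,
    where pred_ijk = sum_l U[i,l] * U[j,l] * U[k,l] (symmetric rank-r model).

    T_obs : (m,)   observed entry values
    omega : (m, 3) integer index triples of the observed entries
    U0    : (n, r) warm initialization (stands in for the paper's init stage)
    """
    U = U0.copy()
    m = len(T_obs)
    I, J, K = omega[:, 0], omega[:, 1], omega[:, 2]
    for _ in range(iters):
        preds = np.sum(U[I] * U[J] * U[K], axis=1)  # model values at observed triples
        res = (preds - T_obs) / m                   # scaled residuals
        grad = np.zeros_like(U)
        # Product rule: each observed triple contributes to the rows of all three
        # of its indices.  np.add.at accumulates correctly when indices repeat
        # (e.g. triples of the form (i, i, k)).
        np.add.at(grad, I, res[:, None] * (U[J] * U[K]))
        np.add.at(grad, J, res[:, None] * (U[I] * U[K]))
        np.add.at(grad, K, res[:, None] * (U[I] * U[J]))
        U -= eta * grad
    return U
```

Started close enough to a planted factor, the iterates descend toward it, which is the local convergence behavior the reviews summarize; from a cold random start, plain GD like this can stall, which is exactly why the initialization stage (and the reviewers' questions about it) matter.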