NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
This paper investigates the matrix decomposition under the assumption that the spiked vector comes from a generated model. In particular, a single layer generating model with a linear/non-linear activation is considered. The authors study the phase transition on when the underlying spiked vector can be recovered, and shows that there is no algorithmic gap with generative-model priors, which is different from the sparse model. In addition, a new spectral method based on approximate messaging is proposed. The authors shows that this algorithm can reach the statistically optimal threshold. In general, this manuscript is well-written. The contribution is sufficiently significant. I recommend publication. On the other hand, I feel the 1-layer generative model may be overly simple. Some discussion on multi-layer networks as a prior is expected. Update after reading the rebuttal letter: The authors well addressed my question on how to generalize the model to the multi-layers case by providing additional experimental results. On the other hand, I do expect more interesting results on how this Bayes settings can give us more insight on practical problems that beyond information theoretical bound and algorithmic gap. My score is the same and I recommend publication.
Reviewer 2
The paper provides interesting and technically sound results that extend results for sparse models to ones where the spikes are generated with a (random) generative neural network: - The paper finds that contrary to sparse models, known algorithms obtain asymptotically optimal performance even if k is small. - The paper designs a spectral method, an approximate message passing algorithm, that reaches the statistical optimal threshold. - The paper provides information theoretic results on the estimation of the spikes, for a multilayer network, extending previous results for the single layer network. The paper writes in Section 3 that the AMP algorithm asymptotically reaches optimal performance, and that the analysis from the previous section on fixpoints applies. In the previous section the fix points are determined numerically in Figure 1. How can the paper conclude that the AMP reaches optimal performance in general, would that not only apply to the regime studied numerically? For the
Reviewer 3
This paper is well-written and well cited. In terms of techniques, the techniques used in this paper is mostly established tools developed in recent years. Though, for this complicated model, applying these established tools is already challenging. In terms of contribution, this model provides another example towards understanding the statistical-computational gap in teacher-student scenario: the behavior of this model is different from sparse PCA. Though, it is arguable that this different phenomenon is interesting enough. Overall, this is a good paper, so I vote for acceptance. -- The question was that, whether this result is interesting enough. The authors' reply sounds reasonable.