NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:6878
Title:Scalable Deep Generative Relational Model with High-Order Node Dependence

The paper was reviewed by three experts in the field. The reviewers and AC all agree that the paper contains novel contributions, but share the opinion that it could be strengthened by addressing the reviewers' comments. In addition to the reviewers' comments, such as the need to add comparisons with VGAE and its variants, the AC would like to provide some additional feedback to the authors.

The AC views the paper as a smart combination of the edge partition model, the gamma belief net, and the Dirichlet belief net, enhanced by adding covariate dependence and by incorporating the network information in learning the connection weights of the Dirichlet belief net.

Pros:
1) The combination is nontrivial: replacing the gamma weights in the edge partition model with latent counts is the key to allowing closed-form Gibbs sampling (upward latent count propagation followed by downward variable sampling). How X is used in (3) and sampled in (5) is novel.
2) The way the network information is used to build the Dirichlet belief net, as shown in (2), is smart and seems novel.
3) The authors show an interesting connection: the proposed model is related to the mixed-membership stochastic blockmodel, but does not suffer from the same issue of having O(N^2) computation.
4) The authors have mastered a number of nontrivial data augmentation techniques for multinomial/Poisson/gamma/Dirichlet variables and nicely combined them to derive closed-form Gibbs sampling updates.

Both Good and Bad: Improving the edge partition model with node covariates has been considered before by HLFM [11], where the gamma weights were replaced with binary weights. Here the authors instead use count-valued weights modeled with a Dirichlet belief net whose N*N connection weights are sparsified by the N*N network adjacency matrix.

Cons:
1) The adjacency matrix, which the hierarchical model is trying to model, has been used to parameterize the Dirichlet belief net prior, which means the model is no longer an exact generative model.
2) The comparison with HGP-EPM in Figure 4 might not be entirely fair, as HGP-EPM is not designed to include covariates (the authors are aware of that, as noted in Lines 286-287). It is unclear to me why Plain-SDREM (i.e., SDREM without node covariates) is not used for comparison with HGP-EPM.
3) In addition to having closed-form Gibbs sampling updates, a strong defense against the need to compare with GCN and VGAE would be to demonstrate the interpretability of the proposed model, e.g., by showing community memberships and overlapping community structures, as has been done in both EPM [29] and HLFM [11].
4) The model could follow HGP-EPM in placing a relational hierarchical gamma process prior on the Lambda matrix, and hence could potentially infer the number of communities. Unfortunately, the authors chose to impose an i.i.d. gamma prior instead and hence can no longer infer K. Related to this point, the second panel of Figure 2 should consider larger values of K, which may reveal a performance drop once K exceeds a certain size (a drop that could be prevented if the relational hierarchical gamma process prior were used).

In summary, the paper does have clear novel contributions, but also has clear room for improvement. The authors are encouraged to carefully revise their paper to further enhance its quality.