Paper ID: 5203

Title: Bayesian Joint Estimation of Multiple Graphical Models

The approach outlined in the paper is theoretically well grounded. However, it would be extremely helpful to the reader to explain some of the conditions that underpin the theoretical justification. It is not immediately clear whether the conditions can be relaxed any further while retaining the same results. The estimation of sparsity structure is an important problem in graphical models.

Update after author response: I thank the authors for their very clear response, which addresses many of the criticisms raised in the reviews. I especially think that the addition of runtimes will benefit the paper, as runtime can be a big issue in graphical model learning. On the various technical conditions: providing more intuition for them will be great (which the authors address in their response). I think the paper could be made even stronger by adding a discussion of how these conditions compare to those in the literature.

-----------------------------------------------------------------------------------------

This paper is focused on estimating multiple Gaussian graphical models. The specific focus is on the setting where we have observations from multiple graphical models with the same underlying sparsity pattern in their precision matrices. The authors propose a point estimator to recover both the individual precision matrices and the underlying sparsity pattern. The authors prove a number of theoretical properties of their method and show that it works well empirically, even when their theoretical assumptions are violated.

Clarity: The paper is well written and easy to understand.

Originality: The ideas of grouped graphical model estimation and of using Bayesian methods in this area are not new. The specific estimator the authors propose and their analysis of it seem to be new.

Quality: The theoretical analysis of the method carefully states its assumptions and conclusions. In the empirical work, I appreciated that the authors investigate what happens when their assumption of group structure is violated. I do think that the experiments could be made more convincing, though. First, the authors do not include runtimes for their method, so it is hard to say whether the modest gains in accuracy are worth the computational cost. Second, it would be good to know how the hyperparameters of the competing methods were selected.

Significance: I am not particularly familiar with the graphical model estimation literature, but the authors seem to have provided a good theoretical contribution over other work in the area. My main complaint here is that I do not think the characterization of their estimator as "Bayesian" is particularly meaningful. What is ultimately reported is a constrained MAP estimate that is then thresholded to obtain a sparsity pattern. In contrast, the two Bayesian works that the authors cite in this space ([18] and [22]) actually propose samplers to recover the full posterior. Additionally, it is hard to say that the authors' prior has much meaning, given that it places zero probability mass on the sparse solutions of interest. I think this is roughly equivalent to saying the Lasso is a "Bayesian" method.

Small things:
- In condition A1, the authors should clarify with parentheses whether log p / n < eta means log(p) / n < eta or log(p/n) < eta.
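To illustrate why the parenthesization in condition A1 matters, here is a minimal sketch comparing the two readings numerically. The values of `p` and `n` and the function names are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import math


def log_p_over_n(p: int, n: int) -> float:
    """Reading 1: log(p) / n."""
    return math.log(p) / n


def log_of_ratio(p: int, n: int) -> float:
    """Reading 2: log(p / n)."""
    return math.log(p / n)


# Hypothetical dimensions: p variables, n samples (illustrative only).
p, n = 1000, 200

reading_1 = log_p_over_n(p, n)  # small, and shrinks as n grows
reading_2 = log_of_ratio(p, n)  # much larger here; negative whenever p < n

print(f"log(p) / n = {reading_1:.4f}")
print(f"log(p / n) = {reading_2:.4f}")
```

The two quantities differ by well over an order of magnitude in this example, and the second changes sign when p < n, so a bound of the form "< eta" constrains very different regimes under the two readings.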

This paper proposes a Bayesian group regularization method to jointly estimate multiple graphical models. The method is based on the spike-and-slab lasso prior. A convergence rate and a structure recovery guarantee are provided. Experimental results show the advantage of the proposed method. Overall, the work is solid and interesting. However, I have a few concerns and questions. (1) Why does the training of the model use an EM procedure? Why not optimize the objective in (2.4) directly? (2) The paper claims the proposed approach is Bayesian. Why, then, does the paper not compare with other Bayesian approaches to learning multiple graphical models, such as [8] and [12] cited in the paper? Unless there is a specific reason not to, I strongly recommend that the authors supplement the experiments with these comparisons.

I updated my score to 6 since the authors provided some nice explanations in their response, including timing estimates and a clarification of the Bayesian characterization.

----- Original review -----

Originality: Overall the paper seems weak in originality given the amount of previous work on similar problems. The Bayesian viewpoint is somewhat interesting, but the paper suggests using a MAP estimate anyway, which is equivalent to a penalized likelihood optimization.

Quality: The theoretical results seem reasonable but not particularly surprising given the amount of work on sparse estimation, including estimation of multiple precision matrices. The empirical results are slightly more promising, but given that only one real dataset is used, I'm concerned that they won't generalize to other datasets. Maybe other datasets from [5, 11, 12, 15] could be used?

Clarity: Overall, it would be beneficial if the paper included more insight and intuition about the theorems. Why are the theoretical results particularly surprising or interesting? Additionally, the theorems seem to require many definitions and conditions, which makes it difficult to separate out what's new. Could these be simplified or reorganized for better clarity?

Significance: Overall, the paper seems relatively incremental, since many others have explored the general problem of learning multiple graphical models together, including some theoretical results. The theoretical results are slightly stronger than those of previous papers, but I'm not sure this work would impact ML practitioners or inspire new theoretical work.

Other note: What is the computational complexity of this method compared to others? This is always important to know in these situations.
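To make the remark about MAP estimation and penalized likelihood concrete, here is the standard argument in generic notation. The symbols ($\Theta$ for the parameters, $\ell$ for the log-likelihood, $\pi$ for the prior density, $\lambda$ for a scale parameter) are assumptions for illustration and are not taken from the paper, whose prior is more structured than the example shown.

```latex
\begin{align*}
\hat{\Theta}_{\mathrm{MAP}}
  &= \arg\max_{\Theta} \bigl\{ \ell(\Theta) + \log \pi(\Theta) \bigr\} \\
  &= \arg\min_{\Theta} \bigl\{ -\ell(\Theta) + \mathrm{pen}(\Theta) \bigr\},
  \qquad \mathrm{pen}(\Theta) := -\log \pi(\Theta).
\end{align*}
% Illustrative special case (Laplace-type prior on off-diagonal entries):
\[
\pi(\Theta) \propto \exp\Bigl( -\lambda \sum_{i \neq j} |\theta_{ij}| \Bigr)
\quad \Longrightarrow \quad
\mathrm{pen}(\Theta) = \lambda \sum_{i \neq j} |\theta_{ij}| + \text{const}.
\]
```

Under the Laplace-type prior in this special case, the MAP estimate coincides with an $\ell_1$-penalized likelihood estimate, which is the sense in which a Lasso-type estimator can be given a Bayesian reading.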