Paper ID: 3427

Title: Bayesian Learning of Sum-Product Networks

The paper is technically sound, clearly written, and leads the reader through the steps required to understand the formulations. The contribution is significant, as it solves a difficult task previously avoided in the literature. The empirical section establishes the method as a strong contender among SPN learning approaches, which is increasingly difficult to achieve given the steady progress reported on the same benchmarks. The contribution is original, and the authors clearly reference previous work on Bayesian learning and region-graph structures for SPNs.

****************
Thanks for the detailed response, which addressed most of my concerns (partly on the basis of promised changes). Given the space constraints of the rebuttal, I will trust the authors to incorporate the changes as promised, and I have therefore increased my score.
*****************

This paper proposes a novel method for structure learning of SPNs. However, several parts of the paper are too dense to follow. My detailed comments are as follows.

First, this paper lacks a dedicated related work section. The introduction briefly discusses how this work differs from the existing literature, but this is not enough. Section 2 is titled "Background & Related works", yet I would say 95% of it is background on SPNs. More importantly, the paper does not cover related work on other dialects of tractable graphical models beyond SPNs (for example, cutset networks and probabilistic sentential decision diagrams).

Second, it is not easy to follow exactly how a computation graph is constructed from a region graph; an example figure would help greatly. Furthermore, Figure 1 never seems to be referenced in the text, and I am not sure how to interpret it, since no explanation accompanies it.

Third, and related to my first point: in the experiments, I encourage the authors to include results from structure learning methods for other dialects of tractable graphical models, so as to form truly state-of-the-art baselines. I also encourage the authors to drop the arrows in the tables; they dilute the focus of the tables, which should be the bold/underlined numbers.

Fourth, I am not quite convinced by the results on missing data. Mean imputation, arguably the most popular imputation method, is missing from the results. More importantly, I believe the baselines may be too weak. One natural way to deal with missing data is to run EM + MPE inference.
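For concreteness, the mean-imputation baseline named in the fourth point can be sketched in a few lines of Python. This is a minimal sketch, not code from the paper: it assumes missing entries are encoded as `None` in a list-of-rows dataset, and the function name `mean_impute` is hypothetical.

```python
from typing import List, Optional


def mean_impute(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each missing entry (None) by the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Arbitrary fallback (an assumption): a fully missing column is filled with 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    # Keep observed values, fill the gaps with the column means.
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


data = [[1.0, None], [3.0, 4.0], [None, 8.0]]
print(mean_impute(data))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

Each column is filled independently, so this baseline ignores dependencies between variables; that is what makes it a natural reference point for model-based imputation.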
Fifth, in some sense, I am not quite convinced that the method proposed in this paper is a complete structure learning algorithm, as the region graph, and hence the computation graph, is predefined. In other words, the structure learning in this paper consists solely of learning the scope functions. Perhaps learning the scope functions alone is enough? If so, can the authors provide a compelling argument for it?

Typos: parametrization -> parameterization on lines 13, 61, 74, …
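The second and fifth points both concern how a fixed region graph determines the computation graph. One common construction, used for example by RAT-SPNs, is sketched below; whether the paper under review follows exactly this scheme is an assumption, and the names `build_computation_graph`, `num_sums` and `num_leaves` are hypothetical. Each leaf region receives `num_leaves` input distributions, each partition of a region yields one product node per combination of nodes from its child regions, and each internal region receives `num_sums` sum nodes (a single one at the root) mixing all of its product nodes.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Region = FrozenSet[int]
# Hypothetical encoding: each internal region maps to its partitions,
# each partition being a tuple of disjoint child regions.
RegionGraph = Dict[Region, List[Tuple[Region, ...]]]


def build_computation_graph(graph: RegionGraph, root: Region,
                            num_sums: int = 2, num_leaves: int = 2) -> dict:
    """Expand a region graph into sum, product and leaf nodes (assumed RAT-SPN-style scheme)."""
    region_nodes: Dict[Region, List[str]] = {}
    sums: List[Tuple[str, List[str]]] = []             # (sum node, its children)
    products: List[Tuple[str, Tuple[str, ...]]] = []   # (product node, its children)

    def expand(region: Region) -> List[str]:
        if region in region_nodes:  # a region may be shared by several partitions
            return region_nodes[region]
        tag = "".join(str(v) for v in sorted(region))
        if region not in graph:
            # Leaf region: num_leaves input distributions over its variables.
            nodes = [f"leaf{tag}_{k}" for k in range(num_leaves)]
        else:
            children: List[str] = []
            for partition in graph[region]:
                child_lists = [expand(r) for r in partition]
                # One product node per combination of child-region nodes;
                # the child regions are disjoint, so each product is decomposable.
                for combo in product(*child_lists):
                    name = f"prod{tag}_{len(products)}"
                    products.append((name, combo))
                    children.append(name)
            # Every sum node mixes all products over the same region (smooth).
            k = 1 if region == root else num_sums
            nodes = [f"sum{tag}_{i}" for i in range(k)]
            sums.extend((s, children) for s in nodes)
        region_nodes[region] = nodes
        return nodes

    expand(root)
    leaves = [n for r, ns in region_nodes.items() if r not in graph for n in ns]
    return {"sums": sums, "products": products, "leaves": leaves}


# Toy region graph over variables {0, 1, 2, 3}: one partition per internal region.
R = frozenset
toy: RegionGraph = {
    R({0, 1, 2, 3}): [(R({0, 1}), R({2, 3}))],
    R({0, 1}): [(R({0}), R({1}))],
    R({2, 3}): [(R({2}), R({3}))],
}
g = build_computation_graph(toy, R({0, 1, 2, 3}))
print(len(g["sums"]), len(g["products"]), len(g["leaves"]))  # 5 12 8
```

Because a product node only combines nodes from disjoint child regions and every sum node mixes products over a single region, the resulting SPN is decomposable and smooth by construction, whatever the region graph is.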

This paper proposes a Bayesian approach to learning an SPN from data. Unlike existing methods, which jointly estimate the computational graph and the scope function, the paper separates the two tasks and argues that the first is similar to neural network structure validation. The paper assumes the graph is given, then parameterizes the scope function and the SPN distribution so that a Bayesian generative model can both sample the SPN parameters and determine the scope function. A Gibbs sampling algorithm is then developed for posterior inference, which in turn yields the learned SPN model and its parameters. The experiments verify the effectiveness of the proposed approach.

Overall, this is an interesting and solid piece of work. The ideas of using induced trees to parameterize the SPN distribution and region graphs to parameterize the scope function are amazing, because they yield a Bayesian model formulation that includes both the SPN distribution parameters and the scope function. My major concern is the other task: identifying the computational graph. Although it is reasonable to separate the two tasks, the paper seems to overlook the difficulty and importance of structure identification. In fact, identifying an appropriate NN structure is non-trivial and requires massive computational resources; the authors are referred to recent work on AutoML. Below are my detailed concerns and suggestions.

(1) Although the authors claim to decompose SPN learning into two tasks, they never give a detailed solution to the first task, i.e., computational graph identification; they always assume the graph is given. A natural question is: once the scope function is learned for a particular graph, what is the next step? Is there an iterative procedure that switches to a better graph and then learns the scope function again? Although the first task is not the major focus, the paper should at least discuss possible methods.
(2) Since prior work already learns the graph and the scope function jointly, it seems unfair to fix the graph structure during evaluation. The authors could probably embed their scope-function learning approach into an existing framework that searches over graphs, and then compare with the existing approaches. The comparison results would then be much more convincing.