Harald Steck, Tommi Jaakkola
A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter (often interpreted as "equivalent sample size" or "prior strength") leads to a strong regularization of the model structure (a sparse graph), given a sufficiently large data set. In particular, the empty graph is obtained in the limit of a vanishing scale parameter. This is the opposite of what one might expect in this limit, namely the complete graph obtained from an (unregularized) maximum likelihood estimate. Since the prior affects the parameters as expected, the scale parameter governs a trade-off between regularizing the parameters and regularizing the structure of the model. We demonstrate the benefits of optimizing this trade-off in terms of predictive accuracy.
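The effect of the scale parameter on structure can be seen in a two-variable toy case. The sketch below compares the empty graph with the graph X -> Y by their Bayesian marginal likelihoods. It assumes the BDeu form of the product-of-Dirichlets prior, where each cell gets hyperparameter alpha / (r * q). That choice of hyperparameters is our assumption, and the function names `bdeu_local` and `log_bayes_factor_edge` are hypothetical. The contingency table is made up for illustration and has no empty cells.

```python
from math import lgamma


def bdeu_local(counts, alpha):
    """Log marginal likelihood of one discrete variable given its parents.

    counts[j][k] = number of samples with parent configuration j and child
    state k. alpha is the scale parameter (equivalent sample size); the
    BDeu-style split of alpha over cells is an assumption of this sketch.
    """
    q = len(counts)           # number of parent configurations
    r = len(counts[0])        # number of child states
    a_j = alpha / q           # Dirichlet mass per parent configuration
    a_jk = alpha / (q * r)    # Dirichlet hyperparameter per cell
    score = 0.0
    for row in counts:
        score += lgamma(a_j) - lgamma(a_j + sum(row))
        for n in row:
            score += lgamma(a_jk + n) - lgamma(a_jk)
    return score


def log_bayes_factor_edge(table, alpha):
    """log p(D | X -> Y) - log p(D | empty graph) for joint counts table[x][y].

    The local score of X is identical in both graphs, so only Y's family
    differs: Y given X (rows = parent states) versus Y with no parents.
    """
    y_marginal = [[sum(col) for col in zip(*table)]]  # one row: counts of Y
    return bdeu_local(table, alpha) - bdeu_local(y_marginal, alpha)


table = [[35, 15], [15, 35]]  # hypothetical counts for binary X and Y
for alpha in (100.0, 1.0, 1e-2, 1e-4, 1e-6):
    lbf = log_bayes_factor_edge(table, alpha)
    print(f"alpha={alpha:g}: log BF(edge vs. empty) = {lbf:+.2f}")
```

For these counts the log Bayes factor is positive at alpha = 1, so the edge is preferred. It becomes negative as alpha shrinks toward zero, so the empty graph wins even though the data show a clear dependence. In this toy case a short calculation with lgamma(x) ≈ -log(x) for small x shows that the log Bayes factor decreases roughly like log(alpha) as alpha → 0.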