Efficient Structure Learning of Markov Networks using $L_1$-Regularization

Part of Advances in Neural Information Processing Systems 19 (NIPS 2006)

Authors

Su-in Lee, Varun Ganapathi, Daphne Koller

Abstract

Markov networks are commonly used in a wide variety of applications, ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally efficient method for learning Markov network structure from data. Our method is based on the use of $L_1$-regularization on the weights of the log-linear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods. A key issue in this setting is the (unavoidable) use of approximate inference, which can lead to errors in the gradient computation when the network structure is dense. Thus, we explore the use of different feature introduction schemes and compare their performance. We provide results for our method on synthetic data, and on two real-world data sets: pixel values in the MNIST data, and genetic sequence variations in the human HapMap data. We show that our $L_1$-based method achieves considerably higher generalization performance than the more standard $L_2$-based method (a Gaussian parameter prior) or pure maximum-likelihood learning. We also show that we can learn MRF network structure at a computational cost that is not much greater than that of learning parameters alone, demonstrating the existence of a feasible method for this important problem.
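To make the formulation in the abstract concrete, the learning problem can be sketched as follows (the notation below is illustrative and not taken verbatim from the paper). For a log-linear model over features $f_i$ with weights $\theta_i$,

$$P_\theta(x) = \frac{1}{Z(\theta)} \exp\Big( \sum_i \theta_i f_i(x) \Big),$$

$L_1$-regularized maximum-likelihood learning on data instances $x^{(1)}, \dots, x^{(M)}$ solves

$$\hat{\theta} = \arg\max_\theta \; \sum_{m=1}^{M} \log P_\theta(x^{(m)}) \;-\; \beta \sum_i |\theta_i|,$$

where $\beta > 0$ controls the strength of the penalty. Because the $L_1$ penalty drives many weights exactly to zero, and a feature with a zero weight can simply be removed from the model, solving this single convex problem performs structure selection and parameter estimation at the same time.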

Undirected graphical models, such as Markov networks or log-linear models, have been used in an ever-growing variety of applications, including computer vision, natural language, computational biology, and more. However, as this modeling framework is used in increasingly more complex and less well-understood domains, the problem of selecting from the exponentially large space of possible network structures becomes of great importance. Including all of the possibly relevant interactions in the model generally leads to overfitting, and can also lead to difficulties in running inference over the network. Moreover, learning a "good" structure can be an important task in its own right, as it can provide insight about the underlying structure in the domain.

Unfortunately, the problem of learning Markov networks remains a challenge. The key difficulty is that the maximum likelihood (ML) parameters of these networks have no analytic closed form; finding these parameters requires an iterative procedure (such as conjugate gradient [15] or BFGS [5]), where each iteration runs inference over the current model. This type of procedure is computationally expensive even for models where inference is tractable. The problem of structure learning is considerably harder. The dominant type of solution to this problem uses greedy local heuristic search, which incrementally modifies the model by adding and possibly deleting features.
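The inference bottleneck mentioned above can be seen directly from the gradient of the log-likelihood; in the standard log-linear notation used for illustration earlier, the partial derivative with respect to a weight $\theta_i$ is

$$\frac{\partial}{\partial \theta_i} \sum_{m=1}^{M} \log P_\theta(x^{(m)}) \;=\; \sum_{m=1}^{M} f_i(x^{(m)}) \;-\; M \, \mathbb{E}_{P_\theta}\big[f_i(X)\big].$$

The first term is an empirical feature count computed once from the data, but the second is an expectation under the current model, so every iteration of a gradient-based procedure must run (typically approximate) inference to evaluate it. Adding the $L_1$ penalty only contributes the subgradient $-\beta \, \mathrm{sign}(\theta_i)$ to this expression, so it does not change the per-iteration cost.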