{"title": "A Convex Upper Bound on the Log-Partition Function for Binary Distributions", "book": "Advances in Neural Information Processing Systems", "page_first": 409, "page_last": 416, "abstract": null, "full_text": "A Convex Upper Bound on the Log-Partition Function for Binary Graphical Models\n\nLaurent El Ghaoui Department of Electrical Engineering and Computer Science University of California Berkeley Berkeley, CA 94720 elghaoui@eecs.berkeley.edu Assane Gueye Department of Electrical Engineering and Computer Science University of California Berkeley Berkeley, CA 94720 agueye@eecs.berkeley.edu\n\nAbstract\nWe consider the problem of bounding from above the log-partition function corresponding to second-order Ising models for binary distributions. We introduce a new bound, the cardinality bound, which can be computed via convex optimization. The corresponding error on the log-partition function is bounded above by twice the distance, in model parameter space, to a class of \"standard\" Ising models, for which variable inter-dependence is described via a simple mean field term. In the context of maximum-likelihood, using the new bound instead of the exact log-partition function, while constraining the distance to the class of standard Ising models, leads not only to a good approximation to the log-partition function, but also to a model that is parsimonious and easily interpretable. We compare our bound with the log-determinant bound introduced by Wainwright and Jordan (2006), and show that when the $\ell_1$-norm of the model parameter vector is small enough, the latter is outperformed by the new bound.\n\n1 Introduction\n1.1 Problem statement\n\nThis paper is motivated by the problem of fitting binary distributions to experimental data.
In the second-order Ising model, the fitted distribution $p$ is assumed to have the parametric form $p(x; Q, q) = \exp(x^T Q x + q^T x - Z(Q, q))$, $x \in \{0, 1\}^n$, where $Q = Q^T \in \mathbb{R}^{n \times n}$ and $q \in \mathbb{R}^n$ contain the parameters of the model, and $Z(Q, q)$, the normalization constant, is called the log-partition function of the model. Noting that $x^T Q x + q^T x = x^T (Q + D(q)) x$ for every $x \in \{0, 1\}^n$, we will without loss of generality assume that $q = 0$, and denote by $Z(Q)$ the corresponding log-partition function\n\n$$Z(Q) := \log \sum_{x \in \{0,1\}^n} \exp[x^T Q x]. \quad (1)$$\n\nIn the Ising model, the maximum-likelihood approach to fitting data leads to the problem\n\n$$\min_{Q \in \mathcal{Q}} \ Z(Q) - \mathrm{Tr}\, QS, \quad (2)$$\n\nwhere $\mathcal{Q}$ is a subset of the set $\mathcal{S}^n$ of symmetric matrices, and $S \in \mathcal{S}^n_+$ is the empirical second-moment matrix. When $\mathcal{Q} = \mathcal{S}^n$, the dual to (2) is the maximum entropy problem\n\n$$\max_p \ \left\{ H(p) \ : \ p \in \mathcal{P}, \ S = \sum_{x \in \{0,1\}^n} p(x) x x^T \right\}, \quad (3)$$\n\nwhere $\mathcal{P}$ is the set of distributions with support in $\{0, 1\}^n$, and $H$ is the entropy\n\n$$H(p) = -\sum_{x \in \{0,1\}^n} p(x) \log p(x). \quad (4)$$\n\nThe constraints of problem (3) define a polytope in $\mathcal{S}^n$, called the marginal polytope. For general $Q$'s, computing the log-partition function is NP-hard. Hence, except for special choices of $Q$, the maximum-likelihood problem (2) is also NP-hard. It is thus desirable to find computationally tractable approximations to the log-partition function, such that the resulting maximum-likelihood problem is also tractable. In this regard, convex upper bounds on the log-partition function are of particular interest, and our focus here: convexity usually brings about computational tractability, while using upper bounds yields a parameter $Q$ that is suboptimal for the exact problem. Using an upper bound in lieu of $Z(Q)$ in (2) leads to a problem we will generically refer to as the pseudo maximum-likelihood problem. This corresponds to a relaxation of the maximum-entropy problem, which is (3) when $\mathcal{Q} = \mathcal{S}^n$.
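For intuition (and as a reference point for the bounds studied below), $Z(Q)$ can be evaluated by brute force when $n$ is tiny; the sketch below is ours, not part of the paper's method, and also checks the absorption of the linear term $q$ into the diagonal used above:

```python
import itertools
import numpy as np

def log_partition(Q, q=None):
    # Brute-force Z(Q, q) = log sum_{x in {0,1}^n} exp(x^T Q x + q^T x).
    # Exponential in n: usable only as a sanity check on tiny models.
    n = Q.shape[0]
    if q is None:
        q = np.zeros(n)
    vals = np.array([xv @ Q @ xv + q @ xv
                     for x in itertools.product([0, 1], repeat=n)
                     for xv in [np.array(x, dtype=float)]])
    m = vals.max()
    return m + np.log(np.exp(vals - m).sum())  # numerically stable log-sum-exp

# Since x_i^2 = x_i on {0,1}^n, the linear term folds into the diagonal:
# Z(Q, q) = Z(Q + D(q), 0).
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2
q = rng.standard_normal(n)
assert np.isclose(log_partition(Q, q), log_partition(Q + np.diag(q)))
```

At $Q = 0$ this returns $n \log 2$, the entropy of the uniform distribution, consistent with the trivial bound discussed later.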
Such relaxations may involve two ingredients: an upper bound on the entropy, and an outer approximation to the marginal polytope.\n\n1.2 Prior work\n\nDue to the vast applicability of Ising models, the problem of approximating their log-partition function, and the related maximum-likelihood problem, has received considerable attention in the literature for decades, first in statistical physics, and more recently in machine learning. The so-called log-determinant bound has been recently introduced, for a large class of Markov random fields, by Wainwright and Jordan [2]. (Their paper provides an excellent overview of the prior work, in the general context of graphical models.) The log-determinant bound is based on an upper bound on the differential entropy of a continuous random variable, which is attained by a Gaussian distribution. The log-determinant bound enjoys good tractability properties, both for the computation of the log-partition function, and in the context of the maximum-likelihood problem (2). A recent paper by Ravikumar and Lafferty [1] discusses using bounds on the log-partition function to estimate marginal probabilities for a large class of graphical models, which adds extra motivation for the present study.\n\n1.3 Main results and outline\n\nThe main purpose of this note is to introduce a new upper bound on the log-partition function that is computationally tractable. The new bound is convex in $Q$, and leads to a restriction of the maximum-likelihood problem that is also tractable. Our development crucially involves a specific class of Ising models, which we'll refer to as standard Ising models, in which the model parameter $Q$ has the form $Q = \theta I + \eta 11^T$, where $\theta, \eta$ are arbitrary scalars. Such models are indeed standard in statistical physics: the first term ($\theta I$) describes interaction with the external magnetic field, and the second ($\eta 11^T$) is a simple mean field approximation to ferro-magnetic coupling.
For standard Ising models, it can be shown that the log-partition function has a computationally tractable, closed-form expression. Due to space limitations, the proof is omitted in this paper. Our bound is constructed so as to be exact in the case of standard Ising models. In fact, the error between our bound and the true value of the log-partition function is bounded above by twice the $\ell_1$-norm distance from the model parameter $Q$ to the class of standard Ising models. The outline of the note reflects our main results: in section 2, we introduce our bound, and show that the approximation error is bounded above by the distance to the class of standard Ising models. We discuss in section 3 the use of our bound in the context of the maximum-likelihood problem (2) and its dual (3). In particular, we discuss how imposing a bound on the distance to the class of standard Ising models may be desirable, not only to obtain an accurate approximation to the log-partition function, but also to find a parsimonious model with good interpretability properties. We then compare the new bound with the log-determinant bound of Wainwright and Jordan in section 4. We show that our new bound outperforms the log-determinant bound when the norm $\|Q\|_1$ is small enough (less than $0.08n$), and provide numerical experiments supporting the claim that our comparison analysis is quite conservative: our bound appears to be better over a wide range of values of $\|Q\|_1$.\n\nNotation. Throughout the note, $n$ is a fixed integer. For $k \in \{0, \ldots, n\}$, define $\Omega_k := \{x \in \{0,1\}^n : \mathrm{Card}(x) = k\}$. Let $c_k = |\Omega_k|$ denote the cardinality of $\Omega_k$, and $\gamma_k := 2^{-n} c_k$ the probability of $\Omega_k$ under the uniform distribution. For a distribution $p$, the notation $\mathbf{E}_p$ refers to the corresponding expectation operator, and $\mathbf{Prob}_p(S)$ to the probability of the event $S$ under $p$. The set $\mathcal{P}$ is the set of distributions with support on $\{0,1\}^n$.
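The closed form alluded to above follows from grouping configurations by cardinality: if $Q = \theta I + \eta 11^T$ and $\mathrm{Card}(x) = k$, then $x^T Q x = \theta k + \eta k^2$, so $Z(\theta I + \eta 11^T) = \log \sum_{k=0}^n \binom{n}{k} \exp(\theta k + \eta k^2)$, computable in $O(n)$ time. A quick numerical check of this identity (our sketch, standing in for the paper's omitted proof):

```python
import itertools
import math
import numpy as np

def z_standard(n, theta, eta):
    # Closed form for Q = theta*I + eta*11^T: x^T Q x = theta*k + eta*k^2
    # whenever Card(x) = k, so the 2^n terms group into n+1 cardinality classes.
    vals = [math.log(math.comb(n, k)) + theta * k + eta * k * k
            for k in range(n + 1)]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def z_brute(Q):
    # Reference value by enumeration (tiny n only).
    n = Q.shape[0]
    vals = [np.array(x) @ Q @ np.array(x)
            for x in itertools.product([0, 1], repeat=n)]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

n, theta, eta = 6, 0.7, -0.3
Q = theta * np.eye(n) + eta * np.ones((n, n))
assert abs(z_standard(n, theta, eta) - z_brute(Q)) < 1e-9
```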
For $X \in \mathbb{R}^{n \times n}$, the notation $\|X\|_1$ denotes the sum of the absolute values of the elements of $X$, and $\|X\|_\infty$ the largest of these values. The set $\mathcal{S}^n$ is the set of symmetric matrices, $\mathcal{S}^n_+$ the set of symmetric positive semidefinite matrices. We use the notation $X \succeq 0$ for the statement $X \in \mathcal{S}^n_+$. If $x \in \mathbb{R}^n$, $D(x)$ is the diagonal matrix with $x$ on its diagonal. If $X \in \mathbb{R}^{n \times n}$, $d(X)$ is the $n$-vector formed with the diagonal elements of $X$. Finally, $\mathcal{X}$ is the set $\{(X, x) \in \mathcal{S}^n \times \mathbb{R}^n : d(X) = x\}$ and $\mathcal{X}_+ = \{(X, x) \in \mathcal{S}^n \times \mathbb{R}^n : X \succeq xx^T, \ d(X) = x\}$.\n\n2 The Cardinality Bound\n\n2.1 The maximum bound\n\nTo ease our derivation, we begin with a simple bound based on replacing each term in the log-partition function by its maximum over $\{0,1\}^n$. This leads to an upper bound on the log-partition function: $Z(Q) \le n \log 2 + \mu_{\max}(Q)$, where\n\n$$\mu_{\max}(Q) := \max_{x \in \{0,1\}^n} x^T Q x.$$\n\nComputing the above quantity is in general NP-hard. Starting with the expression\n\n$$\mu_{\max}(Q) = \max_{(X,x) \in \mathcal{X}_+} \ \mathrm{Tr}\, QX \ : \ \mathrm{rank}(X) = 1,$$\n\nand relaxing the rank constraint leads to the upper bound $\mu_{\max}(Q) \le \hat{\mu}_{\max}(Q)$, where $\hat{\mu}_{\max}(Q)$ is defined via a semidefinite program:\n\n$$\hat{\mu}_{\max}(Q) = \max_{(X,x) \in \mathcal{X}_+} \ \mathrm{Tr}\, QX. \quad (5)$$\n\nFor later reference, we note the dual forms:\n\n$$\hat{\mu}_{\max}(Q) = \min_{t, \nu} \left\{ t \ : \ \begin{pmatrix} D(\nu) - Q & -\nu/2 \\ -\nu^T/2 & t \end{pmatrix} \succeq 0 \right\} \quad (6)$$\n$$= \min_{\nu} \left\{ \tfrac{1}{4} \nu^T (D(\nu) - Q)^{-1} \nu \ : \ D(\nu) \succ Q \right\}. \quad (7)$$\n\nThe corresponding bound on the log-partition function, referred to as the maximum bound, is $Z(Q) \le Z_{\max}(Q) := n \log 2 + \hat{\mu}_{\max}(Q)$. The complexity of this bound (using interior-point methods) is roughly $O(n^3)$. Let us make a few observations before proceeding. First, the maximum bound is a convex function of $Q$, which is important in the context of the maximum-likelihood problem (2). Second, we have $Z_{\max}(Q) \le n \log 2 + \|Q\|_1$, which follows from (5), together with the fact that any matrix $X$ that is feasible for that problem satisfies $\|X\|_\infty \le 1$. Finally, we observe that the function $Z_{\max}$ is Lipschitz continuous, with constant 1 with respect to the $\ell_1$-norm.
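As a numerical sanity check of the chain $Z(Q) \le n \log 2 + \mu_{\max}(Q) \le n \log 2 + \|Q\|_1$ on a tiny model, one can enumerate $\mu_{\max}(Q)$ exactly instead of solving the semidefinite program; a sketch (ours, for illustration):

```python
import itertools
import math
import numpy as np

def z_exact(Q):
    # Exact Z(Q) by enumeration (tiny n only).
    vals = [np.array(x) @ Q @ np.array(x)
            for x in itertools.product([0, 1], repeat=Q.shape[0])]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def mu_max(Q):
    # Exact mu_max(Q) = max_{x in {0,1}^n} x^T Q x.  The SDP value hat{mu}_max
    # upper-bounds this, so the checks below hold a fortiori for Z_max.
    return max(np.array(x) @ Q @ np.array(x)
               for x in itertools.product([0, 1], repeat=Q.shape[0]))

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2
n_log2 = n * math.log(2)
assert z_exact(Q) <= n_log2 + mu_max(Q) + 1e-9   # maximum bound
assert mu_max(Q) <= np.abs(Q).sum() + 1e-9        # mu_max <= ||Q||_1
```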
It can be shown that the same Lipschitz property holds for the log-partition function $Z$ itself; due to space limitations, the proof is omitted in this paper. Indeed, for every pair of symmetric matrices $Q, R$ we have the subgradient inequality $Z_{\max}(R) \ge Z_{\max}(Q) + \mathrm{Tr}\, X^{\mathrm{opt}}(R - Q)$, where $X^{\mathrm{opt}}$ is any optimal variable for problem (5) at $Q$. Since any feasible $X$ satisfies $\|X\|_\infty \le 1$, we can bound the term $\mathrm{Tr}\, X^{\mathrm{opt}}(Q - R)$ from below by $-\|Q - R\|_1$, and after exchanging the roles of $Q, R$, obtain the desired result.\n\n2.2 The cardinality bound\n\nFor every $k \in \{0, \ldots, n\}$, consider the subset of variables with cardinality $k$, $\Omega_k := \{x \in \{0,1\}^n : \mathrm{Card}(x) = k\}$. This defines a partition of $\{0,1\}^n$, thus\n\n$$Z(Q) = \log \sum_{k=0}^n \sum_{x \in \Omega_k} \exp[x^T Q x].$$\n\nWe can refine the maximum bound by replacing the terms in the log-partition function by their maximum over $\Omega_k$, leading to\n\n$$Z(Q) \le \log \left( \sum_{k=0}^n c_k \exp[\mu_k(Q)] \right),$$\n\nwhere, for $k \in \{0, \ldots, n\}$, $c_k = |\Omega_k|$, and $\mu_k(Q) := \max_{x \in \Omega_k} x^T Q x$. Computing $\mu_k(Q)$ for arbitrary $k \in \{0, \ldots, n\}$ is NP-hard. Based on the identity\n\n$$\mu_k(Q) = \max_{(X,x) \in \mathcal{X}_+} \ \mathrm{Tr}\, QX \ : \ 1^T x = k, \ 1^T X 1 = k^2, \ \mathrm{rank}\, X = 1, \quad (8)$$\n\nand using rank relaxation as before, we obtain the bound $\mu_k(Q) \le \hat{\mu}_k(Q)$, where\n\n$$\hat{\mu}_k(Q) = \max_{(X,x) \in \mathcal{X}_+} \ \mathrm{Tr}\, QX \ : \ 1^T x = k, \ 1^T X 1 = k^2. \quad (9)$$\n\nWe define the cardinality bound as\n\n$$Z_{\mathrm{card}}(Q) := \log \left( \sum_{k=0}^n c_k \exp[\hat{\mu}_k(Q)] \right).$$\n\nThe complexity of computing $\hat{\mu}_k(Q)$ (using interior-point methods) is roughly $O(n^3)$. The upper bound $Z_{\mathrm{card}}(Q)$ is computed via $n+1$ semidefinite programs of the form (9); hence, its complexity is roughly $O(n^4)$. Problem (9) admits the dual form\n\n$$\hat{\mu}_k(Q) = \min_{t, \nu, \theta, \eta} \left\{ t + k\theta + k^2\eta \ : \ \begin{pmatrix} D(\nu) + \theta I + \eta 11^T - Q & -\nu/2 \\ -\nu^T/2 & t \end{pmatrix} \succeq 0 \right\}. \quad (10)$$\n\nThe fact that $\hat{\mu}_k(Q) \le \hat{\mu}_{\max}(Q)$ for every $k$ is obtained upon setting $\theta = \eta = 0$ in the semidefinite programming problem (10).
In fact, we have\n\n$$\hat{\mu}_k(Q) = \min_{\theta, \eta} \ k\theta + k^2\eta + \hat{\mu}_{\max}(Q - \theta I - \eta 11^T). \quad (11)$$\n\nThe above expression can be directly obtained from the following, valid for every $\theta, \eta$:\n\n$$\hat{\mu}_k(Q) = k\theta + k^2\eta + \hat{\mu}_k(Q - \theta I - \eta 11^T) \le k\theta + k^2\eta + \hat{\mu}_{\max}(Q - \theta I - \eta 11^T).$$\n\nIt can be shown (the proof is omitted due to space limitations) that, in the case of standard Ising models, that is, if $Q$ has the form $\theta I + \eta 11^T$ for some scalars $\theta, \eta$, the bound $\hat{\mu}_k(Q)$ is exact. Since $x^T Q x$ is constant as $x$ ranges over $\Omega_k$, the cardinality bound is then also exact. By construction, $Z_{\mathrm{card}}(Q)$ is guaranteed to be better (lower) than $Z_{\max}(Q)$, since the latter is obtained upon replacing $\hat{\mu}_k(Q)$ by its upper bound $\hat{\mu}_{\max}(Q)$ for every $k$. The cardinality bound thus satisfies\n\n$$Z(Q) \le Z_{\mathrm{card}}(Q) \le Z_{\max}(Q) \le n \log 2 + \|Q\|_1. \quad (12)$$\n\nUsing the same technique as in the context of the maximum bound, we can show that each function $\hat{\mu}_k$ is Lipschitz continuous, with constant 1 with respect to the $\ell_1$-norm. Using the Lipschitz continuity of positively weighted log-sum-exp functions (with constant 1 with respect to the $\ell_\infty$-norm), we deduce that $Z_{\mathrm{card}}$ is also Lipschitz continuous: for every pair of symmetric matrices $Q, R$,\n\n$$|Z_{\mathrm{card}}(Q) - Z_{\mathrm{card}}(R)| = \left| \log \sum_{k=0}^n c_k \exp[\hat{\mu}_k(Q)] - \log \sum_{k=0}^n c_k \exp[\hat{\mu}_k(R)] \right| \le \max_{0 \le k \le n} |\hat{\mu}_k(Q) - \hat{\mu}_k(R)| \le \|Q - R\|_1,$$\n\nas claimed.\n\n2.3 Quality analysis\n\nWe now seek to establish conditions on the model parameter $Q$ which guarantee that the approximation error $Z_{\mathrm{card}}(Q) - Z(Q)$ is small. The analysis relies on the fact that, for standard Ising models, the error is zero. We begin by establishing an upper bound on the difference between maximal and minimal values of $x^T Q x$ when $x \in \Omega_k$. We have the bound\n\n$$\min_{x \in \Omega_k} x^T Q x \ \ge \ \check{\mu}_k(Q) := \min_{(X,x) \in \mathcal{X}_+} \ \mathrm{Tr}\, QX \ : \ 1^T x = k, \ 1^T X 1 = k^2.$$\n\nIn the same fashion as for the quantity $\hat{\mu}_k(Q)$, we can express $\check{\mu}_k(Q)$ as\n\n$$\check{\mu}_k(Q) = \max_{\theta, \eta} \ k\theta + k^2\eta + \mu_{\min}(Q - \theta I - \eta 11^T), \quad \text{where} \quad \mu_{\min}(Q) := \min_{(X,x) \in \mathcal{X}_+} \mathrm{Tr}\, QX.
Based on this expression, we have, for every $k$:\n\n$$0 \le \hat{\mu}_k(Q) - \check{\mu}_k(Q) = \min_{\theta, \eta, \theta', \eta'} \ k(\theta - \theta') + k^2(\eta - \eta') + \hat{\mu}_{\max}(Q - \theta I - \eta 11^T) - \mu_{\min}(Q - \theta' I - \eta' 11^T)$$\n$$\le \min_{\theta, \eta} \ \hat{\mu}_{\max}(Q - \theta I - \eta 11^T) - \mu_{\min}(Q - \theta I - \eta 11^T) \ \le \ 2 \min_{\theta, \eta} \|Q - \theta I - \eta 11^T\|_1,$$\n\nwhere we have used the fact that, for every symmetric matrix $R$,\n\n$$\max_{(X,x), (Y,y) \in \mathcal{X}_+} \mathrm{Tr}\, R(X - Y) \ \le \ \max_{\|X\|_\infty \le 1, \ \|Y\|_\infty \le 1} \mathrm{Tr}\, R(X - Y) = 2\|R\|_1.$$\n\nUsing again the Lipschitz continuity properties of the weighted log-sum-exp function, we obtain that, for every $Q$, the absolute error between $Z(Q)$ and $Z_{\mathrm{card}}(Q)$ is bounded as follows:\n\n$$0 \le Z_{\mathrm{card}}(Q) - Z(Q) \le \log \sum_{k=0}^n c_k \exp[\hat{\mu}_k(Q)] - \log \sum_{k=0}^n c_k \exp[\check{\mu}_k(Q)] \le \max_{0 \le k \le n} \left( \hat{\mu}_k(Q) - \check{\mu}_k(Q) \right) \le 2 D_{\mathrm{st}}(Q),$$\n$$D_{\mathrm{st}}(Q) := \min_{\theta, \eta} \|Q - \theta I - \eta 11^T\|_1. \quad (13)$$\n\nThus, a measure of quality is $D_{\mathrm{st}}(Q)$, the distance, in $\ell_1$-norm, between the model and the class of standard Ising models. Note that this measure is easily computed, in $O(n^2 \log n)$ time, by first setting $\eta$ to be the median of the values $Q_{ij}$, $1 \le i < j \le n$, and then setting $\theta$ to be the median of the values $Q_{ii} - \eta$, $i = 1, \ldots, n$. We summarize our findings so far with the following theorem:\n\nTheorem 1 (Cardinality bound) The cardinality bound is\n\n$$Z_{\mathrm{card}}(Q) := \log \left( \sum_{k=0}^n c_k \exp[\hat{\mu}_k(Q)] \right),$$\n\nwhere $\hat{\mu}_k(Q)$, $k = 0, \ldots, n$, is defined via the semidefinite program (9), which can be solved in $O(n^3)$ time. The approximation error is bounded above by twice the distance (in $\ell_1$-norm) to the class of standard Ising models:\n\n$$0 \le Z_{\mathrm{card}}(Q) - Z(Q) \le 2 \min_{\theta, \eta} \|Q - \theta I - \eta 11^T\|_1.$$\n\n3 The Pseudo Maximum-Likelihood Problem\n\n3.1 Tractable formulation\n\nUsing the bound $Z_{\mathrm{card}}(Q)$ in lieu of $Z(Q)$ in the maximum-likelihood problem (2) leads to a convex restriction of that problem, referred to as the pseudo maximum-likelihood problem. This problem can be cast as\n\n$$\min_{t, \nu, \theta, \eta, Q} \ \log \left( \sum_{k=0}^n c_k \exp[t_k + k\theta_k + k^2\eta_k] \right) - \mathrm{Tr}\, QS$$\n$$\text{s.t.} \quad Q \in \mathcal{Q}, \quad \begin{pmatrix} D(\nu_k) + \theta_k I + \eta_k 11^T - Q & -\nu_k/2 \\ -\nu_k^T/2 & t_k \end{pmatrix} \succeq 0, \quad k = 0, \ldots, n.$$\n\nThe complexity of this bound is XXX.
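Theorem 1 and the chain (12) can be checked numerically on tiny models by computing $\mu_k(Q)$ exactly by enumeration (standing in for the SDP values $\hat{\mu}_k(Q) \ge \mu_k(Q)$) and $D_{\mathrm{st}}(Q)$ via the median recipe above; the following sketch is ours, for illustration:

```python
import itertools
import math
import numpy as np

def z_exact(Q):
    # Exact Z(Q) by enumeration (tiny n only).
    vals = [np.array(x) @ Q @ np.array(x)
            for x in itertools.product([0, 1], repeat=Q.shape[0])]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def z_card(Q):
    # Cardinality bound, with mu_k computed exactly over Omega_k by enumeration
    # in place of the SDP values hat{mu}_k >= mu_k (so this lower-bounds Zcard,
    # while still upper-bounding Z).
    n = Q.shape[0]
    mu = [max(np.array(x) @ Q @ np.array(x)
              for x in itertools.product([0, 1], repeat=n) if sum(x) == k)
          for k in range(n + 1)]
    vals = [math.log(math.comb(n, k)) + mu[k] for k in range(n + 1)]
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def dist_to_standard(Q):
    # D_st(Q) via the median recipe: eta = median of off-diagonal entries,
    # then theta = median of Q_ii - eta.
    n = Q.shape[0]
    eta = np.median(Q[~np.eye(n, dtype=bool)])
    theta = np.median(np.diag(Q) - eta)
    return np.abs(Q - theta * np.eye(n) - eta * np.ones((n, n))).sum()

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2
assert z_exact(Q) <= z_card(Q) + 1e-9                            # Z <= Zcard
assert z_card(Q) - z_exact(Q) <= 2 * dist_to_standard(Q) + 1e-9  # error bound
assert dist_to_standard(0.5 * np.eye(n) + 0.2 * np.ones((n, n))) < 1e-12
```

The last assertion reflects the exactness on standard Ising models: the distance, and hence the error bound, vanishes there.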
For numerical reasons, and without loss of generality, it is advisable to scale the $c_k$'s, replacing them by $\gamma_k := 2^{-n} c_k \in [0, 1]$.\n\n3.2 Dual and interpretation\n\nWhen $\mathcal{Q} = \mathcal{S}^n$, the dual to the above problem is\n\n$$\max_{(Y_k, y_k, q_k)_{k=0}^n} \ -D(q \| \gamma) \ : \ S = \sum_{k=0}^n Y_k, \quad q \ge 0, \quad q^T 1 = 1,$$\n$$\begin{pmatrix} Y_k & y_k \\ y_k^T & q_k \end{pmatrix} \succeq 0, \quad d(Y_k) = y_k, \quad 1^T y_k = k q_k, \quad 1^T Y_k 1 = k^2 q_k, \quad k = 0, \ldots, n,$$\n\nwhere $\gamma$ is the distribution on $\{0, \ldots, n\}$ with $\gamma_k = \mathbf{Prob}_u(\Omega_k) = 2^{-n} c_k$, and $D(q \| \gamma)$ is the relative entropy (Kullback-Leibler divergence) between the distributions $q, \gamma$:\n\n$$D(q \| \gamma) := \sum_{k=0}^n q_k \log \frac{q_k}{\gamma_k}.$$\n\nTo interpret this dual, we assume without loss of generality $q > 0$, and use the variables $X_k := q_k^{-1} Y_k$, $x_k := q_k^{-1} y_k$. We obtain the equivalent (non-convex) formulation\n\n$$\max_{(X_k, x_k, q_k)_{k=0}^n} \ -D(q \| \gamma) \ : \ S = \sum_{k=0}^n q_k X_k, \quad q \ge 0, \quad q^T 1 = 1, \quad (14)$$\n$$(X_k, x_k) \in \mathcal{X}_+, \quad 1^T x_k = k, \quad 1^T X_k 1 = k^2, \quad k = 0, \ldots, n.$$\n\nThe above problem can be obtained as a relaxation of the dual of the exact maximum-likelihood problem (2), which is the maximum entropy problem (3). The relaxation involves two steps: one is to form an outer approximation to the marginal polytope, the other is to find an upper bound on the entropy function (4). First observe that we can express any distribution $p$ on $\{0,1\}^n$ as\n\n$$p(x) = \sum_{k=0}^n q_k p_k(x), \quad (15)$$\n\nwhere\n\n$$q_k = \mathbf{Prob}_p(\Omega_k) = \sum_{x \in \Omega_k} p(x), \qquad p_k(x) = \begin{cases} q_k^{-1} p(x) & \text{if } x \in \Omega_k, \\ 0 & \text{otherwise.} \end{cases}$$\n\nNote that the functions $p_k$ are valid distributions on $\{0,1\}^n$, with support in $\Omega_k$.
To obtain an outer approximation to the marginal polytope, we then write the moment-matching equality constraint in problem (3) as\n\n$$S = \mathbf{E}_p \, xx^T = \sum_{k=0}^n q_k X_k,$$\n\nwhere the $X_k$'s are the second-order moment matrices with respect to $p_k$:\n\n$$X_k = \mathbf{E}_{p_k} xx^T = q_k^{-1} \sum_{x \in \Omega_k} p(x) xx^T.$$\n\nTo relax the constraints in the maximum-entropy problem (3), we simply use the valid constraints\n\n$$X_k \succeq x_k x_k^T, \quad d(X_k) = x_k, \quad 1^T x_k = k, \quad 1^T X_k 1 = k^2,$$\n\nwhere $x_k$ is the mean under $p_k$:\n\n$$x_k = \mathbf{E}_{p_k} x = q_k^{-1} \sum_{x \in \Omega_k} p(x) x.$$\n\nThis process yields exactly the constraints of the relaxed problem (14). To finalize our relaxation, we now form an upper bound on the entropy function (4). To this end, we use the fact that, since each $p_k$ has support in $\Omega_k$, its entropy is bounded above by $\log |\Omega_k|$, as follows:\n\n$$-H(p) = \sum_{x \in \{0,1\}^n} p(x) \log p(x) = \sum_{k=0}^n \sum_{x \in \Omega_k} q_k p_k(x) \log(q_k p_k(x)) = \sum_{k=0}^n q_k \left( \log q_k - H(p_k) \right)$$\n$$\ge \sum_{k=0}^n q_k \left( \log q_k - \log |\Omega_k| \right) = \sum_{k=0}^n q_k \log \frac{q_k}{\gamma_k} - n \log 2 \qquad (\text{using } |\Omega_k| = 2^n \gamma_k),$$\n\nwhich is, up to a constant, the negative of the objective of problem (14).\n\n3.3 Ensuring quality via bounds on Q\n\nWe consider the (exact) maximum-likelihood problem (2), with $\mathcal{Q} = \{Q = Q^T : \|Q\|_1 \le \lambda\}$:\n\n$$\min_{Q = Q^T} \ Z(Q) - \mathrm{Tr}\, QS \ : \ \|Q\|_1 \le \lambda, \quad (16)$$\n\nand its convex relaxation:\n\n$$\min_{Q = Q^T} \ Z_{\mathrm{card}}(Q) - \mathrm{Tr}\, QS \ : \ \|Q\|_1 \le \lambda. \quad (17)$$\n\nThe feasible sets of problems (16) and (17) are the same, and on it the difference between the objective functions is uniformly bounded by $2\lambda$, since $Z_{\mathrm{card}}(Q) - Z(Q) \le 2 D_{\mathrm{st}}(Q) \le 2\|Q\|_1 \le 2\lambda$. Thus, any $\lambda$-suboptimal solution of the relaxation (17) is guaranteed to be $3\lambda$-suboptimal for the exact problem (16). In practice, the $\ell_1$-norm constraint in (17) encourages sparsity of $Q$, hence the interpretability of the model. It also has good properties in terms of the generalization error. As seen above, the constraint also implies a better approximation to the exact problem (16). All these benefits come at the expense of goodness-of-fit, as the constraint reduces the expressive power of the model.
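The entropy decomposition behind the bound of section 3.2 is easy to verify numerically: drawing a random distribution $p$ on $\{0,1\}^n$ and computing $q_k = \mathbf{Prob}_p(\Omega_k)$, one checks $H(p) \le H(q) + \sum_k q_k \log c_k = n \log 2 - D(q \| \gamma)$. A sketch (ours, for illustration):

```python
import itertools
import math
import numpy as np

# Draw a random distribution p on {0,1}^n and check the chain
# H(p) <= H(q) + sum_k q_k log c_k = n log 2 - D(q || gamma),
# where q_k = Prob_p(Card(x) = k) and c_k = C(n, k).
rng = np.random.default_rng(2)
n = 8
p = rng.random(2 ** n)
p /= p.sum()

H_p = -sum(pi * math.log(pi) for pi in p if pi > 0)
q = np.zeros(n + 1)
for idx, x in enumerate(itertools.product([0, 1], repeat=n)):
    q[sum(x)] += p[idx]
H_q = -sum(qi * math.log(qi) for qi in q if qi > 0)
bound = H_q + sum(q[k] * math.log(math.comb(n, k)) for k in range(n + 1))

assert H_p <= bound + 1e-12              # entropy bound used in the relaxation
assert bound <= n * math.log(2) + 1e-12  # never worse than the trivial n log 2
```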
This is an illustration of the intimate connections between computational and statistical properties of the model. A more accurate bound on the approximation error can be obtained by imposing the following constraint on $Q$ and two new variables $\theta, \eta$: $\|Q - \theta I - \eta 11^T\|_1 \le \lambda$. We can draw similar conclusions as before. Here, the resulting model will not be sparse, in the sense of having many elements of $Q$ equal to zero. However, it will still be quite interpretable, as the bound above will encourage the number of off-diagonal elements of $Q$ that differ from their median to be small. A yet more accurate control on the approximation error can be induced by the constraints $\hat{\mu}_k(Q) - \check{\mu}_k(Q) \le \lambda$ for every $k$, each of which can be expressed via LMI constraints. The corresponding constrained relaxation of the maximum-likelihood problem has the form\n\n$$\min \ \log \left( \sum_{k=0}^n c_k \exp[t_k^+ + k\theta_k^+ + k^2\eta_k^+] \right) - \mathrm{Tr}\, QS$$\n\nover the variables $t^\pm, \nu^\pm, \theta^\pm, \eta^\pm, Q$, subject to, for every $k = 0, \ldots, n$: an LMI of the form (10) certifying $\hat{\mu}_k(Q) \le t_k^+ + k\theta_k^+ + k^2\eta_k^+$, a second LMI certifying $\check{\mu}_k(Q) \ge t_k^- + k\theta_k^- + k^2\eta_k^-$, and the coupling constraint $(t_k^+ + k\theta_k^+ + k^2\eta_k^+) - (t_k^- + k\theta_k^- + k^2\eta_k^-) \le \lambda$. Using this model instead of the ones we saw previously, we sacrifice less on the front of the approximation to the true likelihood, at the expense of increased computational effort.\n\n4 Links with the Log-Determinant Bound\n\n4.1 The log-determinant bounds\n\nThe bound in Wainwright and Jordan [2] is based on an upper bound on the (differential) entropy of a continuous random variable, which is attained by a Gaussian distribution. It has the form $Z(Q) \le Z_{\mathrm{ld}}(Q)$, with\n\n$$Z_{\mathrm{ld}}(Q) := n\alpha + \max_{(X,x) \in \mathcal{X}_+} \ \mathrm{Tr}\, QX + \frac{1}{2} \log \det \left( X - xx^T + \frac{1}{12} I \right), \quad (18)$$\n\nwhere $\alpha := (1/2) \log(2\pi e) \approx 1.42$. Wainwright and Jordan suggest to further relax this bound to one that is easier to compute:\n\n$$Z_{\mathrm{ld}}(Q) \le Z_{\mathrm{rld}}(Q) := n\alpha + \max_{(X,x) \in \mathcal{X}} \ \mathrm{Tr}\, QX + \frac{1}{2} \log \det \left( X - xx^T + \frac{1}{12} I \right). \quad (19)$$\n\nLike $Z$ and the bounds examined previously, the bounds $Z_{\mathrm{ld}}$ and $Z_{\mathrm{rld}}$ are Lipschitz continuous, with constant 1 with respect to the $\ell_1$-norm.
The proof starts with the representations above, and exploits the fact that $\|Q\|_1$ is an upper bound on $\mathrm{Tr}\, QX$ when $(X, x) \in \mathcal{X}_+$. The dual of the log-determinant bound has the form (see appendix (??))\n\n$$Z_{\mathrm{ld}}(Q) = \frac{n}{2} \log \pi + \frac{1}{2} + \min_{t, \nu, F, g, h} \ t + \frac{1}{12} \mathrm{Tr}(D(\nu) - Q - F) - \frac{1}{2} \log \det (D(\nu) - Q - F)$$\n$$\text{s.t.} \quad \begin{pmatrix} F & g \\ g^T & h \end{pmatrix} \succeq 0, \quad \begin{pmatrix} D(\nu) - Q - F & -(\nu + g)/2 \\ -(\nu + g)^T/2 & t - h \end{pmatrix} \succeq 0. \quad (20)$$\n\nThe relaxed counterpart $Z_{\mathrm{rld}}(Q)$ is obtained upon setting $F, g, h$ to zero in the dual above:\n\n$$Z_{\mathrm{rld}}(Q) = \frac{n}{2} \log \pi + \frac{1}{2} + \min_{t, \nu} \ t + \frac{1}{12} \mathrm{Tr}(D(\nu) - Q) - \frac{1}{2} \log \det (D(\nu) - Q) \ : \ \begin{pmatrix} D(\nu) - Q & -\nu/2 \\ -\nu^T/2 & t \end{pmatrix} \succeq 0.$$\n\nUsing Schur complements to eliminate the variable $t$, we further obtain\n\n$$Z_{\mathrm{rld}}(Q) = \frac{n}{2} \log \pi + \frac{1}{2} + \min_{\nu : D(\nu) \succ Q} \ \frac{1}{4} \nu^T (D(\nu) - Q)^{-1} \nu + \frac{1}{12} \mathrm{Tr}(D(\nu) - Q) - \frac{1}{2} \log \det (D(\nu) - Q). \quad (21)$$\n\n4.2 Comparison with the maximum bound\n\nWe first note the similarity in structure between the dual form (7) of the problem defining $Z_{\max}(Q)$ and that of the relaxed log-determinant bound (21). Despite these connections, the log-determinant bound is neither better nor worse than the cardinality or maximum bounds. Actually, for some special choices of $Q$ (e.g., when $Q$ is diagonal), the cardinality bound is exact, while the log-determinant one is not. Conversely, one can choose $Q$ so that $Z_{\mathrm{card}}(Q) > Z_{\mathrm{ld}}(Q)$, so no bound dominates the other. The same can be said for $Z_{\max}(Q)$ (see section 4.4 for numerical examples). However, when we impose an extra condition on $Q$, namely a bound on its $\ell_1$-norm, more can be said. The analysis is based on the case $Q = 0$, and exploits the Lipschitz continuity of the bounds with respect to the $\ell_1$-norm. First notice (although not shown in this paper because of space limitations) that, for $Q = 0$, the relaxed log-determinant bound writes\n\n$$Z_{\mathrm{rld}}(0) = \frac{n}{2} \log \frac{2\pi e}{3} + \frac{1}{2} = Z_{\max}(0) + \frac{n}{2} \log \frac{\pi e}{6} + \frac{1}{2}.$$\n\nNow invoke the Lipschitz continuity properties of the bounds $Z_{\mathrm{rld}}(Q)$ and $Z_{\max}(Q)$, and obtain that\n\n$$Z_{\mathrm{rld}}(Q) - Z_{\max}(Q) = (Z_{\mathrm{rld}}(Q) - Z_{\mathrm{rld}}(0)) + (Z_{\mathrm{rld}}(0) - Z_{\max}(0)) + (Z_{\max}(0) - Z_{\max}(Q))$$\n$$\ge -2\|Q\|_1 + (Z_{\mathrm{rld}}(0) - Z_{\max}(0)) = -2\|Q\|_1 + \frac{n}{2} \log \frac{\pi e}{6} + \frac{1}{2}.$$\n\nThis proves that if $\|Q\|_1 \le \frac{n}{4} \log \frac{\pi e}{6} + \frac{1}{4}$, then the relaxed log-determinant bound $Z_{\mathrm{rld}}(Q)$ is worse (larger) than the maximum bound $Z_{\max}(Q)$. Since $\frac{1}{4} \log \frac{\pi e}{6} \approx 0.088$, we can strengthen the above condition to the simpler sufficient requirement $\|Q\|_1 \le 0.08n$.\n\n4.3 Summary of comparison results\n\nTo summarize our findings:\n\nTheorem 2 (Comparison) We have, for every $Q$:\n\n$$Z(Q) \le Z_{\mathrm{card}}(Q) \le Z_{\max}(Q) \le n \log 2 + \|Q\|_1.$$\n\nIn addition, we have $Z_{\max}(Q) \le Z_{\mathrm{rld}}(Q)$ whenever $\|Q\|_1 \le 0.08n$.\n\n4.4 A numerical experiment\n\nWe now illustrate our findings on the comparison between the log-determinant bounds and the cardinality and maximum bounds. We set the size of our model to be $n = 20$, and for a range of values of a parameter $\rho$, generate $N = 10$ random instances of $Q$ with $\|Q\|_1 = \rho$. Figure ?? shows the average values of the bounds, as well as the associated error bars. Clearly, the new bound outperforms the log-determinant bounds for a wide range of values of $\rho$. Our predicted threshold value of $\|Q\|_1$, below which the new bound is provably better, namely $\rho = 0.08n = 1.6$, is seen to be very conservative with respect to the observed threshold of about 30. On the other hand, we observe that for large values of $\|Q\|_1$, the log-determinant bounds do behave better. Across the range of $\rho$, we note that the log-determinant bound is indistinguishable from its relaxed counterpart.\n\n5 Conclusion and Remarks\n\nWe have introduced a new upper bound (the cardinality bound) for the log-partition function corresponding to second-order Ising models for binary distributions. We have shown that this bound can be computed via convex optimization, and that, when compared to the log-determinant bound introduced by Wainwright and Jordan (2006), the cardinality bound performs better when the $\ell_1$-norm of the model parameter vector is small enough. Although not shown in the paper, the cardinality bound becomes exact in the case of standard Ising models, while the maximum bound (for example) is not exact for such models.
As was shown in section 2, the cardinality bound was obtained by defining a partition of $\{0,1\}^n$. This idea can be generalized to form a class of bounds which we call partition bounds. It turns out that partition bounds are closely linked to the more general class of bounds that are based on worst-case probability analysis. We acknowledge the importance of applying our bound to real-world data, and hope to include such results in subsequent versions of this paper.\n\nReferences\n\n[1] P. Ravikumar and J. Lafferty. Variational Chernoff bounds for graphical models. In Proc. Advances in Neural Information Processing Systems (NIPS), December 2007.\n[2] M. J. Wainwright and M. I. Jordan. Log-determinant relaxation for approximate inference in discrete Markov random fields. IEEE Transactions on Signal Processing, 2006.\n", "award": [], "sourceid": 3422, "authors": [{"given_name": "Laurent", "family_name": "Ghaoui", "institution": null}, {"given_name": "Assane", "family_name": "Gueye", "institution": null}]}