{"title": "A Convergence Proof for the Softassign Quadratic Assignment Algorithm", "book": "Advances in Neural Information Processing Systems", "page_first": 620, "page_last": 626, "abstract": null, "full_text": "A Convergence Proof for the Softassign \n\nQuadratic Assignment Algorithm \n\nAnand Rangarajan \n\nDepartment of Diagnostic Radiology \nYale University School of Medicine \n\nNew Haven, CT 06520-8042 \n\nAlan Yuille \n\nSmith-Kettlewell Eye Institute \n\n2232 Webster Street \n\nSan Francisco, CA 94115 \n\ne-mail: anand IE \nQai +- 'Ebj Cai;bjMbj + ')'Mai \n\nBegin Softassign: \nMai +- exp ((3Qai) \n\nBegin C: Sinkhorn. Do C until all Mai converge \nUpdate Mai by normalizing the rows: \nMai +- 2:Mt- . \n'u . \nUpdate Mai by normalizing the columns: \nMai +- 2::J:ta, \nEnd C \n\n. \n\nEnd Soft assign \n\nEnd B \n\nEnd A \n\n\f622 \n\nA. Rangarajan, A. Yuille, S. Gold and E. Mjolsness \n\nThe softassign is used for constraint satisfaction. The softassign is based on \nSinkhorn's theorem [4] but can be independently derived as coordinate ascent on the \nLagrange parameters 11 and 1/. Sinkhorn's theorem ensures that we obtain a doubly \nstochastic matrix by the simple process of alternating row and column normaliza(cid:173)\ntions. The QAP algorithm above was developed using the graduated assignment \nheuristic [1] with no proof of convergence until now. \n\nWe simplify the objective function in (1) by collecting together all terms quadratic \nin M ai . 
This is achieved by defining\n\nC^(γ)_ai;bj = C_ai;bj + γ δ_ab δ_ij   (2)\n\nThen we use an algebraic transformation [3] to transform the quadratic form into a more manageable linear form:\n\n-x²/2 → min_σ ( -xσ + σ²/2 )   (3)\n\nApplication of the algebraic transformation (in a vectorized form) to the quadratic term in (1) yields:\n\nE_qap(M, σ, μ, ν) = -Σ_aibj C^(γ)_ai;bj M_ai σ_bj + (1/2) Σ_aibj C^(γ)_ai;bj σ_ai σ_bj + Σ_a μ_a (Σ_i M_ai - 1) + Σ_i ν_i (Σ_a M_ai - 1) + (1/β) Σ_ai M_ai log M_ai   (4)\n\nExtremizing (4) w.r.t. σ, we get\n\nΣ_bj C^(γ)_ai;bj M_bj - Σ_bj C^(γ)_ai;bj σ_bj = 0 ⇒ σ_ai = M_ai   (5)\n\nis a minimum, provided certain conditions hold which we specify below.\n\nIn the first part of the proof, we show that setting σ_ai = M_ai is guaranteed to decrease the energy function. Restated, we require that\n\nσ_ai = M_ai = argmin_σ ( -Σ_aibj C^(γ)_ai;bj M_ai σ_bj + (1/2) Σ_aibj C^(γ)_ai;bj σ_ai σ_bj )   (6)\n\nIf C^(γ)_ai;bj is positive definite in the subspace spanned by M, then σ_ai = M_ai is a minimum of the energy function -Σ_aibj C^(γ)_ai;bj M_ai σ_bj + (1/2) Σ_aibj C^(γ)_ai;bj σ_ai σ_bj.\n\nAt this juncture, we make a crucial assumption that considerably simplifies the proof. Since this assumption is central, we formally state it here: \"M is always constrained to be a doubly stochastic matrix.\" In other words, for our proof of convergence, we require the softassign algorithm to return a doubly stochastic matrix (as Sinkhorn's theorem guarantees that it will) instead of a matrix which is merely close to being doubly stochastic (based on some reasonable metric). We also require the variable σ to be a doubly stochastic matrix. 
\nSince M is always constrained to be a doubly stochastic matrix, C^(γ)_ai;bj is required to be positive definite in the linear subspace of rows and columns of M summing to one. The value of γ should be set high enough such that C^(γ)_ai;bj does not have any negative eigenvalues in the subspace spanned by the row and column constraints. This is the same requirement imposed in [5] to ensure that we obtain a permutation matrix at zero temperature.\n\nTo derive a more explicit criterion for γ, we first define a matrix r in the following manner:\n\nr = I_N - (1/N) e e'   (7)\n\nwhere I_N is the N × N identity matrix, e is the vector of all ones and the \"prime\" indicates a transpose operation. The matrix r has the property that any vector rs with s arbitrary will sum to zero. We would like to extend such a property to cover matrices whose row and column sums stay fixed. To achieve this, take the Kronecker product of r with itself:\n\nR = r ⊗ r   (8)\n\nR has the property that it will annihilate all row and column sums. Form a vector m by concatenating all the columns of the matrix M together into a single column [m = vec(M)]. Then the vector Rm has the equivalent property of the \"rows\" and \"columns\" summing to zero. Hence the matrix R C^(γ) R (where C^(γ) is the matrix equivalent of C^(γ)_ai;bj) satisfies the criterion of annihilated row and column sums in any quadratic form; m' R C^(γ) R m = (Rm)' C^(γ) (Rm).\n\nThe parameter γ is chosen such that all eigenvalues of R C^(γ) R are positive:\n\nγ = -min_λ λ(RCR) + ε   (9)\n\nwhere ε > 0 is a small quantity. Note that C is the original QAP benefit matrix whereas C^(γ) is the augmented matrix of (2). We cannot always efficiently compute the largest negative eigenvalue of the matrix RCR. 
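For small problems, though, criterion (9) can be evaluated directly by forming r and R = r ⊗ r explicitly. The NumPy sketch below is our own illustration (function name and default ε are ours); it also verifies that the augmented matrix C^(γ) = C + γI, the vectorized form of (2), has no negative eigenvalues in the constraint subspace:

```python
import numpy as np

def gamma_for_benefit(C_mat, eps=1e-3):
    """gamma = -min eigenvalue of R C R + eps, per Equation (9).
    C_mat is the N^2 x N^2 matrix form of the QAP benefit C_ai;bj."""
    N = int(round(np.sqrt(C_mat.shape[0])))
    r = np.eye(N) - np.ones((N, N)) / N   # r = I_N - (1/N) e e'
    R = np.kron(r, r)                     # R = r (x) r annihilates row/column sums
    return -np.linalg.eigvalsh(R @ C_mat @ R).min() + eps
```

Because the null space of R contributes zero eigenvalues, the minimum eigenvalue of RCR is never positive, so the returned γ is always at least ε.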
Since the original C_ai;bj is four dimensional, the dimensions of RCR are N² × N² where N is the number of elements in one set. Fortunately, as we show later, for specific problems it's possible to break up RCR into its constituents thereby making the calculation of the largest negative eigenvalue of RCR more efficient. We return to this point in Section 3.\n\nThe second part of the proof involves demonstrating that the softassign operation also decreases the objective in (4). (Note that the two Lagrange parameters μ and ν are specified by the softassign algorithm [4]).\n\nM = Softassign(Q, β) where Q_ai = Σ_bj C^(γ)_ai;bj σ_bj   (10)\n\nRecall that the step immediately preceding the softassign operation sets σ_ai = M_ai. We are therefore justified in referring to σ_ai as the \"old\" value of M_ai. For convergence, we have to show that E_qap(σ, σ) ≥ E_qap(M, σ) in (4). Minimizing (4) w.r.t. M_ai, we get\n\n(1/β) log M_ai = Σ_bj C^(γ)_ai;bj σ_bj - μ_a - ν_i - 1/β   (11)\n\nFrom (11), we see that\n\n(1/β) Σ_ai M_ai log M_ai = Σ_aibj C^(γ)_ai;bj M_ai σ_bj - Σ_a μ_a Σ_i M_ai - Σ_i ν_i Σ_a M_ai - (1/β) Σ_ai M_ai   (12)\n\nand\n\n(1/β) Σ_ai σ_ai log M_ai = Σ_aibj C^(γ)_ai;bj σ_ai σ_bj - Σ_a μ_a Σ_i σ_ai - Σ_i ν_i Σ_a σ_ai - (1/β) Σ_ai σ_ai   (13)\n\nFrom (12) and (13), we get (after some algebraic manipulations)\n\nE_qap(σ, σ) - E_qap(M, σ) = -Σ_aibj C^(γ)_ai;bj σ_ai σ_bj + Σ_aibj C^(γ)_ai;bj M_ai σ_bj + (1/β) Σ_ai σ_ai log σ_ai - (1/β) Σ_ai M_ai log M_ai = (1/β) Σ_ai σ_ai log (σ_ai / M_ai) ≥ 0\n\nby the non-negativity of the Kullback-Leibler measure. We have shown that the change in energy after σ has been initialized with the \"old\" value of M is non-negative. We require that σ and M are always doubly stochastic via the action of the softassign operation. Consequently, the terms involving the Lagrange parameters μ and ν can be eliminated from the energy function (4). 
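The inequality E_qap(σ, σ) ≥ E_qap(M, σ) can also be checked numerically. The sketch below is our own construction (all names are ours, a random symmetric matrix stands in for C^(γ), and only the Lagrange-free part of (4) is evaluated); it performs one σ ← M, M ← Softassign(Q, β) sweep from a doubly stochastic starting point:

```python
import numpy as np

def softassign(Q, beta, n_iters=500, tol=1e-10):
    """Exponentiate, then Sinkhorn-normalize to a doubly stochastic matrix."""
    M = np.exp(beta * (Q - Q.max()))          # shift exponent for stability
    for _ in range(n_iters):
        M_prev = M.copy()
        M /= M.sum(axis=1, keepdims=True)
        M /= M.sum(axis=0, keepdims=True)
        if np.abs(M - M_prev).max() < tol:
            break
    return M

def lagrange_free_energy(M, sigma_vec, C_mat, beta):
    """-sum C M sigma + 0.5 sum C sigma sigma + (1/beta) sum M log M."""
    m = M.flatten(order="F")                  # m = vec(M), column stacking
    return (-m @ C_mat @ sigma_vec
            + 0.5 * sigma_vec @ C_mat @ sigma_vec
            + (M * np.log(M)).sum() / beta)

# One fixed-temperature sweep: sigma <- M_old, then M <- Softassign(Q, beta)
rng = np.random.default_rng(2)
N, beta = 5, 3.0
A = rng.standard_normal((N * N, N * N))
C_mat = 0.5 * (A + A.T)                       # symmetric stand-in for C^(gamma)
M_old = np.full((N, N), 1.0 / N)              # doubly stochastic starting point
sigma = M_old.flatten(order="F")
Q = (C_mat @ sigma).reshape((N, N), order="F")
M_new = softassign(Q, beta)
```

At convergence of the Sinkhorn loop, M_new is the constrained minimizer of the linear-plus-entropy part of (4), so its energy never exceeds that of the feasible point σ; this is exactly the Kullback-Leibler argument above.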
Setting σ = M followed by the softassign operation decreases the objective in (4) after excluding the terms involving the Lagrange parameters.\n\nWe summarize the essence of the proof to bring out the salient points. At each temperature, the quadratic assignment algorithm executes the following steps until convergence is established.\n\nStep 1: σ_ai ← M_ai.\nStep 2:\n    Step 2a: Q_ai ← Σ_bj C^(γ)_ai;bj σ_bj.\n    Step 2b: M ← Softassign(Q, β).\nReturn to Step 1 until convergence.\n\nOur proof is based on demonstrating that an appropriately designed energy function decreases in both Step 1 and Step 2 (at fixed temperature). This energy function is Equation (4) after excluding the Lagrange parameter terms.\n\nStep 1: Energy decreases due to the positive definiteness of C^(γ)_ai;bj in the linear subspace spanned by the row and column constraints. γ has to be set high enough for this statement to be true.\n\nStep 2: Energy decreases due to the non-negativity of the Kullback-Leibler measure and due to the restriction that M (and σ) are doubly stochastic.\n\n3 Applications\n\n3.1 Quadratic Assignment\n\nThe QAP benefit matrix is chosen such that the softassign algorithm will not converge without adding the γ term in (1). To achieve this, we randomly picked a unit vector v of dimension N². The benefit matrix C is set to -vv'. Since C has only one negative eigenvalue, the softassign algorithm cannot possibly converge. We ran the softassign algorithm with β₀ = 1, β_r = 0.9 and γ = 0. The energy difference plot on the left in Figure 1 shows the energy never decreasing with increasing iteration number. Next, we followed the recipe for setting γ exactly as in Section 2. After projecting C into the subspace of the row and column constraints, we calculated the largest negative eigenvalue of the matrix RCR which turned out to be -0.8152. 
\nWe set γ to 0.8162 (ε = 0.001) and reran the softassign algorithm. The energy difference plot shows (Figure 1) that the energy never increases. We have shown that a proper choice of γ leads to a convergent algorithm.\n\nFigure 1: Energy difference plot. Left: γ = 0 and Right: γ = 0.8162. While the change in energy is always negative when γ = 0, it is always non-negative when γ = 0.8162. The negative energy difference (on the left) implies that the energy function increases whereas the non-negative energy difference (on the right) implies that the energy function never increases.\n\n3.2 TSP\n\nThe TSP objective function is written as follows: Given N cities,\n\nE_tsp(M) = Σ_aij d_ij M_ai M_(a⊕1)j = trace(D M' T M)   (14)\n\nwhere the symbol ⊕ is used to indicate that the summation in (14) is taken modulo N, d_ij (D) is the inter-city distance matrix and M is the desired permutation matrix. T is a matrix whose (i,j)th entry is δ_(i⊕1)j (δ_ij is the Kronecker delta function). Equation (14) is transformed into the m'Cm form:\n\nE_tsp(m) = m' (D ⊗ T) m   (15)\n\nwhere m = vec(M). We identify our general matrix C with -2 D ⊗ T.\n\nFor convergence, we require the largest eigenvalue of\n\n-RCR = 2 (r ⊗ r)(D ⊗ T)(r ⊗ r) = 2 (rDr) ⊗ (rTr) = 2 (rDr) ⊗ (rT)   (16)\n\nThe eigenvalues of rT are bounded by unity. The eigenvalues of rDr will depend on the form of D. Even in Euclidean TSP the values will depend on whether the Euclidean distance or the distance squared between the cities is used. 
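The TSP specialization can be sketched concretely. The code below is our own illustration (the function name, the cyclic-shift encoding of T, and the ε default are ours); since only the symmetric part of C = -2 D ⊗ T enters the quadratic form m'Cm, we symmetrize before taking eigenvalues, which is an implementation choice rather than the paper's factored computation via (16):

```python
import numpy as np

def tsp_gamma(D, eps=1e-3):
    """Self-amplification parameter for TSP, following Section 3.2.
    D is the N x N inter-city distance matrix."""
    N = D.shape[0]
    # T[i, (i+1) mod N] = 1 encodes the cyclic tour-adjacency structure
    T = np.roll(np.eye(N), -1, axis=0)
    C = -2.0 * np.kron(D, T)
    C_sym = 0.5 * (C + C.T)               # only the symmetric part matters in m'Cm
    r = np.eye(N) - np.ones((N, N)) / N
    R = np.kron(r, r)
    return -np.linalg.eigvalsh(R @ C_sym @ R).min() + eps
```

The factored form 2(rDr) ⊗ (rT) in (16) avoids ever building the N² × N² matrix; the dense construction here is only practical for small N.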
\n\n3.3 Graph Matching\n\nThe graph matching objective function is written as follows: Given N₁ and N₂ node graphs with adjacency matrices G and g respectively,\n\nE_gm(M) = -(1/2) Σ_aibj C_ai;bj M_ai M_bj   (17)\n\nwhere C_ai;bj = 1 - 3|G_ab - g_ij| is the compatibility matrix [1]. The matching constraints are somewhat different from TSP due to the presence of slack variables [1]. This makes no difference however to our projection operators. We add an extra row and column of zeros to g and G in order to handle the slack variable case. Now G is (N₁ + 1) × (N₁ + 1) and g is (N₂ + 1) × (N₂ + 1). Equation (17) can be readily transformed into the m'Cm form. Our projection apparatus remains unchanged. For convergence, we require the largest negative eigenvalue of RCR.\n\n4 Conclusion\n\nWe have derived a convergence proof for the softassign quadratic assignment algorithm and specialized it to the cases of TSP and graph matching. An extension to graph partitioning follows along the same lines as graph matching. Central to our proof is the requirement that the QAP matrix M is always doubly stochastic. As a by-product, the convergence proof yields a criterion by which the free self-amplification parameter γ is set. We believe that the combination of good theoretical properties and experimental success of the softassign algorithm makes it the technique of choice for quadratic assignment neural optimization.\n\nReferences\n\n[1] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377-388, 1996.\n\n[2] S. Gold and A. Rangarajan. Softassign versus softmax: Benchmarks in combinatorial optimization. In Advances in Neural Information Processing Systems 8, pages 626-632. MIT Press, 1996.\n\n[3] E. Mjolsness and C. Garrett. Algebraic transformations of objective functions. 
Neural Networks, 3:651-669, 1990.\n\n[4] A. Rangarajan, S. Gold, and E. Mjolsness. A novel optimizing network architecture with applications. Neural Computation, 8(5):1041-1060, 1996.\n\n[5] A. L. Yuille and J. J. Kosowsky. Statistical physics algorithms that converge. Neural Computation, 6(3):341-356, May 1994.\n", "award": [], "sourceid": 1214, "authors": [{"given_name": "Anand", "family_name": "Rangarajan", "institution": null}, {"given_name": "Alan", "family_name": "Yuille", "institution": null}, {"given_name": "Steven", "family_name": "Gold", "institution": null}, {"given_name": "Eric", "family_name": "Mjolsness", "institution": null}]}