{"title": "Generalizing Graph Matching beyond Quadratic Assignment Model", "book": "Advances in Neural Information Processing Systems", "page_first": 853, "page_last": 863, "abstract": "Graph matching has received persistent attention over decades, which can be formulated as a quadratic assignment problem (QAP). We show that a large family of functions, which we define as Separable Functions, can approximate discrete graph matching in the continuous domain asymptotically by varying the approximation controlling parameters. We also study the properties of global optimality and devise convex/concave-preserving extensions to the widely used Lawler's QAP form. Our theoretical findings show the potential for deriving new algorithms and techniques for graph matching. We deliver solvers based on two specific instances of Separable Functions, and the state-of-the-art performance of our method is verified on popular benchmarks.", "full_text": "Generalizing graph matching beyond quadratic\n\nassignment model\n\nTianshu Yu\n\nArizona State University\n\ntianshuy@asu.edu\n\nJunchi Yan\n\nShanghai Jiao Tong University\n\nyanjunchi@sjtu.edu.cn\n\nYilin Wang\n\nArizona State University\nyilwang@adobe.com\n\nWei Liu\n\nTecent AI Lab\n\nwl2223@columbia.edu\n\nBaoxin Li\n\nArizona State University\nbaoxin.li@asu.edu\n\nAbstract\n\nGraph matching has received persistent attention over several decades, which can\nbe formulated as a quadratic assignment problem (QAP). We show that a large\nfamily of functions, which we de\ufb01ne as Separable Functions, can approximate\ndiscrete graph matching in the continuous domain asymptotically by varying the\napproximation controlling parameters. We also study the properties of global\noptimality and devise convex/concave-preserving extensions to the widely used\nLawler\u2019s QAP form. Our theoretical \ufb01ndings show the potential for deriving new\nalgorithms and techniques for graph matching. 
We deliver solvers based on two specific instances of Separable Functions, and the state-of-the-art performance of our method is verified on popular benchmarks.\n\n1 Introduction\n\nGiven two graphs, graph matching algorithms (GM) seek to find node-to-node correspondences by optimizing a pre-defined affinity score function. This problem falls into the category of the quadratic assignment problem (QAP) [1], and has wide applications from object categorization [2] to protein alignment [3]. While a line of works using combinatorial heuristics [4, 5] attempts to solve graph matching directly, relaxing the original problem into the continuous domain is the dominant approach, with the relaxed problem solved by different optimization techniques, e.g. gradient-based [6] or multiplication-based [7, 8] methods. The dominance of continuous relaxation may be partly because it is easier to analyze the local behavior of continuous functions, and one can often find a local optimum. In this paper, we focus on continuous relaxation of graph matching.\nGraph matching seeks the solution to the quadratic assignment problem maxX vec(X)⊤Avec(X), where vec(X) ∈ {0, 1}^(n²) is the column-wise vectorized version of the binary (partial) assignment matrix X ∈ {0, 1}^(n×n), and the so-called affinity matrix A ∈ ℝ^(n²×n²) in the real domain consists of the affinity scores measuring how one edge in one graph is similar to another from the other graph. Traditionally, the common practice is relaxing vec(X) into the continuous real domain vec(X) ∈ ℝ^(n²) [9, 7, 10].\nIn this paper, we show that a large family of functions, defined as Separable Functions, can asymptotically approximate the discrete matching problem by varying the approximation controlling parameters. With this function family, there exist infinitely many models of the graph matching problem, thereby providing the feasibility of adapting different practical problems with different 
models. This provides a new perspective on graph matching. We also analyze the conditions under which these approximations have good properties. Novel solvers on instances of Separable Functions are proposed, based on the path-following and multiplicative techniques respectively.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nNotations We use bold lower-case x and upper-case A to represent a vector and a matrix, respectively. Function vec(·) transforms a matrix to its column-wise vectorized replica. Conversely, function mat(·) transfers a vector back to its matrix form. Denote by ℝ₊ and S the non-negative real numbers and the symmetric matrices, respectively. Function K = diag(k) transforms a vector k into a diagonal matrix K such that Kij = ki if i = j, and Kij = 0 otherwise.\n\n2 Related Work\n\nDifferent from linear assignment [11], the quadratic assignment problem (QAP) for graph matching is often formulated in the literature in two forms: i) the Koopmans-Beckmann's QAP [12]: tr(X⊤AiXAj) + tr(Ap⊤X), where X is the assignment matrix, Ai and Aj are the weighted adjacency matrices, and Ap is the node-to-node similarity matrix; methods based on this formula include [13, 14, 15], to name a few; ii) the more general Lawler's QAP [16]: vec(X)⊤Avec(X). Note that the Koopmans-Beckmann's QAP can always be represented as a special case of the Lawler's by setting A = Aj ⊗ Ai, and many previous works [9, 10, 6, 17] adopt the Lawler's form, which is also the main focus of this paper for its generality. A recent survey [18] provides a more comprehensive literature review.\nThough there are a few (quasi-)discrete methods [5, 19, 20] that directly work in the binary domain, the major line of research falls into the following tracks in the continuous domain. 
Our relaxation technique does not fall into any of these categories and opens up possibilities for new algorithms.\nSpectral relaxation: The authors of the seminal work [7] proposed to relax X to be of unit length ‖vec(X)‖₂² = 1, and the resulting optimization problem can be efficiently solved by computing the leading eigenvector of the affinity matrix A. A better approximation was made in [21] by adding an affine constraint. In contrast to the above Lawler's QAP based models, there are a few earlier methods [13, 22] based on the Koopmans-Beckmann's QAP, where the relaxation is often fulfilled by setting X⊤X = I, with I the identity matrix. In general, spectral relaxation is efficient but not tight, which hinders the matching accuracy.\nSemi-definite programming relaxation: SDP has been a standard tool for combinatorial problems, and it has been adopted to tackle the graph matching problem. In existing work, a variable Y subject to Y = vec(X)vec(X)⊤ is introduced. As a result, the raw problem is approximated by an SDP which relaxes the non-convex constraint on Y into a semi-definite one: Y ⪰ vec(X)vec(X)⊤. The final matching X is then recovered by different heuristics such as winner-take-all [23] or randomization [24]. However, the SDP solver is not very popular in graph matching, as the variable Y squares the raw variable size, resulting in high complexity.\nDoubly-stochastic relaxation: Based on the fact that the set of doubly-stochastic matrices is the convex hull of the permutation matrices, various methods in this line formulate the relaxed problem as a non-convex quadratic program, for both the Koopmans-Beckmann's and the Lawler's QAP. 
Linear programming is adopted in [25] to approximate the quadratic problem, followed by more complex path following methods [14, 15] to approximate the relaxed quadratic problem; all are based on the Koopmans-Beckmann's QAP. For the more general Lawler's QAP, the seminal work termed graduated assignment [9] approximates the relaxed QAP by solving a series of linear approximations via iterative Taylor expansions. A random walk perspective on the graph matching problem is adopted in [10], whereby the method can also be seen as a weighting between the solution of [9] and the multiplication method [7]. More recently, factorized graph matching was devised in [17], which also follows the doubly-stochastic relaxation on top of other relaxations of the objective function.\nFinally, we also briefly review recent advances in hypergraph and multiple graph matching. There are studies addressing the more general hypergraph matching problem, whereby third-order or higher terms are considered in the objective and usually an affinity tensor is adopted. Many current hypergraph matching methods approximate the third-order objective via iterative approximations; in each iteration, a Lawler's QAP is often involved [26, 27]. Moreover, Lawler's QAP solvers can also be used in matching a batch of graphs beyond two. In [28], an alternating optimization method was proposed, whereby in each iteration a Lawler's QAP problem is derived and solved. [29] further extends the multi-graph matching problem to an online version. These connections further highlight the importance of the Lawler's QAP for not only traditional graph matching, but also hypergraph matching and multiple graph matching.\n\n3 Generalizing the QAP for Graph Matching\n\nWe re-visit the graph matching problem in this section. We propose an equivalent model to the discrete one over the continuous domain [0, 1], provided the relaxation gap is 0. 
This gives rise to the possibility of relaxing graph matching in much tighter ways. Mathematically, graph matching can be formulated as the following quadratic assignment problem, also called Lawler's QAP¹ [16]:\n\nmax_X vec(X)⊤Avec(X)\ns.t. X1 = 1, X⊤1 = 1, xia ∈ {0, 1}    (1)\n\nwhere A ∈ ℝ₊^(n²×n²) is a non-negative affinity matrix, which encodes node similarities on its diagonal elements and edge similarities on the rest. Note xia denotes the element of X indexed by row i and column a, indicating the matching status of node i to node a from the other graph. If we break problem (1) down into element-wise products, it becomes:\n\nmax_x Σ_{i,j,a,b} Aij:ab xia xjb\ns.t. Hx = 1, x ∈ {0, 1}^(n²)    (2)\n\nwhere Aij:ab corresponds to the edge similarity between edge (i, j) ∈ G1 and edge (a, b) ∈ G2. Here H ∈ {0, 1}^(2n×n²) is a selection matrix over the elements of x enforcing the assignment constraints of (1).\nIn particular, we relax x into the continuous domain and let fprod(xia, xjb) = xia xjb:\n\nmax_x Σ_{i,j,a,b} Aij:ab fprod(xia, xjb)\ns.t. Hx = 1, x ∈ [0, 1]^(n²)    (3)\n\nWe generalize problem (3) by replacing fprod with fδ:\n\nmax_x Σ_{i,j,a,b} Aij:ab fδ(xia, xjb)\ns.t. Hx = 1, x ∈ [0, 1]^(n²)    (4)\n\nwhere fδ is a 2D quasi-delta function in the continuous domain (fδ(x, y) = 1 if x = 1 and y = 1, and fδ = 0 otherwise). 
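As a concrete illustration of problem (1), the sketch below builds a small random affinity matrix and evaluates the QAP objective for a permutation matrix using the column-wise vec(·) convention defined above. This is illustrative code, not the paper's implementation; the random affinity and the helper name `qap_score` are ours.

```python
import numpy as np

# Illustrative sketch of problem (1): evaluate vec(X)^T A vec(X) for a
# candidate assignment X. The random symmetric affinity is a placeholder.

def qap_score(X, A):
    """Lawler QAP objective with column-wise vectorization, as in (1)."""
    x = X.reshape(-1, order="F")  # column-wise vec(.)
    return float(x @ A @ x)

n = 4
rng = np.random.default_rng(0)
A = rng.random((n * n, n * n))
A = (A + A.T) / 2                 # symmetrize the non-negative affinity

# A permutation matrix satisfies X1 = 1, X^T 1 = 1 with entries in {0, 1}.
X = np.eye(n)[rng.permutation(n)]
assert np.allclose(X.sum(axis=0), 1) and np.allclose(X.sum(axis=1), 1)
score = qap_score(X, A)
```

In practice a solver would search over permutations, or a continuous relaxation thereof, to maximize this score.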
We have the following theorem that establishes the connection between (2) and (4):\n\nTheorem 1 The optimal objective p∗ of problem (2) is equal to the optimal objective p∗δ of problem (4).\n\nRemark Based on Theorem 1, one can devise a sampling procedure to find the optimal solution to problem (2) from the solution to problem (4): Given an optimal x∗δ to problem (4), if all the elements are in the set {0, 1}, then x∗δ is automatically optimal to problem (2). If not, we first remove all 1-valued elements and the corresponding columns and rows, yielding a subvector (submatrix) x† with all elements in the range [0, 1). Then any sampling subject to the one-to-one constraint on x†, together with the removed discrete values, forms an optimal solution to problem (2).\nFor the time being, the discrete assignment problem (2) is relaxed into (4) with a continuous feasible domain. However, function fδ is not continuous, as there is a jump at the value (1, 1), which makes it very difficult to solve (recall (4) is equivalent to (2)). In the next section, we will show some approximation techniques to tackle problem (4).\n\n¹Here the numbers of nodes in the two graphs are assumed to be the same. In case m ≠ n one can add dummy nodes, a standard technique in the literature [10, 17].\n\n(a) hLap (b) hGauss (c) hPoly\nFigure 1: Three examples of approximations (Laplacian, Gaussian, Polynomial) to function fδ with varying θ. The closer θ → 0 (from red to green), the better the approximation to fδ.\n\n4 Separable Functions\n\n4.1 Separable Approximation Function Family\n\nIt is important to find an appropriate approximate function for fδ, as otherwise the resulting models may be intractable to solve. 
To avoid high computational cost, we narrow our focus to a specific family of functions, called Separable Functions.\n\nDefinition 1 A function fθ(x, y) is called a Separable Function (SF)² if it satisfies the following properties:\n1. fθ(x, y) = hθ(x) × hθ(y), where hθ is defined on [0, 1].\n2. hθ(0) = 0 and hθ(1) = 1. hθ ∈ C¹.\n3. hθ is non-decreasing, and limθ→0 hθ(x) − hδ(x) = 0 for any x ∈ [0, 1], where hδ is defined on [0, 1] with hδ(x) = 1 if x = 1 and hδ(x) = 0 otherwise.\n\nWe also call such a function hθ a univariate SF, where θ is a controlling parameter. Despite its seemingly simple formulation, an SF has three fine properties for computation.\nFirstly, an SF behaves like a probability distribution on two independent variables. That is, if two nodes are impossible to match, then any pair of edges containing the two nodes will never match either. Mathematically, assuming the matching score of node pair ⟨i, a⟩ is hθ(xia), we have fθ(xia, xjb) = 0 for any ⟨j, b⟩ if hθ(xia) = 0.\nSecondly, the definition of SF eases gradient computation. For a given SF fθ(x, y) = hθ(x)hθ(y), the approximate version of problem (4) can be expressed in matrix form as:\n\nmax_x hθ⊤Ahθ\ns.t. Hx = 1, x ∈ [0, 1]^(n²)    (5)\n\nwhere hθ = [hθ(x1), ..., hθ(xn²)]⊤. The gradient of objective (5) with respect to x is ∇x = 2GAhθ, where G is a diagonal matrix whose ith element is ∂hθ(xi)/∂xi.\nThe third advantage of SFs is that we can construct new approximation functions via reweighted summation and multiplication of existing ones, e.g. 
if h1 and h2 are two univariate SFs, it can be trivially verified that αh1 + (1 − α)h2 for 0 ≤ α ≤ 1, as well as h1 × h2, are also univariate SFs.\nIf we keep the constraints on x intact as in problem (5), and let p∗θ = maxx hθ(x)⊤Ahθ(x), where hθ(x) = [hθ(x1), ..., hθ(xn²)]⊤, we have the following theorem:\n\nTheorem 2 limθ→0 p∗θ = p∗δ\n\nSee the supplementary material for proof details. The above theorem guarantees that, if we approximate the quasi-delta function by letting θ → 0, problem (4) can also be approximated asymptotically. As hθ ∈ C¹, gradient-based algorithms can be applied to such approximations.\n\n²In fact, "separable function" has its traditional meaning in mathematics; we re-define it in the graph matching context.\n\n4.2 Approximations to Function fδ\n\nThough we have proved that using fδ yields an equivalent problem, i.e. (4), finding its optimal solution is still notoriously difficult. Instead of solving (4) directly, based on the analysis in Sec. 4.1, we introduce approximation functions to fδ. To simplify the exposition, we only present the univariate SF h; the SF f can be obtained using Definition 1. It is trivial to show that the SFs derived from the following functions approximate fδ when θ → 0+ under the properties in Definition 1:\n\nhLap(x) = (1/m){exp((x − 1)/θ) − d}    (6a)\n\nhGauss(x) = (1/m){exp(−(x − 1)²/θ) − d}    (6b)\n\nhPoly(x) = x^(1/θ)    (6c)\n\nwhere d = exp(−1/θ) and m = 1 − d. The purpose of m and d is to normalize the SFs to satisfy the second property. Figure 1 shows examples of such functions with varying θ values. 
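The three univariate SFs in (6a)-(6c) are straightforward to implement; the sketch below (function names are ours) also checks the endpoint and limit properties of Definition 1 numerically.

```python
import numpy as np

# The univariate Separable Functions of Eqs. (6a)-(6c).
# d = exp(-1/theta) and m = 1 - d normalize h(0) = 0 and h(1) = 1.

def h_lap(x, theta):
    d = np.exp(-1.0 / theta)
    return (np.exp((x - 1.0) / theta) - d) / (1.0 - d)

def h_gauss(x, theta):
    d = np.exp(-1.0 / theta)
    return (np.exp(-((x - 1.0) ** 2) / theta) - d) / (1.0 - d)

def h_poly(x, theta):
    return x ** (1.0 / theta)  # theta = 1 recovers the vanilla QAP model

for h in (h_lap, h_gauss, h_poly):
    # Property 2: endpoints are fixed for any theta.
    assert abs(h(0.0, 0.5)) < 1e-12 and abs(h(1.0, 0.5) - 1.0) < 1e-12
    # Property 3: as theta -> 0+, interior values collapse toward 0.
    assert h(0.9, 0.001) < 1e-4
```

The final loop mirrors the quasi-delta limit: for small θ every interior value is pushed toward 0 while h(1) stays pinned at 1.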
Note that the traditional quadratic graph matching model is in fact a special case of our model: it corresponds to the SF derived from hPoly with θ = 1. Specifically, for the univariate SFs (6a) and (6c), we also have the following proposition.\n\nProposition 1 For univariate SF hLap or hPoly, suppose p∗1 and p∗2 are the optimal objectives for (5) with θ1 and θ2, respectively. Then we have p∗1 ≥ p∗2 if 0 < θ2 < θ1.\n\nTogether with Theorem 2, this claim means that, given univariate SF hLap or hPoly, the optimal objective of (5) converges monotonically as θ → 0+.\n\n4.3 Convexity/Concavity Analysis\n\nSections 4.1 and 4.2 show that the original problem (4) can be asymptotically approximated using SFs as θ → 0. In this section, we analyze the properties of convexity/concavity under such approximations. We believe this effort is worthwhile, as one can employ techniques, e.g. self-amplification [30], to convert non-convex/non-concave problems into convex/concave ones with the beneficial properties of convexity/concavity. We first show the equivalence of problems (3) and (5) under global convexity.\n\nTheorem 3 Assume that the affinity A is positive definite. If the univariate SF satisfies hθ(x) ≤ x on [0, 1], then the global maxima of problem (2), which are discrete, must also be global maxima of problem (5).\n\nThe above theorem builds a link from problem (2) to problem (5) when A is positive definite. In this case, we first conclude that the optimum of problem (3) is discrete, hence also optimal to (2). Then as long as hθ(x) < x on [0, 1] and hθ satisfies the second property in Definition 1, this solution is also optimal to problem (5). In this case the optimal objective gap of these three problems becomes 0. 
We give the following proposition showing that, under mild conditions, the generalized problem (5) is convexity/concavity-preserving.\n\nProposition 2 Assume the affinity matrix A is positive/negative semi-definite. Then as long as the univariate SF hθ is convex, the objective of (5) is convex/concave.\n\nAny matrix A can be transformed to a positive definite one by adding a diagonal matrix λI. The lower bound on λ is λ ≥ |λ†|, where λ† is the smallest eigenvalue of A below 0. We define A† = A + λI.\n\nProposition 3 Assume the affinity matrix A is positive definite and the univariate SF hθ is convex. Let the optimal value of the following problem be:\n\nEconv = max_x hθ⊤A†hθ    (7)\n\nThen there exists a permutation x∗ s.t. Econv − E(x∗) ≤ nλ, where E(x∗) is the objective value w.r.t. problem (5).\n\n5 Two Optimization Strategies for Generalized GM\n\nAlgorithm 1 Path following for GGM\nInput: A, hθ, θ0, 0 < α < 1, initial x0, k; Output: x\nx ← x0, θ ← θ0\nrepeat\n  form the problem according to (5) with θ\n  repeat\n    compute V using formula (8)\n    x ← x + ∆ vec(V)\n  until converged\n  θ ← αθ\nuntil θ < k\n\nAlgorithm 2 Multiplicative strategy for GGM\nInput: A, hθ, initial x0; Output: x\nx ← x0\nrepeat\n  h ← hθ(x)\n  h ← Ah\n  x ← hθ⁻¹(h)\nuntil converged\n\n5.1 Path Following Strategy\n\nIt is observed that the problem is highly non-convex when θ is too close to 0, suggesting the existence of many local optima. Instead, moderate smoothness is desired when we initiate the optimization. This naturally leads to the path following strategy. Such optimization is involved in [9, 17, 31]. 
In our implementation, we start by obtaining a local optimum x∗1 from a relatively tractable problem Pθ1; then we shrink the value of θ1 by letting θ2 = αθ1, where 0 < α < 1. Letting the starting point for the next iteration be x∗1, we solve the updated problem Pθ2. The iteration continues until the convergence condition is satisfied. To verify convergence, we calculate the energy gap between two consecutive iterations. Formally, for the current x(t) at iteration t, we calculate the corresponding energy E(t) = x(t)⊤Ax(t). The energy at the previous iteration t − 1 is analogously calculated as E(t−1) = x(t−1)⊤Ax(t−1). Then if |E(t) − E(t−1)| < η, where η is a small positive value, we declare convergence of the iteration. If there is no such t, the algorithm stops when reaching the pre-defined maximal iteration number. In all the following experiments, we let η = 10⁻⁸.\nNote the problem Pθ is a general objective with affine constraints. For any gradient-based strategy, a projection is necessary to map the current solution back to the feasible set. As discussed in [8], projection in the variable domain may lead to weak optima. Instead, we use Iterative Bregman Gradient Projection (IBGP), which is performed in the gradient domain and whose convergence is guaranteed [32]. Given the current gradient U = mat(∇x), the previous matching X and the step length ∆, IBGP performs the following calculations iteratively to obtain V until convergence:\n\nV = U − (1/n)U11⊤ − (1/n)11⊤U + (1/n²)11⊤U11⊤    (8a)\nVij = −Xij/∆    if Vij < −Xij/∆    (8b)\nVij = (1 − Xij)/∆    if Vij > (1 − Xij)/∆    (8c)\n\nwhere V is the update direction within the feasible set. Note the iterative procedure in the above equations is a projection. 
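A minimal sketch of the IBGP projection in (8a)-(8c), assuming square n × n matrices: alternating the zero-row/column-sum projection with the clipping step is one straightforward reading of "iteratively until convergence", and the fixed iteration count is our simplification.

```python
import numpy as np

# Sketch of IBGP (Eq. 8): project the gradient U so that the update
# direction V keeps X + delta * V inside the relaxed feasible set.

def ibgp(U, X, delta, iters=100):
    n = U.shape[0]
    one = np.ones((n, 1))
    V = U.copy()
    for _ in range(iters):
        # (8a): make every row and column of V sum to zero
        V = V - (V @ one @ one.T) / n - (one @ one.T @ V) / n \
              + (one @ one.T @ V @ one @ one.T) / n ** 2
        # (8b)/(8c): clip entries so that X + delta * V stays in [0, 1]
        V = np.clip(V, -X / delta, (1.0 - X) / delta)
    return V

rng = np.random.default_rng(0)
n = 5
U = rng.standard_normal((n, n))          # stand-in gradient mat(grad)
X = np.full((n, n), 1.0 / n)             # uniform doubly-stochastic start
delta = 0.1
X_next = X + delta * ibgp(U, X, delta)   # bounded update direction
assert X_next.min() >= -1e-12 and X_next.max() <= 1.0 + 1e-12
```

Because the last operation is the clip of (8b)/(8c), the update x + ∆vec(V) can never leave the box [0, 1], while (8a) keeps the row/column sums of X unchanged up to the clipping.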
As the constraint set is convex (an affine constraint set), the convergence of the projection is ensured. Thus in each update iteration, the algorithm seeks a direction V with an ascent guarantee and proceeds by a fixed step length ∆. This procedure iterates until convergence or a maximal step number.\nThe path following method is summarized in Algorithm 1.\n\n5.2 Multiplication Strategy\n\nThe multiplicative strategy for optimizing a quadratic objective is provably convergent under the assumption that A is positive semi-definite [33]. In this strategy, each step amounts to a multiplication x(t+1) = Ax(t), and the objective score along the solution path is non-decreasing. The works [10, 9, 6] fall into this category. However, in general the affinity A is rarely positive semi-definite. While some methods handle this circumstance by adding a reweighted identity matrix to A [34], others, including some popular algorithms [10, 9], simply neglect the non-decreasing constraint. The empirical success of such methods suggests that pursuing objective ascent and enhancing matching accuracy are sometimes at odds. Moreover, the recent study [35] further shows that, due to noise and the parametric modeling limitation of the affinity function, high accuracy may even correspond to a lower affinity score. Inspired by these observations, we devise a simple yet effective multiplicative strategy that omits the non-decreasing check. The procedure is shown in Algorithm 2. In this strategy, the update rule involves calculating the inverse function of hθ. We find the multiplicative method converges much faster, and hence its overall run time is lower than that of the path following method.\n\n6 Experiments\n\nThree popular benchmarks are used, including Random Graph Matching [10], the CMU house sequence [36] and Caltech-101/MSRC object matching [10]. 
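Before turning to the benchmarks, Algorithm 2 can be sketched for the hPoly family, whose inverse is closed-form: h(x) = x^(1/θ) gives h⁻¹(y) = y^θ. The max-normalization of the intermediate vector is our addition, used here to keep values inside hθ's range [0, 1]; Algorithm 2 leaves this rescaling implicit.

```python
import numpy as np

# Sketch of the multiplicative strategy (Algorithm 2) with the hPoly SF.
# Assumes a non-negative affinity A with A h nonzero; the max-normalization
# step is our addition, not stated in Algorithm 2 itself.

def ggm_multiplicative(A, theta=0.5, iters=50):
    n2 = A.shape[0]
    x = np.full(n2, 0.5)            # uniform initialization
    for _ in range(iters):
        h = x ** (1.0 / theta)      # h <- h_theta(x)
        h = A @ h                   # h <- A h
        h = h / h.max()             # rescale into h_theta's range [0, 1]
        x = h ** theta              # x <- h_theta^{-1}(h)
    return x

# On a uniform affinity every candidate match is equally good, so the
# iterate converges to the all-ones vector.
x = ggm_multiplicative(np.ones((4, 4)))
assert np.allclose(x, 1.0)
```

A discretization step, e.g. the Hungarian algorithm applied to mat(x), would follow to recover a permutation, in the spirit of the sampling remark after Theorem 1.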
Three metrics are evaluated: accuracy measures the portion of correctly matched nodes with respect to all nodes, score is the value of the objective function, and ratio is the ratio between the current objective value and the maximal one. The algorithms for comparison include Spectral Matching (SM) [7], Integer Projected Fixed Point (IPFP) [19], Graduated Assignment (GAGM) [9], Reweighted Random Walk (RRWM) [10], Soft-restricted Graduated Assignment (SRGA) [6], Factorized Graph Matching (FGM) [17] and Branching Path Following Matching (BPM) [31]. We term our algorithm Generalized Graph Matching (GGM), with a subscript indicating the corresponding Separable Function and optimization strategy. Namely, GGMxy represents the method with Separable Function x ∈ {l: hLap; p: hPoly} and optimization strategy y ∈ {p: path following; m: multiplication}. In all the experiments, the algorithms with any updating rules are initialized with a uniform matching. For the path following strategy of GGM, we set θ0 = 2, α = 0.5, k = 0.2.\n\nFigure 3: Results on CMU house. Upper: convergence speed vs iteration. Lower: accuracy by frame gap.\n\nFigure 2: Performance on random graphs. Note BPM [31]'s runtime is significantly more expensive than the other methods (empirically an order of magnitude higher than ours using the public source code), as it simultaneously seeks multiple paths for the best score (though its accuracy is similar to ours). In contrast, our method focuses on one path, whether the path following or the multiplicative strategy is used.\n\nRandom Graph Matching This test is performed following the protocol in [10]. For each trial, a source graph GS and a destination graph GD with nin inlier nodes are generated, consisting of vector attributes aSij for both nodes and edges (note aii is a node attribute and aij is an edge attribute when i ≠ j). In the initial setting, GD is a replica of GS. 
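The random-graph protocol can be sketched as follows; all function names are ours, the affinity follows the exp(−|aSij − aDab|²/σs²) formula of this protocol with σs = 0.15, and outlier/edge-density handling is omitted for brevity.

```python
import numpy as np

# Sketch of the synthetic protocol: G_D starts as a replica of G_S and its
# attributes are perturbed by N(0, sigma) noise. Function names are ours;
# outlier and edge-density handling are omitted for brevity.

def make_pair(n_in, sigma, rng):
    aS = rng.random((n_in, n_in))
    aS = (aS + aS.T) / 2                        # aii: node attrs, aij: edge attrs
    aD = aS + rng.normal(0.0, sigma, aS.shape)  # deformed replica of G_S
    return aS, aD

def build_affinity(aS, aD, sigma_s=0.15):
    """A[ia, jb] = exp(-|aS_ij - aD_ab|^2 / sigma_s^2), column-wise indexing."""
    n = aS.shape[0]
    A = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            for a in range(n):
                for b in range(n):
                    A[a * n + i, b * n + j] = np.exp(
                        -(aS[i, j] - aD[a, b]) ** 2 / sigma_s ** 2)
    return A

rng = np.random.default_rng(0)
aS, aD = make_pair(4, sigma=0.1, rng=rng)
A = build_affinity(aS, aD)        # non-negative, entries in (0, 1]
```

With zero deformation the ground-truth identity matching attains the maximal affinity on every matched edge pair, which is what the deformation test then degrades as σ grows.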
Three types of sub-experiments are conducted with varying graph deformation σ, number of outliers nout and edge density ρ. To deform a graph, we add independent Gaussian noise ε ∼ N(0, σ) to each attribute aDij, such that aDij = aSij + ε. The resulting affinity is calculated by Aij:ab = exp(−|aSij − aDab|²/σs²). The parameter σs is empirically set to 0.15. In the outlier test, we generate the same number of outlier nodes for both graphs. In the edge density test, we randomly sample a portion ρ of corresponding edges from two fully connected graphs. Each type of sub-experiment is independently carried out 500 times, and the average accuracy and score are calculated.\nResults are shown in Fig. 2. In the deformation and edge density tests, GGMpp and GGMlp achieve competitive performance compared to state-of-the-art algorithms. Especially when severe deformation is combined with low edge density, GGMpp and GGMlp outperform the selected counterparts. On the other hand, GGMpm and GGMlm reach performance close to the state of the art, e.g. BPM [31]. Though multiplicative strategies cannot guarantee an ascending objective in each iteration, GGMpm and GGMlm are still effective. This supports the discussion of the paradox between matching accuracy and objective score in Section 5.2. We only show results of GGMlp in the following experiments, as we see no notable performance gap compared to the other settings.\n\n(a) RRWM: 13/20 (c) FGM: 14/20 (e) GGMlp: 16/20; (b) RRWM: 4/20 (d) FGM: 11/20 (f) GGMlp: 18/20\nFigure 4: Top and bottom rows show examples on the CMU house sequence with gap 20 and 80 respectively, by setting (nS = 30, nD = 20).\n\nTo examine the algorithm's sensitivity to the parameters, we also conduct an extra Random Graph Matching experiment with the SFs hPoly and hLap using Algorithm 1. In this test, we set the deformation noise to 0.15 and the edge density to 0.8, with 20 inliers and 5 outliers. The test is carried out independently 20 times and the average accuracy is reported. For both SFs, we observe that k = 0.2 is sufficient to produce satisfying matching accuracy. Thus we conduct the test by varying the values of θ0 and α. The results are demonstrated in Table 1. 
As larger θ0 and α indicate more iterations, and θ0 < 2 and α < 0.5 result in decreasing performance, we employ the setting θ0 = 2 and α = 0.5 throughout all experiments.\n\nTable 1: Sensitivity test on hPoly and hLap (average accuracy; rows θ0, columns α)\n\nhPoly\nθ0 \ α | 0.7 | 0.6 | 0.5 | 0.4 | 0.3\n3 | 0.842 | 0.839 | 0.841 | 0.721 | 0.610\n2 | 0.905 | 0.905 | 0.904 | 0.848 | 0.725\n1 | 0.910 | 0.905 | 0.908 | 0.851 | 0.717\n0.5 | 0.823 | 0.814 | 0.770 | 0.652 | 0.422\n\nhLap\nθ0 \ α | 0.7 | 0.6 | 0.5 | 0.4 | 0.3\n4 | 0.912 | 0.909 | 0.910 | 0.872 | 0.685\n3 | 0.911 | 0.907 | 0.903 | 0.836 | 0.672\n2 | 0.904 | 0.904 | 0.906 | 0.811 | 0.567\n1 | 0.853 | 0.844 | 0.810 | 0.728 | 0.472\n\nCMU House Sequence We perform feature point matching on the widely used CMU house sequence dataset following the settings in [36, 10]. The dataset consists of 111 house images with gradually changing view points. There are 30 landmark points in each frame. Following the protocol in [10, 31], the matching test is conducted on 560 pairs of images in total, spaced by varying frame gaps (10, 20, ..., 100). We use 2 settings of nodes, (nS, nD) = (30, 30) and (20, 30). In case nS < 30, nS nodes are randomly sampled from the source graph. The affinity is constructed by Aij:ab = exp(−|aSij − aDab|²/σs²), where aSij measures the Euclidean distance between points i and j, and σs² = 2500. The edge density is set by ρ = 1.\n\nTable 2: Performance on natural images from Caltech-101 and MSRC dataset.\n\nMethod\naccuracy (%)\nscore ratio\n\nGAGM IPFP\n75.77\n73.66\n0.933\n0.942\n\nSRGA RRWM SM FGM BPM GGMlp\n76.69\n72.86\n0.940\n0.972\n\n76.35\n0.969\n\n72.95\n0.946\n\n65.78\n0.735\n\n75.14\n\n1\n\n(a) car pair (b) face pair; (c) RRWM: 27/36 (d) RRWM: 32/40 (e) FGM: 29/36 (f) FGM: 33/40 (g) GGMlp: 30/36 (h) GGMlp: 35/40\nFigure 5: Examples of matchings on selected Caltech-101 and MSRC images.\n\n
When there are no outliers, all methods except IPFP and SM achieve perfect matching at every gap setting, so we only show the results with outliers. Figure 4 and Figure 3 depict the matching samples and performance curves, respectively. We also show the typical convergence behavior of GGMlp and GGMlm in the upper part of Figure 3. We note that our path following strategy (Alg. 1) converges more slowly than the multiplicative one (Alg. 2), while both obtain similar final accuracy. When outlier points exist, GAGM and RRWM suffer notably degraded performance. Our algorithm, on the other hand, achieves performance competitive with the state of the art and behaves stably even under severe degradations.

Natural Image Matching This is a challenging dataset, as it includes natural images with arbitrary backgrounds. In line with the protocol in [10], 30 pairs of images collected from Caltech-101 [37] and MSRC (http://research.microsoft.com/vision/cambridge/recognition/) are included in this test. In each pair of images, the MSER detector [38] and SIFT descriptor [39] are used to obtain the key points and the corresponding node features. The mutual projection error function [40] is further adopted to calculate the edge affinity. The ground truth is manually labeled. The results are shown in Table 2 and matching examples in Fig. 5. Our method outperforms the selected algorithms w.r.t. accuracy regardless of objective score. This again suggests the paradox between accuracy and score under complex affinity modeling, as discussed in [35].

7 Conclusion

By using Separable Functions, we present a family of continuous approximations to the vanilla QAP formulation widely used in graph matching. We explore the relation of such approximations to the original discrete matching problem, and show convergence properties under mild conditions. Based on the theoretical analysis, we propose a novel solver, GGM, which achieves remarkable performance in both synthetic and real-world image tests.
This gives rise to the possibility of solving graph matching with many alternative approximations following different solution paths.

Acknowledgement

This work was supported in part by a grant from ONR. Junchi Yan is supported in part by NSFC 61602176 and the Tencent AI Lab Rhino-Bird Joint Research Program (No. JR201804). Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of ONR.

References

[1] E. M. Loiola, N. M. de Abreu, P. O. Boaventura-Netto, P. Hahn, and T. Querido. A survey for the quadratic assignment problem. European Journal of Operational Research, 176(2):657–690, 2007.

[2] O. Duchenne, A. Joulin, and J. Ponce. A graph-matching kernel for object categorization. In ICCV, 2011.

[3] M. Zaslavskiy, F. R. Bach, and J.-P. Vert. Global alignment of protein-protein interaction networks by graph matching methods. Bioinformatics, 25(12), 2009.

[4] Z. Zhao, Y. Qiao, J. Yang, and L. Bai. From dense subgraph to graph matching: A label propagation approach. In ICALIP, 2014.

[5] K. Adamczewski and Y. Suh. Discrete tabu search for graph matching. In ICCV, 2015.

[6] Y. Tian, J. Yan, H. Zhang, Y. Zhang, X. Yang, and H. Zha. On the convergence of graph matching: Graduated assignment revisited. In ECCV, 2012.

[7] M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In ICCV, 2005.

[8] B. Jiang, J. Tang, C. Ding, Y. Gong, and B. Luo. Graph matching via multiplicative update algorithm. In NIPS, pages 3190–3198, 2017.

[9] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377–388, 1996.

[10] M. Cho, J. Lee, and K. M. Lee. Reweighted random walks for graph matching. In ECCV, 2010.

[11] H. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83–97, 1955.

[12] T. C. Koopmans and M. Beckmann. Assignment problems and the location of economic activities. Econometrica, 25(1):53–76, 1957.

[13] S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988.

[14] M. Zaslavskiy, F. R. Bach, and J.-P. Vert. A path following algorithm for the graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2227–2242, 2009.

[15] Z. Liu, H. Qiao, and L. Xu. An extended path following algorithm for graph-matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1451–1456, 2012.

[16] E. Lawler. The quadratic assignment problem. Management Science, 9(4):586–599, 1963.

[17] F. Zhou and F. De la Torre. Factorized graph matching. In CVPR, 2012.

[18] J. Yan, X. Yin, W. Lin, C. Deng, H. Zha, and X. Yang. A short survey of recent advances in graph matching. In ICMR, 2016.

[19] M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and MAP inference. In NIPS, 2009.

[20] J. Lee, M. Cho, and K. Lee. A graph matching algorithm using data-driven Markov chain Monte Carlo sampling. In ICPR, 2010.

[21] T. Cour, P. Srinivasan, and J. Shi. Balanced graph matching. In NIPS, 2006.

[22] T. Caelli and S. Kosinov. An eigenspace projection clustering method for inexact graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(4):515–519, 2004.

[23] C. Schellewald and C. Schnörr. Probabilistic subgraph matching based on convex relaxation. In EMMCVPR, 2005.

[24] P. H. S. Torr. Solving Markov random fields using semidefinite programming. In AISTATS, 2003.

[25] H. A. Almohamad and S. O. Duffuaa. A linear programming approach for the weighted graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(5):522–525, 1993.

[26] J. Lee, M. Cho, and K. M. Lee. Hyper-graph matching via reweighted random walks. In CVPR, 2011.

[27] M. Chang and B. Kimia. Measuring 3D shape similarity by graph-based matching of the medial scaffolds. Computer Vision and Image Understanding, (5):707–720, 2011.

[28] J. Yan, J. Wang, H. Zha, and X. Yang. Consistency-driven alternating optimization for multigraph matching: A unified approach. IEEE Transactions on Image Processing, 24(3):994–1009, 2015.

[29] T. Yu, J. Yan, W. Liu, and B. Li. Incremental multi-graph matching via diversity and randomness based graph clustering. In ECCV, 2018.

[30] A. Rangarajan, A. Yuille, and E. Mjolsness. Convergence properties of the softassign quadratic assignment algorithm. Neural Computation, 11:1455–1474, 1999.

[31] T. Wang, H. Ling, C. Lang, and J. Wu. Branching path following for graph matching. In ECCV, 2016.

[32] T. Yu, J. Yan, J. Zhao, and B. Li. Joint cuts and matching of partitions in one graph. In CVPR, 2018.

[33] A. Yuille and J. Kosowsky. Statistical physics algorithms that converge. Neural Computation, 6:341–356, 1994.

[34] B. Jiang, J. Tang, C. Ding, and B. Luo. Binary constraint preserving graph matching. In CVPR, 2017.

[35] J. Yan, M. Cho, H. Zha, X. Yang, and S. Chu. Multi-graph matching via affinity optimization with graduated consistency regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(6):1228–1242, 2016.

[36] T. Caetano, T. Caelli, D. Schuurmans, and D. Barone. Graphical models and point pattern matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1646–1663, 2006.

[37] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1):59–70, 2007.

[38] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10):761–767, 2004.

[39] D. Lowe. Object recognition from local scale-invariant features. In ICCV, 1999.

[40] M. Cho, J. Lee, and K. Lee. Feature correspondence and deformable object matching via agglomerative correspondence clustering. In ICCV, 2009.