{"title": "KerGM: Kernelized Graph Matching", "book": "Advances in Neural Information Processing Systems", "page_first": 3335, "page_last": 3346, "abstract": "Graph matching plays a central role in such fields as computer vision, pattern recognition, and bioinformatics. Graph matching problems can be cast as two types of quadratic assignment problems (QAPs): Koopmans-Beckmann’s QAP or Lawler’s QAP. In our paper, we provide a unifying view for these two problems by introducing new rules for array operations in Hilbert spaces. Consequently, Lawler’s QAP can be considered as the Koopmans-Beckmann’s alignment between two arrays in reproducing kernel Hilbert spaces (RKHS), making it possible to efficiently solve the problem without computing a huge affinity matrix. Furthermore, we develop the entropy-regularized Frank-Wolfe (EnFW) algorithm for optimizing QAPs, which has the same convergence rate as the original FW algorithm while dramatically reducing the computational burden for each outer iteration. We conduct extensive experiments to evaluate our approach, and show that our algorithm significantly outperforms the state-of-the-art in both matching accuracy and scalability.", "full_text": "KerGM: Kernelized Graph Matching

Zhen Zhang1, Yijian Xiang1, Lingfei Wu2, Bing Xue1, Arye Nehorai1
1Washington University in St. Louis
2IBM Research
1{zhen.zhang, yijian.xiang, xuebing, nehorai}@wustl.edu
2lwu@email.wm.edu

Abstract

Graph matching plays a central role in such fields as computer vision, pattern recognition, and bioinformatics. Graph matching problems can be cast as two types of quadratic assignment problems (QAPs): Koopmans-Beckmann’s QAP or Lawler’s QAP. In our paper, we provide a unifying view for these two problems by introducing new rules for array operations in Hilbert spaces.
Consequently, Lawler’s QAP can be considered as the Koopmans-Beckmann alignment between two arrays in reproducing kernel Hilbert spaces (RKHS), making it possible to efficiently solve the problem without computing a huge affinity matrix. Furthermore, we develop the entropy-regularized Frank-Wolfe (EnFW) algorithm for optimizing QAPs, which has the same convergence rate as the original FW algorithm while dramatically reducing the computational burden for each outer iteration. We conduct extensive experiments to evaluate our approach, and show that our algorithm significantly outperforms the state-of-the-art in both matching accuracy and scalability.

1 Introduction

Graph matching (GM), which aims at finding the optimal correspondence between nodes of two given graphs, is a longstanding problem due to its nonconvex objective function and binary constraints. It arises in many applications, ranging from recognizing actions [3, 13] to identifying functional orthologs of proteins [11, 41]. Typically, GM problems can be formulated as two kinds of quadratic assignment problems (QAPs): Koopmans-Beckmann’s QAP [18] or Lawler’s QAP [22]. Koopmans-Beckmann’s QAP is the structural alignment between two weighted adjacency matrices, which, as a result, can be written as the standard Frobenius inner product between two n × n matrices, where n denotes the number of nodes. However, Koopmans-Beckmann’s QAP cannot incorporate complex edge attribute information, which is usually of great importance in characterizing the relation between nodes. Lawler’s QAP can tackle this issue, because it attempts to maximize the overall similarity that well encodes the attribute information.
However, the key concern with Lawler’s QAP is that it needs to estimate the n² × n² pairwise affinity matrix, limiting its application to very small graphs.

In our work, we derive an equivalent formulation of Lawler’s QAP, based on a very mild assumption that edge affinities are characterized by kernels [15, 34]. After introducing new rules for array operations in Hilbert spaces, named H-operations, we rewrite Lawler’s QAP as the Koopmans-Beckmann alignment between two arrays in a reproducing kernel Hilbert space (RKHS), which allows us to solve it without computing the huge affinity matrix. Taking advantage of the H-operations, we develop a path-following strategy for mitigating the local maxima issue of QAPs. In addition to the kernelized graph matching (KerGM) formulation, we propose a numerical optimization algorithm, the entropy-regularized Frank-Wolfe (EnFW) algorithm, for solving large-scale QAPs. The EnFW has the same convergence rate as the original Frank-Wolfe algorithm, with far less computational burden in each iteration. Extensive experimental results show that our KerGM, together with the EnFW algorithm, achieves superior performance in both matching accuracy and scalability.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Related Work: In the past forty years, a myriad of graph matching algorithms have been proposed [8], most of which focused on solving QAPs. Previous work [2, 14, 21] approximated the quadratic term with a linear one, which consequently can be solved by standard linear programming solvers. In [36], several convex relaxation methods are proposed and compared. It is known that convex relaxations can achieve global convergence, but they usually perform poorly because the final projection step separates the solution from the original QAP.
Concave relaxations [29, 28] can avoid this problem, since their outputs are exactly permutation matrices. However, concave programming [4] is NP-hard, which limits its applications. In [45], a seminal work termed the "path-following algorithm" was proposed, which leverages both of the above relaxations by iteratively solving a series of optimization problems that gradually change from convex to concave. In [27, 38, 39, 44], the path-following strategy was further extended and improved. However, all the above algorithms, when applied to Lawler’s QAP, need to compute the n² × n² affinity matrix. To tackle this challenge, in [48], the authors elegantly factorized the affinity matrix into the Kronecker product of smaller matrices. However, it still cannot be well applied to large dense graphs, since it scales cubically with the number of edges. Beyond solving the QAP, there are interesting works on graph matching from other perspectives, such as probabilistic matching [46], hypergraph matching [24], and multigraph matching [42]. We refer to [43] for a survey of recent advances.

Organization: In Section 2, we introduce the background, including Koopmans-Beckmann’s and Lawler’s QAPs, and kernel functions and their reproducing kernel Hilbert spaces. In Section 3, we present the proposed rules for array operations in Hilbert spaces. Section 4 and Section 5 form the core of our work, where we develop kernelized graph matching together with the entropy-regularized Frank-Wolfe optimization algorithm. In Section 6, we report the experimental results.
In the supplementary material, we provide proofs of all mathematical results in the paper, along with further technical discussions and more experimental results.

2 Background

2.1 Quadratic Assignment Problems for Graph Matching

Let G = {A, V, P, E, Q} be an undirected, attributed graph of n nodes and m edges, where A ∈ R^{n×n} is the adjacency matrix, V = {v_i}_{i=1}^n and P = [p_1, p_2, ..., p_n] ∈ R^{dN×n} are the respective node set and node attributes matrix, and E = {e_ij | v_i and v_j are connected} and Q = [q_ij | e_ij ∈ E] ∈ R^{dE×m} are the respective edge set and edge attributes matrix. Given two graphs G1 = {A1, V1, P1, E1, Q1} and G2 = {A2, V2, P2, E2, Q2} of n nodes¹, the GM problem aims to find a correspondence between nodes in V1 and V2 which is optimal in some sense.

For Koopmans-Beckmann’s QAP [18], the optimality refers to the Frobenius inner product maximization between two adjacency matrices after permutation, i.e.,

max ⟨A1 X, X A2⟩_F   s.t. X ∈ P = {X ∈ {0,1}^{n×n} | X1 = 1, X^T 1 = 1},   (1)

where ⟨A, B⟩_F = tr(A^T B) is the Frobenius inner product. The issue with (1) is that it ignores the complex edge attributes, which are usually of particular importance in characterizing graphs.

For Lawler’s QAP [22], the optimality refers to the similarity maximization between the node attribute sets and between the edge attribute sets, i.e.,

max Σ_{v1_i ∈ V1, v2_a ∈ V2} kN(p1_i, p2_a) X_ia + Σ_{e1_ij ∈ E1, e2_ab ∈ E2} kE(q1_ij, q2_ab) X_ia X_jb   s.t. X ∈ P,   (2)

where kN and kE are the node and edge similarity measurements, respectively.
Furthermore, (2) can be rewritten in the compact form

max ⟨KN, X⟩_F + vec(X)^T K vec(X)   s.t. X ∈ P,   (3)

where KN ∈ R^{n×n} is the node affinity matrix, and K is an n² × n² matrix defined such that K_{ia,jb} = kE(q1_ij, q2_ab) if i ≠ j, a ≠ b, e1_ij ∈ E1, and e2_ab ∈ E2; otherwise, K_{ia,jb} = 0. It is well known that Koopmans-Beckmann’s QAP is a special case of Lawler’s QAP if we set K = A2 ⊗ A1 and KN = 0_{n×n}. The issue with (3) is that the size of K scales quadruply with respect to n, which precludes its application to large graphs. In our work, we will show that Lawler’s QAP can be written in the Koopmans-Beckmann form, which avoids computing K.

1We assume G1 and G2 have the same number of nodes. If not, we add dummy nodes.

Figure 1: Visualization of the operation Ψ ⊙ X.

2.2 Kernels and reproducing kernel Hilbert spaces

Given any set X, a kernel k : X × X → R is a function for quantitatively measuring the affinity between objects in X. It satisfies that there exist a Hilbert space, H, and an (implicit) feature map ψ : X → H, such that k(q1, q2) = ⟨ψ(q1), ψ(q2)⟩_H, ∀q1, q2 ∈ X. The space H is the reproducing kernel Hilbert space associated with k.

Note that if X is a Euclidean space, i.e., X = R^d, many similarity measurement functions are kernels, such as exp(−‖q1 − q2‖²₂), exp(−‖q1 − q2‖₂), and ⟨q1, q2⟩, ∀q1, q2 ∈ R^d.

3 H-operations for arrays in Hilbert spaces

Let H be any Hilbert space, coupled with the inner product ⟨·,·⟩_H taking values in R. Let H^{n×n} be the set of all n × n arrays in H, and let Ψ, Ξ ∈ H^{n×n}, i.e., Ψ_ij, Ξ_ij ∈ H, ∀i, j = 1, 2, ..., n. Analogous to matrix operations in Euclidean spaces, we introduce the following addition, transpose, and multiplication rules (H-operations): ∀X ∈ R^{n×n},

1. Ψ + Ξ, Ψ^T ∈ H^{n×n}, where [Ψ + Ξ]_ij ≜ Ψ_ij + Ξ_ij ∈ H and [Ψ^T]_ij ≜ Ψ_ji ∈ H.

2. Ψ ∗ Ξ ∈ R^{n×n}, where [Ψ ∗ Ξ]_ij ≜ Σ_{k=1}^n ⟨Ψ_ik, Ξ_kj⟩_H ∈ R.

3. Ψ ⊙ X, X ⊙ Ψ ∈ H^{n×n}, where [Ψ ⊙ X]_ij ≜ Σ_{k=1}^n Ψ_ik X_kj = Σ_{k=1}^n X_kj Ψ_ik ∈ H and [X ⊙ Ψ]_ij ≜ Σ_{k=1}^n X_ik Ψ_kj ∈ H.

Note that if H = R, all the above degenerate to the common operations for matrices in Euclidean spaces. In Fig. 1, we visualize the operation Ψ ⊙ X, where we let H = R^d, let Ψ be a 3 × 3 array in R^d, and let X be a 3 × 3 permutation matrix. It is easy to see that Ψ ⊙ X is just Ψ after column permutation.

As presented in the following corollary, the multiplication ⊙ satisfies the combination law.

Corollary 1. ∀X, Y ∈ R^{n×n}, Ψ ⊙ X ⊙ Y = Ψ ⊙ (XY), and Y ⊙ (X ⊙ Ψ) = (YX) ⊙ Ψ.

Based on the H-operations, we can construct the Frobenius inner product on H^{n×n}.

Proposition 1.
De\ufb01ne the function (cid:104)\u00b7,\u00b7(cid:105)FH : Hn\u00d7n \u00d7 Hn\u00d7n \u2192 R such that (cid:104)\u03a8, \u039e(cid:105)FH (cid:44) tr(\u03a8T \u2217\n\n\u039e) =(cid:80)n\nAs an immediate result, the function (cid:107)\u00b7(cid:107)FH : Hn\u00d7n \u2192 R, de\ufb01ned such that (cid:107)\u03a8(cid:107)FH =(cid:112)(cid:104)\u03a8, \u03a8(cid:105)FH,\n\ni,j=1(cid:104)\u03a8ij, \u039eij(cid:105)H, \u2200\u03a8, \u039e \u2208 Hn\u00d7n. Then (cid:104)\u00b7,\u00b7(cid:105)FH is an inner product on Hn\u00d7n.\n\nis the Frobenius norm on Hn\u00d7n. Next, we introduce two properties of (cid:104)\u00b7,\u00b7(cid:105)FH, which play important\nroles for developing the convex-concave relaxation of the Lawler\u2019s graph matching problem.\nCorollary 2. (cid:104)\u03a8 (cid:12) X, \u039e(cid:105)FH = (cid:104)\u03a8, \u039e (cid:12) X T(cid:105)FH and (cid:104)X (cid:12) \u03a8, \u039e(cid:105)FH = (cid:104)\u03a8, X T (cid:12) \u039e(cid:105)FH.\n\n4 Kernelized graph matching\n\nBefore deriving kernelized graph matching, we \ufb01rst present an assumption.\nAssumption 1. We assume that the edge af\ufb01nity function kE : RdE \u00d7 RdE \u2192 R is a kernel.\nThat is, there exist both an RKHS, H, and an (implicit) feature map, \u03c8 : RdE \u2192 H, such that\nkE((cid:126)q1, (cid:126)q2) = (cid:104)\u03c8((cid:126)q1), \u03c8((cid:126)q2)(cid:105)H, \u2200(cid:126)q1, (cid:126)q2 \u2208 RdE .\n\n3\n\n\f(cid:88)\n\nn(cid:88)\n\nn(cid:88)\n\n(cid:104)\n\nn(cid:88)\n\n(cid:26)\u03c8((cid:126)qij) \u2208 H,\n\nNote that Assumption 1 is rather mild, since kernel functions are powerful and popular in quantifying\nthe similarity between attributes [47], [19].\nFor any graph G = {A,V, P ,E, Q}, we can construct an array, \u03a8 \u2208 Hn\u00d7n:\n\n\u03a8ij =\n\n0H \u2208 H,\n\n(4)\nGiven two graphs G1 and G2, let \u03a8(1) and \u03a8(2) be the corresponding Hilbert arrays of G1 and G2,\nrespectively. 
Then the edge similarity term in Lawler\u2019s QAP (see (2)) can be written as\n\n, where 0H is the zero vector in H.\n\nif (vi, vj) \u2208 E\notherwise\n\nkE((cid:126)q1\n\nij , (cid:126)q2\n\nab)XiaXjb =\n\n\u03a8(1)\n\nij Xjb,\n\nXia\u03a8(2)\n\nab (cid:105)HK = (cid:104)\u03a8(1)(cid:12)X, X(cid:12)\u03a8(2)(cid:105)FH ,\n\nab\u2208E2\n\nij\u2208E1,e2\ne1\nwhich shares a similar form with (1), and can be considered as the Koopmans-Beckmann\u2019s alignment\nbetween the Hilbert arrays \u03a8(1) and \u03a8(2). The last term in (4) is just the Frobenius inner product\nbetween two Hilbert arrays after permutation. Adding the node af\ufb01nity term, we write Laweler\u2019s\nQAP as2:\n\ni,b=1\n\na=1\n\nj=1\n\nmin Jgm(X) = \u2212(cid:104)KN , X(cid:105)F \u2212 (cid:104)\u03a8(1) (cid:12) X, X (cid:12) \u03a8(2)(cid:105)FH s.t. X \u2208 P.\n\n(5)\n\n4.1 Convex and concave relaxations\n\nThe form (5) inspires an intuitive way to develop convex and concave relaxations. To do this, we \ufb01rst\n2(cid:104)X(cid:12)\u03a8(2), X(cid:12)\u03a8(2)(cid:105)FH.\nintroduce an auxiliary function Jaux(X) = 1\nApplying Corollary 1 and 2, for any X \u2208 P, which satis\ufb01es XX T = X T X = I, we have\n1\n2\n\n2(cid:104)\u03a8(1)(cid:12)X, \u03a8(1)(cid:12)X(cid:105)FH + 1\n\n(cid:104)\u03a8(2), (X T X)(cid:12)\u03a8(2)(cid:105)FH =\n\n(cid:104)\u03a8(1), \u03a8(1)(cid:12)(XX T )(cid:105)FH+\n\n(cid:107)\u03a8(1)(cid:107)2\n\n(cid:107)\u03a8(2)(cid:107)2\n\nJaux(X) =\n\nFH +\n\n1\n2\n\n1\n2\n\n1\n2\n\nFH ,\n\nwhich is always a constant. 
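The H-operations and the constancy of J_aux on permutation matrices can be checked numerically for the concrete case H = R^d. This is a minimal sketch; the (n, n, d) array layout and all function names are our own, not the authors' code:

```python
import numpy as np

# H-operations for H = R^d; an n x n Hilbert array is stored as shape (n, n, d).
def h_dot(Psi, X):                  # Psi (.) X : [Psi (.) X]_ij = sum_k Psi_ik X_kj
    return np.einsum("ikd,kj->ijd", Psi, X)

def dot_h(X, Psi):                  # X (.) Psi : [X (.) Psi]_ij = sum_k X_ik Psi_kj
    return np.einsum("ik,kjd->ijd", X, Psi)

def h_inner(Psi, Xi):               # Frobenius inner product <Psi, Xi>_FH
    return np.einsum("ijd,ijd->", Psi, Xi)

rng = np.random.default_rng(0)
n, d = 6, 4
Psi1, Psi2 = rng.random((n, n, d)), rng.random((n, n, d))

def J_aux(X):
    return 0.5 * h_inner(h_dot(Psi1, X), h_dot(Psi1, X)) \
         + 0.5 * h_inner(dot_h(X, Psi2), dot_h(X, Psi2))

# On permutation matrices, J_aux equals (||Psi1||^2_FH + ||Psi2||^2_FH) / 2.
const = 0.5 * h_inner(Psi1, Psi1) + 0.5 * h_inner(Psi2, Psi2)
vals = [J_aux(np.eye(n)[rng.permutation(n)]) for _ in range(5)]
assert np.allclose(vals, const)

# Corollary 1: (Psi (.) X) (.) Y == Psi (.) (XY) for ordinary matrices X, Y.
X, Y = rng.random((n, n)), rng.random((n, n))
assert np.allclose(h_dot(h_dot(Psi1, X), Y), h_dot(Psi1, X @ Y))
```

On general doubly stochastic X the two permuted arrays are no longer norm-preserving, which is exactly why J_aux is only constant on P and the relaxations below differ away from permutations.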
Introducing J_aux(X) into (5), we obtain convex and concave relaxations:

J_vex(X) = J_gm(X) + J_aux(X) = −⟨KN, X⟩_F + ½‖Ψ^(1) ⊙ X − X ⊙ Ψ^(2)‖²_FH,   (6)
J_cav(X) = J_gm(X) − J_aux(X) = −⟨KN, X⟩_F − ½‖Ψ^(1) ⊙ X + X ⊙ Ψ^(2)‖²_FH.   (7)

The convexity of J_vex(X) is easy to conclude, because the composition of the squared norm ‖·‖²_FH with the linear transformation Ψ^(1) ⊙ X − X ⊙ Ψ^(2) is convex. A similar interpretation gives the concavity of J_cav(X).

It is interesting to see that the term ½‖Ψ^(1) ⊙ X − X ⊙ Ψ^(2)‖²_FH in (6) is just the (squared) distance between Hilbert arrays. If we set the map ψ(x) = x, then the convex relaxation of (1) is recovered (see [1]).

Path-following strategy: Leveraging these two relaxations [45], we minimize J_gm by successively optimizing a series of subproblems parameterized by α ∈ [0, 1]:

min J_α(X) = (1 − α) J_vex(X) + α J_cav(X)   s.t. X ∈ D = {X ∈ R^{n×n}_+ | X1 = 1, X^T 1 = 1},   (8)

where D is the doubly stochastic relaxation of the permutation matrix set, P. We start at α = 0 and find the unique minimum. Then we gradually increase α until α = 1. That is, we optimize J_{α+Δα} with the local minimizer of J_α as the initial point. Finally, we output the local minimizer of J_{α=1}. We refer to [45], [48], and [39] for detailed descriptions and improvements.

Gradient computation: If we use first-order optimization methods, we need only the gradients:

∇J_α(X) = (1 − 2α)[(Ψ^(1) ∗ Ψ^(1))X + X(Ψ^(2) ∗ Ψ^(2))] − 2(Ψ^(1) ⊙ X) ∗ Ψ^(2) − KN,   (9)

where ∀i, j = 1, 2, ..., n, [Ψ^(1) ∗ Ψ^(1)]_ij = Σ_{e1_ik, e1_kj ∈ E1} kE(q1_ik, q1_kj); ∀a, b = 1, 2, ..., n, [Ψ^(2) ∗ Ψ^(2)]_ab = Σ_{e2_ac, e2_cb ∈ E2} kE(q2_ac, q2_cb); and ∀i, a = 1, 2, ..., n, [(Ψ^(1) ⊙ X) ∗ Ψ^(2)]_ia = Σ_{e1_ik ∈ E1, e2_ca ∈ E2} X_kc kE(q1_ik, q2_ca).

2For convenience in developing the path-following strategy, we write it in the minimization form.
In the supplementary material, we provide compact matrix multiplication forms for computing (9).

4.2 Approximate explicit feature maps

Based on the above discussion, we significantly reduce the space cost of Lawler’s QAP by avoiding computing the affinity matrix K ∈ R^{n²×n²}. However, the time cost of computing the gradient with (9) is O(n⁴), which can be further reduced by employing approximate explicit feature maps [33, 40]. For the kernel kE : R^{dE} × R^{dE} → R, we may find an explicit feature map ψ̂ : R^{dE} → R^D, such that

∀q1, q2 ∈ R^{dE}, ⟨ψ̂(q1), ψ̂(q2)⟩ = k̂E(q1, q2) ≈ kE(q1, q2).   (10)

For example, if kE(q1, q2) = exp(−γ‖q1 − q2‖²₂), then ψ̂ is the Fourier random feature map [33]:

ψ̂(q) = √(2/D) [cos(ω_1^T q + b_1), ..., cos(ω_D^T q + b_D)]^T,  where ω_i ∼ N(0, 2γI) and b_i ∼ U[0, 2π].   (11)

Note that in practice, the performance of ψ̂ is good enough for relatively small values of D [47].
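The map in (11) can be sketched as follows, using the standard Rahimi-Recht construction for the Gaussian kernel (spectral samples ω ∼ N(0, 2γI) and phases b ∼ U[0, 2π]); the function names and the test tolerance below are our own choices:

```python
import numpy as np

def fourier_features(Q, gamma, D, rng):
    """Random Fourier feature map approximating k(q1,q2) = exp(-gamma*||q1-q2||^2).

    Q: (N, d) array of edge attributes. Returns an (N, D) feature matrix."""
    d = Q.shape[1]
    omega = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                   # random phases
    return np.sqrt(2.0 / D) * np.cos(Q @ omega + b)

rng = np.random.default_rng(0)
gamma, D = 0.5, 5000
Q = rng.random((20, 3))

Phi = fourier_features(Q, gamma, D, rng)
K_hat = Phi @ Phi.T                                  # approximate kernel matrix
sq = ((Q[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq)                              # exact Gaussian kernel

# Entrywise error shrinks like O(1/sqrt(D)); here D is large only for the check.
assert np.max(np.abs(K - K_hat)) < 0.1
```

D = 5000 is used above only to make the approximation error visibly small; the paper's experiments use much smaller D (e.g., 20), trading a little accuracy for large time and space savings.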
By virtue of explicit feature maps, we obtain a new graph representation Ψ̂ ∈ (R^D)^{n×n}:

Ψ̂_ij = ψ̂(q_ij) ∈ R^D if (v_i, v_j) ∈ E;  Ψ̂_ij = 0 ∈ R^D otherwise,  where 0 is the zero vector in R^D.   (12)

Its space cost is O(Dn²). Now computing the gradient-related terms Ψ̂^(△) ∗ Ψ̂^(△), △ = (1), (2), and (Ψ̂^(1) ⊙ X) ∗ Ψ̂^(2) in (9) becomes rather simple. We first slice Ψ̂^(△) into D matrices Ψ̂^(△)(:, :, i) ∈ R^{n×n}, i = 1, 2, ..., D. Then it can be easily shown that

Ψ̂^(△) ∗ Ψ̂^(△) = Σ_{i=1}^D Ψ̂^(△)(:, :, i) Ψ̂^(△)(:, :, i),  and  (Ψ̂^(1) ⊙ X) ∗ Ψ̂^(2) = Σ_{i=1}^D Ψ̂^(1)(:, :, i) X Ψ̂^(2)(:, :, i),   (13)

whose first and second terms respectively involve D and 2D matrix multiplications of size n × n. Hence, the time complexity is reduced to O(Dn³). Moreover, gradient computations with (13) are highly parallelizable, which also contributes to scalability.

5 Entropy-regularized Frank-Wolfe optimization algorithm

The state-of-the-art method for optimizing problem (8) is the Frank-Wolfe algorithm [29, 25, 37, 49], whose every iteration involves a linear program to obtain the optimal direction Y*, i.e.,

Y* = argmin_{Y ∈ D} ⟨∇J_α(X), Y⟩_F,   (14)

which is usually solved by the Hungarian algorithm [20].
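For concreteness, the direction-finding step (14) can be sketched with an off-the-shelf linear assignment solver standing in for the Hungarian algorithm (here SciPy's linear_sum_assignment; the matrix G is a random stand-in for ∇J_α(X)):

```python
import itertools
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 5
G = rng.random((n, n))                      # stand-in for the gradient grad J_alpha(X)

rows, cols = linear_sum_assignment(G)       # Hungarian-style assignment solver
Y = np.zeros((n, n))
Y[rows, cols] = 1.0                         # optimal vertex of D (a permutation matrix)

# Sanity check against brute force over all n! permutations.
best = min(sum(G[i, p[i]] for i in range(n))
           for p in itertools.permutations(range(n)))
assert np.isclose(np.sum(G * Y), best)
```

Because the minimum of a linear objective over the doubly stochastic polytope D is attained at a vertex, solving the assignment problem over permutations is sufficient here.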
Optimizing J_α may need to call the Hungarian algorithm many times, which is quite time-consuming for large graphs. In this work, instead of minimizing J_α(X) in (8), we consider the following problem:

min_X F_α(X) = J_α(X) + λH(X)   s.t. X ∈ D_n,   (15)

where D_n = {X ∈ R^{n×n}_+ | X1 = (1/n)1, X^T 1 = (1/n)1}, H(X) = Σ_{i,j=1}^n X_ij log X_ij is the negative entropy, and the node affinity matrix KN in J_α(X) (see (5) and (8)) is normalized as KN → (1/n)KN to balance the node and edge affinity terms. The observation is that if λ is set to be small enough, the solution of (15), after being multiplied by n, will approximate that of the original QAP (8) as much as possible. We design the entropy-regularized Frank-Wolfe algorithm ("EnFW" for short) for optimizing (15), in each outer iteration of which we solve the following nonlinear problem:

min ⟨∇J_α(X), Y⟩_F + λH(Y)   s.t. Y ∈ D_n.   (16)

Figure 2: Hungarian vs Sinkhorn.

Note that (16) can be solved extremely efficiently by the Sinkhorn-Knopp algorithm [10]. Theoretically, the Sinkhorn-Knopp algorithm converges at a linear rate, i.e., 0 < lim sup ‖Y_{k+1} − Y*‖/‖Y_k − Y*‖ < 1. An empirical comparison between the runtimes of these two algorithms is shown in Fig. 2, where we can see that the Sinkhorn-Knopp algorithm for solving (16) is much faster than the Hungarian algorithm for solving (14).

The EnFW algorithm description: We first give necessary definitions.
Write the quadratic function

J_α(X + s(Y − X)) = J_α(X) + s⟨∇J_α(X), Y − X⟩_F + ½ vec(Y − X)^T ∇²J_α(X) vec(Y − X) s².

Then, we define the coefficient of the quadratic term as

Q(X, Y) ≜ ½ vec(Y − X)^T ∇²J_α(X) vec(Y − X) = ½⟨∇J_α(Y − X), Y − X⟩_F,   (17)

where the second equality holds because J_α is a quadratic function. Next, similar to the original FW algorithm, we define the nonnegative gap function g(X) as

g(X) ≜ ⟨∇J_α(X), X⟩_F + λH(X) − min_{Y ∈ D_n} {⟨∇J_α(X), Y⟩_F + λH(Y)}.   (18)

Proposition 2. If X* is an optimal solution of (15), then g(X*) = 0.

Therefore, the gap function characterizes the necessary condition for optimal solutions. Note that for any X ∈ D_n, if g(X) = 0, then we say "X is a first-order stationary point". Now, with the definitions of Q(X, Y) and g(X), we detail the EnFW procedure in Algorithm 1.

Algorithm 1 The EnFW optimization algorithm for minimizing F_α (15)
1: Initialize X_0 ∈ D_n
2: while not converged do
3:   Compute the gradient ∇J_α(X_t) based on (9) or (13),
4:   Obtain the optimal direction Y_t by solving (16), i.e., Y_t = argmin_{Y ∈ D_n} ⟨∇J_α(X_t), Y⟩_F + λH(Y),
5:   Compute G_t = g(X_t) and Q_t = Q(X_t, Y_t),
6:   Determine the stepsize s_t: if Q_t ≤ 0, set s_t = 1; else set s_t = min{G_t/(2Q_t), 1},
7:   Update X_{t+1} = X_t + s_t(Y_t − X_t).
8: end
9: Output the solution X*_α.

After obtaining the optimal solution path X*_α, α = 0 : Δα : 1, we discretize nX*_1 by the Hungarian [20] or the greedy discretization algorithm [5] to get the binary matching matrix.
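The steps of Algorithm 1 can be sketched on a toy problem. Below, C and M are random stand-ins for the linear part and Hessian of a quadratic J_α, and sinkhorn_dir is a minimal Sinkhorn-Knopp solver for step 4; this is our own illustrative sketch of one EnFW run, not the paper's full path-following pipeline:

```python
import numpy as np

# Toy quadratic objective F(X) = <C, X>_F + 0.5 * vec(X)' M vec(X) + lam * H(X),
# standing in for F_alpha in (15); M is symmetric (the Hessian of the quadratic part).
rng = np.random.default_rng(0)
n, lam = 5, 0.3
C = rng.random((n, n))
M = rng.random((n * n, n * n)); M = (M + M.T) / 2

def grad_J(X):                       # gradient of the quadratic part J
    return C + (M @ X.flatten()).reshape(n, n)

def H(X):                            # negative entropy
    return np.sum(X * np.log(X))

def sinkhorn_dir(G):                 # argmin_{Y in D_n} <G, Y>_F + lam * H(Y)
    K = np.exp(-G / lam)
    u = np.ones(n)
    for _ in range(1000):
        v = (1.0 / n) / (K.T @ u)
        u = (1.0 / n) / (K @ v)
    return u[:, None] * K * v[None, :]

F = lambda X: np.sum(C * X) + 0.5 * X.flatten() @ M @ X.flatten() + lam * H(X)
X = np.full((n, n), 1.0 / n**2)      # feasible start in D_n

vals = [F(X)]
for _ in range(20):
    G_mat = grad_J(X)
    Y = sinkhorn_dir(G_mat)          # step 4: direction via Sinkhorn-Knopp
    gap = np.sum(G_mat * X) + lam * H(X) - (np.sum(G_mat * Y) + lam * H(Y))
    d = (Y - X).flatten()
    Q = 0.5 * d @ M @ d              # curvature term Q(X, Y) as in (17)
    s = 1.0 if Q <= 0 else min(gap / (2 * Q), 1.0)   # step 6: explicit stepsize
    X = X + s * (Y - X)
    vals.append(F(X))

# Sequentially decreasing objective values (cf. Theorem 1).
assert all(b <= a + 1e-9 for a, b in zip(vals, vals[1:]))
```

The explicit stepsize follows from minimizing the quadratic model F(X) − s·g(X) + s²Q over s ∈ [0, 1], which is also why the objective sequence is monotonically non-increasing regardless of whether J_α is convex.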
We next highlight the differences between the EnFW algorithm and the original FW algorithm: (i) We find the optimal direction by solving a nonlinear convex problem (16) with the efficient Sinkhorn-Knopp algorithm, instead of solving the linear problem (14). (ii) We give an explicit formula for computing the stepsize s, instead of performing a line search on [0, 1] to optimize F_α(X + s(Y − X)) or estimating the Lipschitz constant of ∇F_α [32].

5.1 Convergence analysis

In this part, we present the convergence properties of the proposed EnFW algorithm, including the sequentially decreasing property of the objective function and the convergence rates.

Theorem 1. The generated objective function value sequence, {F_α(X_t)}_{t=0}, will decreasingly converge. The generated point sequence, {X_t}_{t=0} ⊆ D_n ⊆ R^{n×n}, will weakly converge to a first-order stationary point, at the rate O(1/√(t+1)), i.e.,

min_{1≤t≤T} g(X_t) ≤ 2 max{Δ_0, √(LΔ_0/n)} / √(T+1),   (19)

where Δ_0 = F_α(X_0) − min_{X ∈ D_n} F_α(X), and L is the largest absolute eigenvalue of ∇²J_α(X). If J_α(X) is convex, which happens when α is small (see (8)), then we have a tighter bound O(1/(T+1)).

Theorem 2. If J_α(X) is convex, we have F_α(X_T) − F_α(X*) ≤ 4L/(n(T+1)).

Note that in both cases, convex and non-convex, our EnFW achieves the same (up to a constant coefficient) convergence rate as the original FW algorithm (see [17] and [32]). Thanks to the efficiency of the Sinkhorn-Knopp algorithm, we need much less time to finish each iteration. Therefore, our optimization algorithm is more computationally efficient than the original FW algorithm.

6 Experiments

In this section, we conduct extensive experiments to demonstrate the matching performance and scalability of our kernelized graph matching framework. We implement all the algorithms using Matlab on an Intel i7-7820HQ, 2.90 GHz CPU with 64 GB RAM.

Notations: We use KerGMI to denote our algorithm when we use exact edge affinity kernels, and KerGMII when we use approximate explicit feature maps.

Baseline methods: We compare our algorithm with many state-of-the-art graph (network) matching algorithms: (i) Integer projected fixed point method (IPFP) [25], (ii) Spectral matching with affine constraints (SMAC) [9], (iii) Probabilistic graph matching (PM) [46], (iv) Re-weighted random walk matching (RRWM) [5], (v) Factorized graph matching (FGM) [48], (vi) Branch path following for graph matching (BPFG) [39], (vii) Graduated assignment graph matching (GAGM) [14], (viii) Global network alignment using multiscale spectral signatures (GHOST) [31], (ix) Triangle alignment (TAME) [30], and (x) Maximizing accuracy in global network alignment (MAGNA) [35]. Note that GHOST, TAME, and MAGNA are popular protein-protein interaction (PPI) network aligners.

Settings: For all the baseline methods, we used the parameters recommended in the public code. For our method, if not specified, we set the regularization parameter (see (15)) λ = 0.005 and the path-following parameters α = 0 : 0.1 : 1. We use the Hungarian algorithm for final discretization. We refer to the supplementary material for other implementation details.

6.1 Synthetic datasets

We evaluate algorithms on synthetic Erdős–Rényi [12] random graphs, following the experimental protocol in [14, 48, 5]. For each trial, we generate two graphs: the reference graph G1 and the perturbed graph G2, each of which has nin inlier nodes and nout outlier nodes. Each edge in G1 is randomly generated with probability ρ ∈ [0, 1]. The edges e1_ij ∈ E1 are associated with the edge attributes q1_ij ∼ U[0, 1]. The corresponding edge e2_{p(i)p(j)} ∈ E2 has the attribute q2_{p(i)p(j)} = q1_ij + ε, where p is a permutation map for inlier nodes, and ε ∼ N(0, σ²) is the Gaussian noise. For the baseline methods, the edge affinity value between q1_ij and q2_ab is computed as kE(q1_ij, q2_ab) = exp(−(q1_ij − q2_ab)²/0.15). For our method, we use the Fourier random features (11) to approximate the Gaussian kernel, and represent each graph by an (nin + nout) × (nin + nout) array in R^D. We set the parameter γ = 5 and the dimension D = 20.

Comparing matching accuracy. We perform the comparison under three parameter settings, in all of which we set nin = 50. Note that, different from the standard protocol where nin = 20 [48], we use relatively large graphs to highlight the advantage of our KerGMII. (i) We change the number of outlier nodes, nout, from 0 to 50 while fixing the noise, σ = 0, and the edge density, ρ = 1. (ii) We change σ from 0 to 0.2 while fixing nout = 0 and ρ = 1. (iii) We change ρ from 0.3 to 1 while fixing nout = 5 and σ = 0.1. For all cases in these settings, we repeat the experiments 100 times and report the average accuracy and standard error in Fig. 3 (a). Clearly, our KerGMII outperforms all the baseline methods with statistical significance.

Comparing scalability. To fairly compare the scalability of different algorithms, we consider the exact matching between fully connected graphs, i.e., nout = 0, σ = 0, and ρ = 1.
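The synthetic protocol above (with outliers omitted for brevity) can be sketched as follows; make_pair and its array layout are our own illustration, not the authors' code:

```python
import numpy as np

def make_pair(n, rho, sigma, rng):
    """Reference graph G1 and perturbed graph G2: each edge of G1 appears with
    probability rho and carries a scalar attribute q1 ~ U[0,1]; the matched
    edge in G2 carries q2 = q1 + N(0, sigma^2)."""
    W = np.triu((rng.random((n, n)) < rho).astype(float), 1)
    Q1 = np.triu(rng.random((n, n)), 1) * W
    Q1 += Q1.T
    W += W.T                                     # undirected edge indicator

    p = rng.permutation(n)                       # ground-truth correspondence
    noise = np.triu(sigma * rng.standard_normal((n, n)), 1)
    noise += noise.T
    Q2 = Q1[np.ix_(p, p)] + noise * W[np.ix_(p, p)]   # perturb matched edges only
    return Q1, Q2, p

rng = np.random.default_rng(0)
Q1, Q2, p = make_pair(n=8, rho=0.7, sigma=0.0, rng=rng)

# With sigma = 0, node i of G2 corresponds exactly to node p[i] of G1.
assert np.allclose(Q2, Q1[np.ix_(p, p)])
```

A matching algorithm is then judged by how many entries of its recovered correspondence agree with p on the inlier nodes.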
We change the number of nodes, n (= nin), from 50 to 2000, and report the CPU time of each algorithm in Fig. 3 (b). We can see that all the baseline methods can handle only graphs with fewer than 200 nodes, because of the expensive space cost of the matrix K (see (3)). However, KerGMII can finish Lawler’s graph matching problem with 2000 nodes in reasonable time.

Analyzing parameter sensitivity. To analyze the parameter sensitivity of KerGMII, we vary the regularization parameter, λ, and the dimension, D, of the Fourier random features. We conduct large subgraph matching experiments by setting nin = 500, nout = 0 : 100 : 500, ρ = 1, and σ = 0. We repeat the experiments 50 times and report the average accuracies and standard errors. In Fig. 4, we show the results under different λ and different D. We can see that (i) smaller λ leads to better performance, which can be easily understood because the entropy regularizer perturbs the original optimal solution, and (ii) the dimension D does not much affect KerGMII, which implies that in practice we can use relatively small D to reduce the time and space complexity.

Figure 3: Comparison of graph matching on synthetic datasets.

Figure 4: (a) Parameter sensitivity study of the regularizer λ. (b) Parameter sensitivity study of the dimension, D, of the random Fourier feature.

6.2 Image datasets

The CMU House Sequence dataset has 111 frames of a house, each of which has 30 labeled landmarks. We follow the experimental protocol in [48, 39]. We match all the image pairs, spaced by 0:10:90 frames. We consider two node settings: (n1, n2) = (30, 30) and (n1, n2) = (20, 30). We build graphs by using Delaunay triangulation [23] to connect landmarks. The edge attributes are the pairwise distances between nodes. For all methods, we compute the edge affinity as kE(q1_ij, q2_ab) = exp(−(q1_ij − q2_ab)²/2500). In Fig.
5, we report the average matching accuracy and the ratio of objective function (3) values for every gap. It can be seen that on this dataset, KerGMI and FGM achieve the best performance, and are slightly better than BPFG when outliers exist, i.e., (n1, n2) = (20, 30).

Figure 5: Comparison of graph matching on the CMU house dataset.

The Pascal dataset [26] has 20 pairs of motorbike images and 30 pairs of car images. For each pair, the detected feature points and manually labeled correspondences are provided. Following [48, 39], we randomly select 0:2:20 outliers from the background to compare different methods. For each node, v_i, its attribute, p_i, is assigned as the orientation of the normal vector at that point to the contour where the point was sampled. Nodes are connected by Delaunay triangulation [23]. For each edge, e_ij, its attribute q⃗_ij equals [d_ij, θ_ij]^T, where d_ij is the distance between v_i and v_j, and θ_ij is the absolute angle between the edge and the horizontal line. For all methods, the node affinity is computed as k_N(p_i, p_j) = exp(−|p_i − p_j|), and the edge affinity as k_E(q⃗^1_ij, q⃗^2_ab) = exp(−|d^1_ij − d^2_ab|/2 − |θ^1_ij − θ^2_ab|/2). Fig. 6(a) shows a matching result of KerGMI.

Figure 6: (a) A matching example for a pair of motorbike images generated by KerGMI, where green and red lines respectively indicate correct and incorrect matches. (b) Comparison of graph matching on the Pascal dataset.

In Fig. 6(b), we report the matching accuracies and CPU running times. From the perspective of matching accuracy, KerGMI, BPFG, and FGM consistently outperform the other methods. When the number of outliers increases, KerGMI and BPFG perform slightly better than FGM. However, from the perspective of running time, the cost of BPFG is much higher than that of the others.

6.3 The protein-protein interaction network dataset

The S.cerevisiae (yeast) PPI network [7] dataset is popularly used to evaluate PPI network aligners because it has known true node correspondences. It consists of an unweighted high-confidence PPI network with 1004 proteins (nodes) and 8323 PPIs (edges), and five noisy PPI networks generated by adding 5%, 10%, 15%, 20%, and 25% low-confidence PPIs.
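To apply KerGM to these PPI networks, edge attributes are built from the heat-diffusion matrix H_t = exp(−tL) of the normalized Laplacian, as detailed in the next paragraph. Below is a minimal numpy sketch of that construction; the function name is ours, and we assume a symmetric 0/1 adjacency matrix with no isolated nodes.

```python
import numpy as np

def heat_diffusion_attrs(A, times=(5, 10, 15, 20)):
    """Edge attributes q_ij = [H_5(i,j), H_10(i,j), H_15(i,j), H_20(i,j)].

    H_t = exp(-t*L) = sum_i exp(-lambda_i*t) u_i u_i^T, where (lambda_i, u_i)
    are eigenpairs of the symmetric normalized Laplacian L. Assumes A is a
    symmetric 0/1 adjacency matrix with no isolated nodes.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(L)                          # eigenpairs of L
    Hs = [(U * np.exp(-t * lam)) @ U.T for t in times]  # H_t = exp(-t*L)
    ii, jj = np.nonzero(np.triu(A, 1))                  # each edge once
    return {(i, j): np.array([H[i, j] for H in Hs]) for i, j in zip(ii, jj)}

# Tiny example: a 4-node path graph yields three edges, each with an R^4 attribute.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
attrs = heat_diffusion_attrs(A)
```

Each resulting attribute vector in R^4 then feeds the same random-feature machinery used for the scalar attributes in the synthetic experiments.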
We do graph matching between the high-confidence network and every noisy network. To apply KerGM, we generate edge attributes by the heat diffusion matrix [16, 6], H_t = exp(−tL) = Σ_{i=1}^n exp(−λ_i t) u⃗_i u⃗_i^T ∈ R^{n×n}, where L is the normalized Laplacian matrix [6] and {(λ_i, u⃗_i)}_{i=1}^n are the eigenpairs of L. The edge attribute vector q⃗_ij is assigned as q⃗_ij = [H_5(i,j), H_10(i,j), H_15(i,j), H_20(i,j)]^T ∈ R^4. We use the Fourier random features (11), and set D = 50 and γ = 200. We compare KerGMII^3 with the state-of-the-art PPI aligners: TAME, GHOST, and MAGNA. In Fig. 7, we report the matching accuracies. Clearly, KerGMII significantly outperforms the baselines. Especially when the noise level is 20% or 25%, KerGMII's accuracies are more than 50 percentage points higher than those of the other algorithms.

Figure 7: Results on PPI networks.

7 Conclusion

In this work, based on a mild assumption regarding edge affinity values, we provided KerGM, a unifying framework for Koopmans-Beckmann's and Lawler's QAPs, within which both QAPs can be considered as the alignment between arrays in RKHS. We then derived convex and concave relaxations and the corresponding path-following strategy. To make KerGM more scalable to large graphs, we developed the computationally efficient entropy-regularized Frank-Wolfe optimization algorithm. KerGM achieved promising performance on both image and biology datasets. Thanks to
Thanks to\nits scalability, we believe KerGM can be potentially useful for many applications in the real world.\n\n8 Acknowledgment\n\nThis work was supported in part by the AFOSR grant FA9550-16-1-0386.\n\n3To the best our knowledge, KerGM is the \ufb01rst one that uses Lawler\u2019s graph matching formulation to solve\n\nthe PPI network alignment problem.\n\n9\n\n02468101214161820outliers0.20.40.60.8Accuracy02468101214161820outliers0100200300400500600700800900time (s)5%10%15%20%25%Noise level00.20.40.60.81Node accuracy\fReferences\n[1] Yonathan A\ufb02alo, Alexander Bronstein, and Ron Kimmel. On convex relaxation of graph\n\nisomorphism. Proceedings of the National Academy of Sciences, 112(10):2942\u20132947, 2015.\n\n[2] HA Almohamad and Salih O Duffuaa. A linear programming approach for the weighted\ngraph matching problem. IEEE Transactions on pattern analysis and machine intelligence,\n15(5):522\u2013525, 1993.\n\n[3] William Brendel and Sinisa Todorovic. Learning spatiotemporal graphs of human activities. In\n\n2011 International Conference on Computer Vision, pages 778\u2013785. IEEE, 2011.\n\n[4] Altannar Chinchuluun, Enkhbat Rentsen, and Panos M Pardalos. A numerical method for\nconcave programming problems. In Continuous Optimization, pages 251\u2013273. Springer, 2005.\n\n[5] Minsu Cho, Jungmin Lee, and Kyoung Mu Lee. Reweighted random walks for graph matching.\n\nIn European conference on Computer vision, pages 492\u2013505. Springer, 2010.\n\n[6] Fan RK Chung and Fan Chung Graham. Spectral graph theory. Number 92. American\n\nMathematical Soc., 1997.\n\n[7] Sean R Collins, Patrick Kemmeren, Xue-Chu Zhao, Jack F Greenblatt, Forrest Spencer,\nFrank CP Holstege, Jonathan S Weissman, and Nevan J Krogan. Toward a comprehensive atlas\nof the physical interactome of saccharomyces cerevisiae. Molecular & Cellular Proteomics,\n6(3):439\u2013450, 2007.\n\n[8] Donatello Conte, Pasquale Foggia, Carlo Sansone, and Mario Vento. 
Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(03):265–298, 2004.

[9] Timothee Cour, Praveen Srinivasan, and Jianbo Shi. Balanced graph matching. In Advances in Neural Information Processing Systems, pages 313–320, 2007.

[10] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.

[11] Ahed Elmsallati, Connor Clark, and Jugal Kalita. Global alignment of protein-protein interaction networks: A survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 13(4):689–705, 2015.

[12] P. Erdős and A. Rényi. On random graphs I. Publ. Math. Debrecen, 6:290–297, 1959.

[13] Utkarsh Gaur, Yingying Zhu, Bi Song, and A. Roy-Chowdhury. A "string of feature graphs" model for recognition of complex activities in natural videos. In 2011 International Conference on Computer Vision, pages 2595–2602. IEEE, 2011.

[14] Steven Gold and Anand Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377–388, 1996.

[15] Thomas Hofmann, Bernhard Schölkopf, and Alexander J. Smola. Kernel methods in machine learning. The Annals of Statistics, pages 1171–1220, 2008.

[16] Nan Hu, Raif M. Rustamov, and Leonidas Guibas. Stable and informative spectral signatures for graph matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2305–2312, 2014.

[17] Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In ICML, pages 427–435, 2013.

[18] Tjalling C. Koopmans and Martin Beckmann. Assignment problems and the location of economic activities. Econometrica, pages 53–76, 1957.

[19] Nils M.
Kriege and Petra Mutzel. Subgraph matching kernels for attributed graphs. In ICML, 2012.

[20] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955.

[21] Yam Kushinsky, Haggai Maron, Nadav Dym, and Yaron Lipman. Sinkhorn algorithm for lifted assignment problems. SIAM Journal on Imaging Sciences, 12(2):716–735, 2019.

[22] Eugene L. Lawler. The quadratic assignment problem. Management Science, 9(4):586–599, 1963.

[23] D. T. Lee and B. J. Schachter. Two algorithms for constructing a Delaunay triangulation. International Journal of Computer & Information Sciences, 9(3):219–242, 1980.

[24] Jungmin Lee, Minsu Cho, and Kyoung Mu Lee. Hyper-graph matching via reweighted random walks. In CVPR 2011, pages 1633–1640. IEEE, 2011.

[25] Marius Leordeanu, Martial Hebert, and Rahul Sukthankar. An integer projected fixed point method for graph matching and MAP inference. In Advances in Neural Information Processing Systems, pages 1114–1122, 2009.

[26] Marius Leordeanu, Rahul Sukthankar, and Martial Hebert. Unsupervised learning for graph matching. International Journal of Computer Vision, 96(1):28–45, 2012.

[27] Zhi-Yong Liu and Hong Qiao. GNCCP: Graduated nonconvexity and concavity procedure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6):1258–1267, 2014.

[28] João Maciel and João P. Costeira. A global solution to sparse correspondence problems. IEEE Transactions on Pattern Analysis & Machine Intelligence, (2):187–199, 2003.

[29] Haggai Maron and Yaron Lipman. (Probably) concave graph matching. In Advances in Neural Information Processing Systems, pages 408–418, 2018.

[30] Shahin Mohammadi, David F. Gleich, Tamara G. Kolda, and Ananth Grama. Triangular alignment (TAME): A tensor-based approach for higher-order network alignment.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 14(6):1446–1458, 2017.

[31] Rob Patro and Carl Kingsford. Global network alignment using multiscale spectral signatures. Bioinformatics, 28(23):3105–3114, 2012.

[32] Fabian Pedregosa, Armin Askari, Geoffrey Negiar, and Martin Jaggi. Step-size adaptivity in projection-free optimization. arXiv preprint arXiv:1806.05123, 2018.

[33] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems, pages 1177–1184, 2008.

[34] Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer School on Machine Learning, pages 63–71. Springer, 2003.

[35] Vikram Saraph and Tijana Milenković. MAGNA: Maximizing accuracy in global network alignment. Bioinformatics, 30(20):2931–2940, 2014.

[36] Christian Schellewald, Stefan Roth, and Christoph Schnörr. Evaluation of convex optimization techniques for the weighted graph-matching problem in computer vision. In Joint Pattern Recognition Symposium, pages 361–368. Springer, 2001.

[37] Joshua T. Vogelstein, John M. Conroy, Vince Lyzinski, Louis J. Podrazik, Steven G. Kratzer, Eric T. Harley, Donniell E. Fishkind, R. Jacob Vogelstein, and Carey E. Priebe. Fast approximate quadratic programming for graph matching. PLOS ONE, 10(4):e0121002, 2015.

[38] Tao Wang, Haibin Ling, Congyan Lang, and Songhe Feng. Graph matching with adaptive and branching path following. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2853–2867, 2018.

[39] Tao Wang, Haibin Ling, Congyan Lang, and Jun Wu. Branching path following for graph matching. In European Conference on Computer Vision, pages 508–523. Springer, 2016.

[40] Lingfei Wu, Ian E. H. Yen, Jie Chen, and Rui Yan. Revisiting random binning features: Fast convergence and strong parallelizability.
In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1265–1274. ACM, 2016.

[41] Lingfei Wu, Ian En-Hsu Yen, Zhen Zhang, Kun Xu, Liang Zhao, Xi Peng, Yinglong Xia, and Charu Aggarwal. Scalable global alignment graph kernel using random features: From node embedding to graph embedding. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1418–1428, 2019.

[42] Junchi Yan, Jun Wang, Hongyuan Zha, Xiaokang Yang, and Stephen Chu. Consistency-driven alternating optimization for multigraph matching: A unified approach. IEEE Transactions on Image Processing, 24(3):994–1009, 2015.

[43] Junchi Yan, Xu-Cheng Yin, Weiyao Lin, Cheng Deng, Hongyuan Zha, and Xiaokang Yang. A short survey of recent advances in graph matching. In Proceedings of the 2016 ACM International Conference on Multimedia Retrieval, pages 167–174. ACM, 2016.

[44] Tianshu Yu, Junchi Yan, Yilin Wang, Wei Liu, et al. Generalizing graph matching beyond quadratic assignment model. In Advances in Neural Information Processing Systems, pages 853–863, 2018.

[45] Mikhail Zaslavskiy, Francis Bach, and Jean-Philippe Vert. A path following algorithm for the graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2227–2242, 2009.

[46] Ron Zass and Amnon Shashua. Probabilistic graph and hypergraph matching. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2008.

[47] Zhen Zhang, Mianzhi Wang, Yijian Xiang, Yan Huang, and Arye Nehorai. RetGK: Graph kernels based on return probabilities of random walks. In Advances in Neural Information Processing Systems, pages 3964–3974, 2018.

[48] Feng Zhou and Fernando De la Torre. Factorized graph matching. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 127–134.
IEEE, 2012.

[49] Feng Zhou and Fernando De la Torre. Factorized graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1774–1789, 2015.