{"title": "Convex Relaxations for Permutation Problems", "book": "Advances in Neural Information Processing Systems", "page_first": 1016, "page_last": 1024, "abstract": "Seriation seeks to reconstruct a linear order between variables using unsorted similarity information. It has direct applications in archeology and shotgun gene sequencing for example. We prove the equivalence between the seriation and the combinatorial 2-sum problem (a quadratic minimization problem over permutations) over a class of similarity matrices. The seriation problem can be solved exactly by a spectral algorithm in the noiseless case and we produce a convex relaxation for the 2-sum problem to improve the robustness of solutions in a noisy setting. This relaxation also allows us to impose additional structural constraints on the solution, to solve semi-supervised seriation problems. We present numerical experiments on archeological data, Markov chains and gene sequences.", "full_text": "Convex Relaxations for Permutation Problems\n\nFajwel Fogel\n\nPalaiseau, France\n\nRodolphe Jenatton\n\nPalaiseau, France\n\nC.M.A.P., \u00b4Ecole Polytechnique,\n\nCRITEO, Paris & C.M.A.P., \u00b4Ecole Polytechnique,\n\nfogel@cmap.polytechnique.fr\n\njenatton@cmap.polytechnique.fr\n\nFrancis Bach\n\nINRIA, SIERRA Project-Team & D.I.,\n\u00b4Ecole Normale Sup\u00b4erieure, Paris, France.\n\nfrancis.bach@ens.fr\n\nAlexandre d\u2019Aspremont\nCNRS & D.I., UMR 8548,\n\n\u00b4Ecole Normale Sup\u00b4erieure, Paris, France.\n\naspremon@ens.fr\n\nAbstract\n\nSeriation seeks to reconstruct a linear order between variables using unsorted sim-\nilarity information. It has direct applications in archeology and shotgun gene se-\nquencing for example. We prove the equivalence between the seriation and the\ncombinatorial 2-SUM problem (a quadratic minimization problem over permuta-\ntions) over a class of similarity matrices. 
The seriation problem can be solved exactly by a spectral algorithm in the noiseless case, and we produce a convex relaxation for the 2-SUM problem to improve the robustness of solutions in a noisy setting. This relaxation also allows us to impose additional structural constraints on the solution, to solve semi-supervised seriation problems. We present numerical experiments on archeological data, Markov chains and gene sequences.

1 Introduction

We focus on optimization problems written over the set of permutations. While the relaxation techniques discussed in what follows are applicable to a much more general setting, most of the paper is centered on the seriation problem: we are given a similarity matrix between a set of n variables and assume that the variables can be ordered along a chain, where the similarity between variables decreases with their distance within this chain. The seriation problem seeks to reconstruct this linear ordering based on unsorted, possibly noisy, similarity information.

This problem has its roots in archeology [1]. It also has direct applications in e.g. envelope reduction algorithms for sparse linear algebra [2], in identifying interval graphs for scheduling [3], or in shotgun DNA sequencing, where a single strand of genetic material is reconstructed from many cloned shorter reads, i.e. small, fully sequenced sections of DNA [4, 5]. With shotgun gene sequencing applications in mind, many references focused on the Consecutive Ones Problem (C1P), which seeks to permute the rows of a binary matrix so that all the ones in each column are contiguous. In particular, [3] studied further connections to interval graphs, and [6] crucially showed that a solution to C1P can be obtained by solving the seriation problem on the squared data matrix. We refer the reader to [7, 8, 9] for a much more complete survey of applications.

On the algorithmic front, the seriation problem was shown to be NP-complete by [10]. 
Archeological examples are usually small scale, and earlier references such as [1] used greedy techniques to reorder matrices. Similar techniques were, and still are, used to reorder genetic data sets. More general ordering problems were studied extensively in operations research, mostly in connection with the Quadratic Assignment Problem (QAP), for which several convex relaxations were studied in e.g. [11, 12]. Since a matrix is a permutation matrix if and only if it is both orthogonal and doubly stochastic, much work also focused on producing semidefinite relaxations of orthogonality constraints [13, 14]. These programs are convex hence tractable, but the relaxations are usually very large and scale poorly. More recently however, [15] produced a spectral algorithm that exactly solves the seriation problem in a noiseless setting, with results very similar to those obtained on the interlacing of eigenvectors for Sturm-Liouville operators. They show that for similarity matrices computed from serial variables (for which a total order exists), the ordering of the second eigenvector of the Laplacian (a.k.a. the Fiedler vector) matches that of the variables.

Here, we show that the solution of the seriation problem explicitly minimizes a quadratic function. While this quadratic problem was mentioned explicitly in [15], no connection was made between the combinatorial and spectral solutions. Our result shows in particular that the 2-SUM minimization problem mentioned in [10], and defined below, is polynomially solvable for matrices coming from serial data. This result allows us to write seriation as a quadratic minimization problem over permutation matrices, and we then produce convex relaxations for this last problem. This relaxation appears to be more robust to noise than the spectral or combinatorial techniques in a number of examples. 
Perhaps more importantly, it allows us to impose additional structural constraints to solve semi-supervised seriation problems. We also develop a fast algorithm for projecting on the set of doubly stochastic matrices, which is of independent interest.

The paper is organized as follows. In Section 2, we show a decomposition result for similarity matrices formed from the C1P problem. This decomposition allows us to make the connection between the seriation and 2-SUM minimization problems on these matrices. In Section 3, we use these results to write convex relaxations of the seriation problem, by relaxing permutation matrices as doubly stochastic matrices in the 2-SUM minimization problem. We also briefly discuss algorithmic and computational complexity issues. Finally, Section 4 discusses some applications and numerical experiments.

Notation. We write P the set of permutations of {1, ..., n}. The notation π will refer to a permuted vector of {1, ..., n}, while the notation Π (in capital letter) will refer to the corresponding matrix permutation, which is a {0, 1} matrix such that Π_ij = 1 iff π(j) = i. For a vector y ∈ R^n, we write var(y) its variance, with var(y) = Σ_{i=1}^n y_i²/n − (Σ_{i=1}^n y_i/n)². We also write y_[u,v] ∈ R^{v−u+1} the vector (y_u, ..., y_v)ᵀ. Here, e_i ∈ R^n is the i-th Euclidean basis vector and 1 is the vector of ones. We write S_n the set of symmetric matrices of dimension n, ‖·‖_F denotes the Frobenius norm, and λ_i(X) is the i-th eigenvalue (in increasing order) of X.

2 Seriation & consecutive ones

Given a symmetric, binary matrix A, we will focus on variations of the following 2-SUM combinatorial minimization problem, studied in e.g. [10], and written

    minimize   Σ_{i,j=1}^n A_ij (π(i) − π(j))²
    subject to π ∈ P.                                                  (1)

This problem is used for example to reduce the envelope of sparse matrices and is shown in [10, Th. 2.2] to be NP-complete. 
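As a quick sanity check on problem (1), the sketch below (variable and helper names are ours) verifies numerically that the 2-SUM objective is the Laplacian quadratic form Σ_ij A_ij (π(i) − π(j))² = 2 πᵀ L_A π with L_A = diag(A1) − A, and brute-forces a tiny instance built as A = C Cᵀ with C a {0, 1} P-matrix, where, per Section 2, the identity ordering should attain the minimum.

```python
import itertools
import numpy as np

def two_sum(A, pi):
    """2-SUM objective: sum_ij A_ij (pi(i) - pi(j))^2."""
    d = pi[:, None] - pi[None, :]
    return float((A * d**2).sum())

def laplacian(A):
    """Graph Laplacian L_A = diag(A 1) - A."""
    return np.diag(A.sum(axis=1)) - A

n = 6
# C is a {0,1} P-matrix (consecutive ones in each column), so A = C C^T
# is a sum of CUT matrices, hence an R-matrix in its identity ordering
C = np.zeros((n, 4))
C[0:3, 0] = C[1:4, 1] = C[2:6, 2] = C[3:6, 3] = 1
A = C @ C.T

# the 2-SUM objective equals twice the Laplacian quadratic form
pi = np.random.permutation(n).astype(float) + 1
L = laplacian(A)
assert np.isclose(two_sum(A, pi), 2 * pi @ L @ pi)

# brute force over all 6! orderings: the identity ordering minimizes (1)
best = min(two_sum(A, np.array(p, dtype=float))
           for p in itertools.permutations(range(1, n + 1)))
assert np.isclose(best, two_sum(A, np.arange(1, n + 1, dtype=float)))
```

The brute-force check is only feasible for toy sizes; the point of Sections 2-3 is precisely to avoid the factorial search.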
When A has a specific structure, [15] show that a related matrix reordering problem used for seriation can be solved explicitly by a spectral algorithm. However, the results in [15] do not explicitly link spectral ordering and the optimum of (1). For some instances of A related to seriation and consecutive ones problems, we show below that the spectral ordering directly minimizes the objective of problem (1). We first focus on binary matrices, then extend our results to more general unimodal matrices.

2.1 Binary matrices

Let A ∈ S_n and y ∈ R^n. We focus on a generalization of the 2-SUM minimization problem

    minimize   f(y_π) ≜ Σ_{i,j=1}^n A_ij (y_π(i) − y_π(j))²
    subject to π ∈ P.                                                  (2)

The main point of this section is to show that if A is the permutation of a similarity matrix formed from serial data, then minimizing (2) recovers the correct variable ordering. We first introduce a few definitions following the terminology in [15].

Definition 2.1 We say that the matrix A ∈ S_n is an R-matrix (or Robinson matrix) iff it is symmetric and satisfies A_{i,j} ≤ A_{i,j+1} and A_{i+1,j} ≤ A_{i,j} in the lower triangle, where 1 ≤ j < i ≤ n.

Another way to write the R-matrix conditions is to impose A_ij ≤ A_kl if |i − j| ≥ |k − l| off-diagonal, i.e. the coefficients of A decrease as we move away from the diagonal (cf. Figure 1).

Figure 1: A Q-matrix A (see Def. 2.7), which has unimodal columns (left), its "circular square" A ∘ Aᵀ (see Def. 2.8), which is an R-matrix (center), and a matrix a ∘ aᵀ where a is a unimodal vector (right).

Definition 2.2 We say that the {0, 1}-matrix A ∈ R^{n×m} is a P-matrix (or Petrie matrix) iff, for each column of A, the ones form a consecutive sequence.

As in [15], we will say that A is pre-R (resp. pre-P) iff there is a permutation Π such that ΠAΠᵀ is an R-matrix (resp. ΠA is a P-matrix). 
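Definition 2.1 is easy to check mechanically. A minimal sketch (the helper name `is_r_matrix` is ours) tests the Robinson property on a banded Toeplitz similarity matrix and on a shuffled copy of it:

```python
import numpy as np

def is_r_matrix(A, tol=1e-9):
    """Check Definition 2.1: A is symmetric and, in the lower triangle
    (j < i), entries are nondecreasing along each row toward the
    diagonal and nonincreasing down each column away from it."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if not np.allclose(A, A.T, atol=tol):
        return False
    for i in range(n):
        for j in range(i):
            if A[i, j] > A[i, j + 1] + tol:       # A_{i,j} <= A_{i,j+1}
                return False
            if i + 1 < n and A[i + 1, j] > A[i, j] + tol:  # A_{i+1,j} <= A_{i,j}
                return False
    return True

# a banded Toeplitz similarity matrix is an R-matrix...
n = 8
R = np.maximum(0, 3 - np.abs(np.subtract.outer(np.arange(n), np.arange(n))))
assert is_r_matrix(R)
# ...but a symmetric permutation of it generally is not (only pre-R)
p = np.array([3, 0, 5, 1, 7, 2, 6, 4])
assert not is_r_matrix(R[p][:, p])
```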
We now define CUT matrices as follows.

Definition 2.3 For u, v ∈ [1, n], we call CUT(u, v) the matrix such that

    CUT(u, v)_ij = 1 if u ≤ i, j ≤ v, and 0 otherwise,

i.e. CUT(u, v) is symmetric, block diagonal and has one square block equal to one.

The motivation for this definition is that if A is a {0, 1} P-matrix, then AAᵀ is a sum of CUT matrices (with blocks generated by the columns of A). This means that we can start by studying problem (2) on CUT matrices. We first show that the objective of (2) has a natural interpretation in this case, as the variance of a subset of y under a uniform probability measure.

Lemma 2.4 Let A = CUT(u, v), then f(y) = Σ_{i,j=1}^n A_ij (y_i − y_j)² = 2(v − u + 1)² var(y_[u,v]).

Proof. We can write Σ_ij A_ij (y_i − y_j)² = 2 yᵀ L_A y, where L_A = diag(A1) − A is the Laplacian of matrix A, which is a block matrix equal to (v − u + 1) 1_{i=j} − 1 for u ≤ i, j ≤ v.

This last lemma shows that solving (2) for CUT matrices amounts to finding a subset of y of size (v − u + 1) with minimum variance. The next lemma characterizes optimal solutions of problem (2) for CUT matrices and shows that its solution splits the coefficients of y into two disjoint intervals.

Lemma 2.5 Suppose A = CUT(u, v), and write z = y_π the optimal solution to (2). If we call I = [u, v] and I^c its complement in [1, n], then z_j ∉ [min(z_I), max(z_I)] for all j ∈ I^c; in other words, the coefficients in z_I and z_{I^c} belong to disjoint intervals.

We can use these last results to show that, at least for some vectors y, when A is an R-matrix, the solution y_π to (2) is monotonic.

Proposition 2.6 Suppose C ∈ S_n is a {0, 1} pre-R matrix, A = C², and y_i = ai + b for i = 1, ..., n with a, b ∈ R and a ≠ 0. If Π is such that ΠCΠᵀ (hence ΠAΠᵀ) is an R-matrix, then the corresponding permutation π solves the combinatorial minimization problem (2) for A = C².

Proof. 
Suppose C is {0, 1} pre-R, then C² is pre-R, and Lemma 5.2 shows that there exists Π such that ΠCΠᵀ and ΠAΠᵀ are R-matrices, so we can write ΠAΠᵀ as a sum of CUT matrices. Furthermore, Lemmas 2.4 and 2.5 show that each CUT term is minimized by a monotonic sequence, but y_i = ai + b means here that all monotonic subsets of y of a given length have the same (minimal) variance, attained by Πy. So the corresponding π also solves problem (2).

2.2 Unimodal matrices

Here, based on [6], we first define a generalization of P-matrices called (appropriately enough) Q-matrices, i.e. matrices with unimodal columns. We then show that minimizing (2) also recovers the correct ordering for these more general matrix classes.

Definition 2.7 We say that a matrix A ∈ R^{n×m} is a Q-matrix if and only if each column of A is unimodal, i.e. its coefficients increase to a maximum, then decrease.

Note that R-matrices are symmetric Q-matrices. We call a matrix A pre-Q iff there is a permutation Π such that ΠA is a Q-matrix. Next, again based on [6], we define the circular product of two matrices.

Definition 2.8 Given A, Bᵀ ∈ R^{n×m} and a strictly positive weight vector w ∈ R^m, their circular product A ∘ B is defined as

    (A ∘ B)_ij = Σ_{k=1}^m w_k min{A_ik, B_kj},   i, j = 1, ..., n;

note that when A is a symmetric matrix, A ∘ A is also symmetric.

Remark that when A, B are {0, 1} matrices and w = 1, min{A_ik, B_kj} = A_ik B_kj, so the circular product matches the regular matrix product ABᵀ. In the appendix, we first prove that when A is a Q-matrix, then A ∘ Aᵀ is a sum of CUT matrices. This is illustrated in Figure 1.

Lemma 2.9 Let A ∈ R^{n×m} be a Q-matrix, then A ∘ Aᵀ is a conic combination of CUT matrices.

This last result also shows that A ∘ Aᵀ is an R-matrix when A is a Q-matrix, as a sum of CUT matrices. These definitions are illustrated in Figure 1. We now recall the central result in [6, Th. 
1].

Theorem 2.10 [6, Th. 1] Suppose A ∈ R^{n×m} is pre-Q; then ΠA is a Q-matrix iff Π(A ∘ Aᵀ)Πᵀ is an R-matrix.

We are now ready to show the main result of this section, linking permutations which order R-matrices and solutions to problem (2).

Proposition 2.11 Suppose C ∈ R^{n×m} is a pre-Q matrix and y_i = ai + b for i = 1, ..., n with a, b ∈ R and a ≠ 0. Let A = C ∘ Cᵀ. If Π is such that ΠAΠᵀ is an R-matrix, then the corresponding permutation π solves the combinatorial minimization problem (2).

Proof. If C ∈ R^{n×m} is pre-Q, then Lemma 2.9 and Theorem 2.10 show that there is a permutation Π such that Π(C ∘ Cᵀ)Πᵀ is a sum of CUT matrices (hence an R-matrix). Now, as in Proposition 2.6, all monotonic subsets of y of a given length have the same variance, hence Lemmas 2.4 and 2.5 show that π solves problem (2).

This result shows that if A is pre-R and can be written A = C ∘ Cᵀ with C pre-Q, then the permutation that makes A an R-matrix also solves (2). Since [15] show that sorting the Fiedler vector also orders A as an R-matrix, Prop. 2.11 gives a polynomial time solution to problem (2) when A = C ∘ Cᵀ is pre-R with C pre-Q.

3 Convex relaxations for permutation problems

In the sections that follow, we will use the combinatorial results derived above to produce convex relaxations of optimization problems written over the set of permutation matrices. Recall that the Fiedler value of a symmetric nonnegative matrix is the smallest non-zero eigenvalue of its Laplacian; the Fiedler vector is the corresponding eigenvector. We first recall the main result from [15], which shows how to reorder pre-R matrices in a noise-free setting.

Proposition 3.1 [15, Th. 3.3] Suppose A ∈ S_n is a pre-R-matrix, with a simple Fiedler value whose Fiedler vector v has no repeated values. 
Suppose that Π ∈ P is such that the permuted Fiedler vector Πv is monotonic; then ΠAΠᵀ is an R-matrix.

The results in [15] provide a polynomial time solution to the R-matrix ordering problem in a noiseless setting. While [15] also show how to handle cases where the Fiedler vector is degenerate, these scenarios are highly unlikely to arise in settings where observations on A are noisy, and we do not discuss these cases here.

The results in the previous section made the connection between the spectral ordering in [15] and problem (2). In what follows, we will use (2) to produce convex relaxations to matrix ordering problems in a noisy setting. We also show in Section 3 how to incorporate a priori knowledge in the optimization problem. Numerical experiments in Section 4 show that semi-supervised seriation solutions are sometimes significantly more robust to noise than the spectral solutions ordered from the Fiedler vector.

Permutations and doubly stochastic matrices. We write D_n the set of doubly stochastic matrices in R^{n×n}, i.e. D_n = {X ∈ R^{n×n} : X ≥ 0, X1 = 1, Xᵀ1 = 1}. Note that D_n is convex and polyhedral. Classical results show that the set of doubly stochastic matrices is the convex hull of the set of permutation matrices. We also have P = D ∩ O, i.e. a matrix is a permutation matrix if and only if it is both doubly stochastic and orthogonal. This means that we can directly write a convex relaxation to the combinatorial problem (2) by replacing P with its convex hull D_n, to get

    minimize   gᵀ Πᵀ L_A Π g
    subject to Π ∈ D_n,  e_1ᵀ Π g + 1 ≤ e_nᵀ Π g,                      (3)

where g = (1, ..., n)ᵀ. By symmetry, if a vector Πy minimizes (3), then the reverse vector also minimizes (3). This often has a significant negative impact on the quality of the relaxation, and we add the linear constraint e_1ᵀ Π g + 1 ≤ e_nᵀ Π g to break symmetries, which means that we always pick monotonically increasing solutions. 
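The reversal symmetry and the effect of the tie-break constraint can be checked numerically. In the small sketch below (variable names are ours), the reverse of an ordering π is π'(i) = n + 1 − π(i), obtained as Π J where J is the reversal matrix:

```python
import numpy as np

n = 6
g = np.arange(1, n + 1, dtype=float)
rng = np.random.default_rng(0)
B = rng.random((n, n))
A = B + B.T                       # a symmetric similarity matrix
L = np.diag(A.sum(axis=1)) - A    # Laplacian L_A

perm = rng.permutation(n)
Pi = np.eye(n)[perm]              # a permutation matrix
J = np.eye(n)[::-1]               # reversal: (Pi J) g = (n + 1) 1 - Pi g
Pi_rev = Pi @ J

obj = lambda P: g @ P.T @ L @ P @ g                 # objective of (3)
c = lambda P: (P @ g)[0] + 1 <= (P @ g)[-1]         # tie-break constraint

# reversing an ordering negates all pairwise differences, so the
# 2-SUM objective is unchanged...
assert np.isclose(obj(Pi), obj(Pi_rev))
# ...while the constraint e_1^T Pi g + 1 <= e_n^T Pi g holds for
# exactly one of the two orderings
assert c(Pi) != c(Pi_rev)
```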
Because the Laplacian L_A is always positive semidefinite, problem (3) is a convex quadratic program in the variable Π and can be solved efficiently. To provide a solution to the combinatorial problem (2), we then generate permutations from the doubly stochastic optimal solution to (3) (we will describe an efficient procedure to do so in §3).

The results of Section 2 show that the optimal solution to (2) also solves the seriation problem in the noiseless setting when the matrix A is of the form C ∘ Cᵀ with C a Q-matrix and y is an affine transform of the vector (1, ..., n). These results also hold empirically for small perturbations of the vector y, and to improve robustness to noisy observations of A, we can average several values of the objective of (3) over these perturbations, solving

    minimize   Tr(Yᵀ Πᵀ L_A Π Y)/p
    subject to e_1ᵀ Π g + 1 ≤ e_nᵀ Π g,  Π1 = 1,  Πᵀ1 = 1,  Π ≥ 0,     (4)

in the variable Π ∈ R^{n×n}, where Y ∈ R^{n×p} is a matrix whose columns are small perturbations of the vector g = (1, ..., n)ᵀ. Note that the objective of (4) can be rewritten in vector format as vec(Π)ᵀ (YYᵀ ⊗ L_A) vec(Π)/p. Solving (4) is roughly p times faster than individually solving p versions of (3).

Regularized convex relaxation. As the set of permutation matrices P is the intersection of the set of doubly stochastic matrices D and the set of orthogonal matrices O, i.e. 
P = D ∩ O, we can add a penalty to the objective of the convex relaxed problem (4) to force the solution closer to the set of orthogonal matrices.

As a doubly stochastic matrix of Frobenius norm √n is necessarily orthogonal, we would ideally like to solve

    minimize   (1/p) Tr(Yᵀ Πᵀ L_A Π Y) − (μ/p) ‖Π‖_F²
    subject to e_1ᵀ Π g + 1 ≤ e_nᵀ Π g,  Π1 = 1,  Πᵀ1 = 1,  Π ≥ 0,     (5)

with μ large enough to guarantee that the global solution is indeed a permutation. However, this problem is not convex for any μ > 0, since its Hessian YYᵀ ⊗ L_A − μ I ⊗ I is never positive semidefinite when μ > 0 (the first eigenvalue of L_A is 0). Instead, we propose a slightly modified version of (5), which has the same objective function up to a constant and is convex for some values of μ. Remember that the Laplacian matrix L_A is always positive semidefinite, with at least one eigenvalue equal to zero (exactly one if the graph is connected). Let P = I − (1/n)11ᵀ.

Proposition 3.2 The optimization problem

    minimize   (1/p) Tr(Yᵀ Πᵀ L_A Π Y) − (μ/p) ‖PΠ‖_F²
    subject to e_1ᵀ Π g + 1 ≤ e_nᵀ Π g,  Π1 = 1,  Πᵀ1 = 1,  Π ≥ 0,     (6)

is equivalent to problem (5), and their objectives differ by a constant. When μ ≤ λ₂(L_A) λ₁(YYᵀ), this problem is convex.

Incorporating structural constraints. The QP relaxation allows us to add convex structural constraints to the problem. For instance, in archeological applications, one may specify that observation i must appear before observation j, i.e. π(i) < π(j). In gene sequencing applications, one may want to constrain the distance between two elements (e.g. 
mate reads), which would read a ≤ π(i) − π(j) ≤ b and introduce an affine inequality on the variable Π in the QP relaxation, of the form a ≤ e_iᵀ Π g − e_jᵀ Π g ≤ b. Linear constraints could also be extracted from a reference gene sequence. More generally, we can rewrite problem (6) with n_c additional linear constraints as follows:

    minimize   (1/p) Tr(Yᵀ Πᵀ L_A Π Y) − (μ/p) ‖PΠ‖_F²
    subject to Dᵀ Π g + δ ≤ 0,  Π1 = 1,  Πᵀ1 = 1,  Π ≥ 0,              (7)

where D is a matrix of size n × n_c and δ is a vector of size n_c. The first column of D is equal to e_1 − e_n and δ_1 = 1 (to break symmetry).

Sampling permutations from doubly stochastic matrices. This procedure is based on the fact that a permutation can be defined from a doubly stochastic matrix D by the order induced on a monotonic vector. Suppose we generate a monotonic random vector v and compute Dv. To each v, we can associate a permutation Π such that ΠDv is monotonically increasing. If D is a permutation matrix, then the permutation Π generated by this procedure will be constant; if D is a doubly stochastic matrix but not a permutation, it might fluctuate. Starting from a solution D to problem (6), we can use this procedure to generate many permutation matrices Π, and we pick the one with lowest cost yᵀ Πᵀ L_A Π y in the combinatorial problem (2). We could also project Π on permutations using the Hungarian algorithm, but this proved more costly and less effective.

Orthogonal relaxation. Recall that P = D ∩ O, i.e. a matrix is a permutation matrix if and only if it is both doubly stochastic and orthogonal. So far, we have relaxed the orthogonality constraint and replaced it by a penalty on the Frobenius norm. Semidefinite relaxations to orthogonality constraints have been developed in e.g. 
[12, 13, 14], with excellent approximation bounds, and these could provide alternative relaxation schemes. However, these relaxations form semidefinite programs of dimension O(n²) (hence have O(n⁴) variables), which are out of reach numerically for most of the problems considered here.

Algorithms. The convex relaxation in (7) is a quadratic program in the variable Π ∈ R^{n×n}, which has dimension n². For reasonable values of n (around a few hundred), interior point solvers such as MOSEK [17] solve this problem very efficiently. Furthermore, most pre-R matrices formed by squaring pre-Q matrices are very sparse, which considerably speeds up linear algebra. However, first-order methods remain the only alternative beyond a certain scale. We quickly discuss the implementation of two classes of methods: the Frank-Wolfe (a.k.a. conditional gradient) algorithm, and accelerated gradient methods.

Solving (7) using the conditional gradient algorithm in [18] requires minimizing an affine function over the set of doubly stochastic matrices at each iteration. This amounts to solving a classical transportation (or matching) problem, for which very efficient solvers exist [19].

On the other hand, solving (7) using accelerated gradient algorithms requires solving a projection step on doubly stochastic matrices at each iteration [20]. Here too, exploiting structure significantly improves the complexity of these steps. Given some matrix Π₀, the projection problem is written

    minimize   (1/2) ‖Π − Π₀‖_F²
    subject to Dᵀ Π g + δ ≤ 0,  Π1 = 1,  Πᵀ1 = 1,  Π ≥ 0,              (8)

in the variable Π ∈ R^{n×n}, with parameter g ∈ R^n. The dual is written

    maximize   −(1/2) ‖x1ᵀ + 1yᵀ + D z gᵀ − Z‖_F² − Tr(Zᵀ Π₀)
               + xᵀ(Π₀1 − 1) + yᵀ(Π₀ᵀ1 − 1) + zᵀ(Dᵀ Π₀ g + δ)
    subject to z ≥ 0,  Z ≥ 0,                                          (9)

in the variables Z ∈ R^{n×n}, x, y ∈ R^n and z ∈ R^{n_c}. 
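The paper solves (8) through block-coordinate ascent on the dual (9). As a self-contained illustration of the basic projection step only, the sketch below instead uses Dykstra's alternating projections onto the doubly stochastic set, dropping the extra constraints Dᵀ Π g + δ ≤ 0; this is a simpler substitute for the dual scheme described in the text, and all names are ours:

```python
import numpy as np

def proj_affine(X):
    """Euclidean projection onto the affine set {Y : Y1 = 1, Y^T 1 = 1}."""
    n = X.shape[0]
    r = 1.0 - X.sum(axis=1)                  # row-sum deficits
    c = 1.0 - X.sum(axis=0)                  # column-sum deficits
    t = r.sum() / (2 * n)
    return X + ((r - t) / n)[:, None] + ((c - t) / n)[None, :]

def proj_doubly_stochastic(X0, iters=500):
    """Dykstra's algorithm alternating between the affine set above and
    the nonnegative orthant; converges to the projection of X0 onto the
    doubly stochastic matrices D_n."""
    X, p, q = X0.copy(), np.zeros_like(X0), np.zeros_like(X0)
    for _ in range(iters):
        Y = proj_affine(X + p)
        p = X + p - Y
        X = np.maximum(Y + q, 0.0)           # projection onto {X >= 0}
        q = Y + q - X
    return X

rng = np.random.default_rng(0)
X0 = rng.normal(size=(5, 5))
P = proj_doubly_stochastic(X0)
assert np.allclose(P.sum(axis=1), 1, atol=1e-5)
assert np.allclose(P.sum(axis=0), 1, atol=1e-5)
assert P.min() >= 0
```

Plain alternating projections would converge to a point of the intersection but not to the Euclidean projection; Dykstra's correction terms p and q restore the projection property, which matters inside an accelerated gradient loop.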
The dual is written over decoupled linear constraints in (z, Z) (with x and y unconstrained). Each subproblem is equivalent to computing a conjugate norm and can be solved in closed form. In particular, the matrix Z is updated at each iteration by Z = max{0, x1ᵀ + 1yᵀ + D z gᵀ − Π₀}. Warm-starting provides a significant speed-up. This means that problem (9) can be solved very efficiently by block-coordinate ascent, whose convergence is guaranteed in this setting [21], and a solution to (8) can be reconstructed from the optimum in (9).

4 Applications & numerical experiments

Archeology. We reorder the rows of Hodson's Munsingen dataset (as provided by [22] and manually ordered by [6]), to date 59 graves from 70 recovered artifact types (graves from similar periods containing similar artifacts). The results are reported in Table 1 (and in the appendix). We use a fraction of the pairwise orders in [6] to solve the semi-supervised version.

                Sol. in [6]   Spectral     QP Reg        QP Reg + 0.1%  QP Reg + 47.5%
  Kendall τ     1.00±0.00     0.75±0.00    0.73±0.22     0.76±0.16      0.97±0.01
  Spearman ρ    1.00±0.00     0.90±0.00    0.88±0.19     0.91±0.16      1.00±0.00
  Comb. Obj.    38520±0       38903±0      41810±13960   43457±23004    37602±775
  # R-constr.   1556±0        1802±0       2021±484      2050±747       1545±43

Table 1: Performance metrics (median and stdev over 100 runs of the QP relaxation) for Kendall's τ and Spearman's ρ ranking correlations (large values are good), the objective value in (2), and the number of R-matrix monotonicity constraint violations (small values are good), comparing Kendall's original solution with that of the Fiedler vector, the seriation QP in (6) and the semi-supervised seriation QP in (7) with 0.1% and 47.5% pairwise ordering constraints specified. 
Note that the semi-supervised solution actually improves on both Kendall's manual solution and on the spectral ordering.

Markov chains. Here, we observe many disordered samples from a Markov chain. The mutual information matrix of these variables must be decreasing with |i − j| when ordered according to the true generating Markov chain [23, Th. 2.8.1], hence the mutual information matrix of these variables is a pre-R-matrix. We can thus recover the order of the Markov chain by solving the seriation problem on this matrix. In the following example, we try to recover the order of a Gaussian Markov chain written X_{i+1} = b_i X_i + ε_i, with ε_i ∼ N(0, σ_i²). The results are presented in Table 2 on 30 variables. We test performance in a noise-free setting where we observe the randomly ordered model covariance, in a noisy setting with enough samples (6000) to ensure that the spectral solution stays in a perturbative regime, and finally using many fewer samples (60), so the spectral perturbation condition fails.

Gene sequencing. In next generation shotgun gene sequencing experiments, genes are cloned about ten to a hundred times before being decomposed into very small subsequences called "reads", each fifty to a few hundred base pairs long. Current machines can only accurately sequence these small reads, which must then be reordered by "assembly" algorithms, using the overlaps between reads. We generate artificial sequencing data by (uniformly) sampling reads from chromosome 22 of the human genome from NCBI, then store k-mer hits versus reads in a binary matrix (a k-mer is a fixed sequence of k base pairs). If the reads are ordered correctly, this matrix should be C1P, hence we solve the C1P problem on the {0, 1}-matrix whose rows correspond to k-mer hits for each read, i.e. the element (i, j) of the matrix is equal to one if k-mer j is included in read i. 
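The construction of this binary matrix can be sketched as follows (the toy genome, read sampling scheme and helper name are illustrative, not the paper's experimental setup); with reads listed in their true order and no repeated k-mers, each column's ones form a consecutive run, i.e. the matrix is C1P:

```python
import numpy as np

def kmer_matrix(reads, k):
    """Binary matrix M with M[i, j] = 1 iff k-mer j occurs in read i."""
    kmers = sorted({r[s:s + k] for r in reads for s in range(len(r) - k + 1)})
    col = {km: j for j, km in enumerate(kmers)}
    M = np.zeros((len(reads), len(kmers)), dtype=int)
    for i, r in enumerate(reads):
        for s in range(len(r) - k + 1):
            M[i, col[r[s:s + k]]] = 1
    return M

# toy genome with overlapping reads sampled in the true order
genome = "ACGTACGGTCAGTTGCA"
reads = [genome[s:s + 8] for s in range(0, len(genome) - 8 + 1, 3)]
M = kmer_matrix(reads, k=4)

# correctly ordered reads: every k-mer hits a consecutive run of reads,
# so M has the consecutive ones property column-wise
for j in range(M.shape[1]):
    hits = np.flatnonzero(M[:, j])
    assert np.array_equal(hits, np.arange(hits[0], hits[-1] + 1))
```

Shuffling the reads destroys this property, which is exactly what makes the assembly problem an instance of C1P/seriation.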
This matrix is extremely sparse, as it is approximately band-diagonal with roughly constant degree when reordered appropriately, and computing the Fiedler vector can be done with complexity O(n log n), as it amounts to computing the second largest eigenvector of λ_n(L) I − L, where L is the Laplacian of the matrix.

                No noise      Noise within spectral gap   Large noise
  True          1.00±0.00     1.00±0.00                   1.00±0.00
  Spectral      1.00±0.00     0.86±0.14                   0.41±0.25
  QP Reg        0.50±0.34     0.58±0.31                   0.45±0.27
  QP + 0.2%     0.65±0.29     0.40±0.26                   0.60±0.27
  QP + 4.6%     0.71±0.08     0.70±0.07                   0.68±0.08
  QP + 54.3%    0.98±0.01     0.97±0.01                   0.97±0.02

Table 2: Kendall's τ between the true Markov chain ordering, the Fiedler vector, the seriation QP in (6) and the semi-supervised seriation QP in (7) with varying numbers of pairwise orders specified. We observe the (randomly ordered) model covariance matrix (no noise), the sample covariance matrix with enough samples so the error is smaller than half of the spectral gap, then a sample covariance computed using many fewer samples, so the spectral perturbation condition fails.

In our experiments, computing the Fiedler vector of a million base pair sequence takes less than a minute using MATLAB's eigs on a standard desktop machine.

In practice, besides sequencing errors (handled relatively well by the high coverage of the reads), there are often repeats in long genomes. If the repeats are longer than the k-mers, the C1P assumption is violated and the order given by the Fiedler vector is no longer reliable. On the other hand, handling the repeats is possible using the information given by mate reads, i.e. reads that are known to be separated by a given number of base pairs in the original genome. This structural knowledge can be incorporated into the relaxation (7). 
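The shift trick mentioned above can be sketched with a small Python/SciPy substitute for MATLAB's eigs (helper names and the toy R-matrix are ours); on a permuted noiseless R-matrix, sorting the Fiedler vector recovers the original order up to reversal:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def fiedler_order(A):
    """Spectral seriation: sort the Fiedler vector of L = diag(A1) - A.
    As in the text, the Fiedler vector is computed as the second-largest
    eigenvector of lam_n(L) I - L, a shift that sparse eigensolvers
    handle efficiently."""
    A = sp.csr_matrix(A)
    L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A
    lam_n = eigsh(L, k=1, which='LA', return_eigenvectors=False)[0]
    M = sp.identity(L.shape[0]) * lam_n - L
    vals, vecs = eigsh(M, k=2, which='LA')
    fiedler = vecs[:, np.argmin(vals)]   # component at lam_n - lam_2(L)
    return np.argsort(fiedler)

# noiseless sanity check: permute a banded R-matrix, then recover it
n = 30
R = np.maximum(0, 4.0 - np.abs(np.subtract.outer(np.arange(n), np.arange(n))))
rng = np.random.default_rng(1)
perm = rng.permutation(n)
A = R[perm][:, perm]
order = fiedler_order(A)
B = A[order][:, order]
# up to reversal (which leaves this Toeplitz band matrix unchanged),
# the recovered ordering restores the R-matrix
assert np.allclose(B, R)
```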
While our algorithm for solving (7) only scales up to a few thousand base pairs on a regular desktop, it can be used to solve the sequencing problem hierarchically, i.e. to refine the spectral solution. Graph connectivity issues can be solved directly using spectral information.

Figure 2: We plot the reads × reads matrix measuring the number of common k-mers between read pairs, reordered according to the spectral ordering on two regions (two plots on the left), then the Fiedler and Fiedler+QP read orderings versus the true ordering (two plots on the right). The semi-supervised solution contains far fewer misplaced reads.

In Figure 2, the first two plots show the result of spectral ordering on simulated reads from human chromosome 22. The full R-matrix formed by squaring the reads × k-mers matrix is too large to be plotted in MATLAB, and we zoom in on two diagonal block submatrices. In the first one, the reordering is good and the matrix has very low bandwidth; the corresponding gene segment (or contig) is well reconstructed. In the second, the reordering is less reliable and the bandwidth is larger; the reconstructed gene segment contains errors. The last two plots show recovered read position versus true read position for the Fiedler vector and for the Fiedler vector followed by semi-supervised seriation, where the QP relaxation is applied to the reads assembled by the spectral solution, on 250 000 reads generated in our experiments. We see that the number of misplaced reads decreases significantly in the semi-supervised seriation solution.

Acknowledgements. AA, FF and RJ would like to acknowledge support from a European Research Council starting grant (project SIPA) and a gift from Google. FB would like to acknowledge support from a European Research Council starting grant (project SIERRA). 
A much more complete version of this paper is available as [16] at arXiv:1306.4805.

References

[1] William S Robinson. A method for chronologically ordering archaeological deposits. American Antiquity, 16(4):293–301, 1951.

[2] Stephen T Barnard, Alex Pothen, and Horst Simon. A spectral algorithm for envelope reduction of sparse matrices. Numerical Linear Algebra with Applications, 2(4):317–334, 1995.

[3] D.R. Fulkerson and O.A. Gross. Incidence matrices and interval graphs. Pacific Journal of Mathematics, 15(3):835, 1965.

[4] Gemma C Garriga, Esa Junttila, and Heikki Mannila. Banded structure in binary matrices. Knowledge and Information Systems, 28(1):197–226, 2011.

[5] João Meidanis, Oscar Porto, and Guilherme P Telles. On the consecutive ones property. Discrete Applied Mathematics, 88(1):325–354, 1998.

[6] David G Kendall. Abundance matrices and seriation in archaeology. Probability Theory and Related Fields, 17(2):104–112, 1971.

[7] Chris Ding and Xiaofeng He. Linearized cluster assignment via spectral ordering. In Proceedings of the Twenty-First International Conference on Machine Learning, page 30. ACM, 2004.

[8] Niko Vuokko. Consecutive ones property and spectral ordering. In Proceedings of the 10th SIAM International Conference on Data Mining (SDM'10), pages 350–360, 2010.

[9] Innar Liiv. Seriation and matrix reordering methods: An historical overview. Statistical Analysis and Data Mining, 3(2):70–91, 2010.

[10] Alan George and Alex Pothen. An analysis of spectral envelope reduction via quadratic assignment problems. SIAM Journal on Matrix Analysis and Applications, 18(3):706–732, 1997.

[11] Eugene L Lawler. The quadratic assignment problem. Management Science, 9(4):586–599, 1963.

[12] Qing Zhao, Stefan E Karisch, Franz Rendl, and Henry Wolkowicz.
Semidefinite programming relaxations for the quadratic assignment problem. Journal of Combinatorial Optimization, 2(1):71–109, 1998.

[13] A. Nemirovski. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Mathematical Programming, 109(2):283–317, 2007.

[14] Anthony Man-Cho So. Moment inequalities for sums of random matrices and their applications in optimization. Mathematical Programming, 130(1):125–151, 2011.

[15] J.E. Atkins, E.G. Boman, B. Hendrickson, et al. A spectral algorithm for seriation and the consecutive ones problem. SIAM J. Comput., 28(1):297–310, 1998.

[16] F. Fogel, R. Jenatton, F. Bach, and A. d'Aspremont. Convex relaxations for permutation problems. arXiv:1306.4805, 2013.

[17] Erling D Andersen and Knud D Andersen. The MOSEK interior point optimizer for linear programming: an implementation of the homogeneous algorithm. High Performance Optimization, 33:197–232, 2000.

[18] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95–110, 1956.

[19] L. Portugal, F. Bastos, J. Júdice, J. Paixao, and T. Terlaky. An investigation of interior-point algorithms for the linear transportation problem. SIAM Journal on Scientific Computing, 17(5):1202–1223, 1996.

[20] Y. Nesterov. Introductory Lectures on Convex Optimization. Springer, 2003.

[21] D. Bertsekas. Nonlinear Programming. Athena Scientific, 1998.

[22] Frank Roy Hodson. The La Tène cemetery at Münsingen-Rain: catalogue and relative chronology, volume 5. Stämpfli, 1968.

[23] Thomas M Cover and Joy A Thomas. Elements of Information Theory.
Wiley-Interscience, 2012.