{"title": "Fast detection of multiple change-points shared by many signals using group LARS", "book": "Advances in Neural Information Processing Systems", "page_first": 2343, "page_last": 2351, "abstract": "We present a fast algorithm for the detection of multiple change-points when each is frequently shared by members of a set of co-occurring one-dimensional signals. We give conditions on consistency of the method when the number of signals increases, and provide empirical evidence to support the consistency results.", "full_text": "Fast detection of multiple change-points shared by\n\nmany signals using group LARS\n\nJean-Philippe Vert and Kevin Bleakley\n\nMines ParisTech CBIO, Institut Curie, INSERM U900\n\n{firstname.lastname}@mines-paristech.fr\n\nAbstract\n\nWe present a fast algorithm for the detection of multiple change-points when each\nis frequently shared by members of a set of co-occurring one-dimensional signals.\nWe give conditions on consistency of the method when the number of signals\nincreases, and provide empirical evidence to support the consistency results.\n\n1\n\nIntroduction\n\nFinding the place (or time) where most or all of a set of one-dimensional signals (or pro\ufb01les) jointly\nchange in some speci\ufb01c way is an important question in several \ufb01elds. A \ufb01rst common situation is\nwhen we want to \ufb01nd change-points in a multidimensional signal, for instance, we may want to auto-\nmatically detect changes from human speech to other sound in a movie, based on data representation\nof features coming from both the audio and visual tracks [1]. Another important situation is when we\nare confronted with several 1-dimensional signals which we believe share common change-points,\ne.g., genomic pro\ufb01les of several patients. 
The latter application is increasingly important in biology and medicine, in particular for the detection of copy-number variation along the genome, though it is also useful for microarray and genetic linkage studies [2]. The common thread in all of these is the search for data patterns shared by a set of patients at precise places on the genome; in particular, sudden changes in measurement. As opposed to the segmentation of multi-dimensional signals such as speech, the length of the signal (i.e., the number of probes along the genome) is fixed for a given technology, while the number of signals (i.e., the number of patients) can increase. It is therefore of interest to develop methods to identify multiple change-points shared by several signals, which can benefit from increasing the number of profiles.\nThere exists a vast literature on the change-point detection problem [3, 4]. Here we focus on the problem of approximating a multidimensional signal by a piecewise-constant one, using quadratic error criteria. It is well-known that the optimal segmentation of a p-dimensional signal of length n into k segments can be obtained in O(n²pk) by dynamic programming [5, 6, 7]. The quadratic complexity in n is however prohibitive in applications such as genomics, where n can be in the order of 10⁵ to 10⁷ with current technology. An alternative to such global procedures, which estimate change-points as solutions of a global optimization problem, are fast local procedures such as binary segmentation [8], which detect breakpoints by iteratively applying a method for single change-point detection to the segments obtained after the previous change-point is detected. 
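For intuition, the local approach can be sketched in a few lines of pure Python. This is our own illustrative code, not the paper's; the names `sse`, `best_split` and `binary_segmentation` are ours, and the criterion is the quadratic (sum-of-squared-errors) fit of one constant per segment:

```python
def sse(seg):
    """Sum of squared errors of a segment around its mean."""
    m = sum(seg) / len(seg)
    return sum((x - m) ** 2 for x in seg)

def best_split(y):
    """Best single change-point of y under the quadratic criterion.
    Returns (index, gain): splitting into y[:index], y[index:] reduces SSE by gain."""
    base = sse(y)
    best_t, best_gain = None, 0.0
    for t in range(1, len(y)):
        gain = base - (sse(y[:t]) + sse(y[t:]))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

def binary_segmentation(y, k):
    """Greedy binary segmentation: k times, split the segment whose best
    single split most reduces the total SSE."""
    bounds = [0, len(y)]
    for _ in range(k):
        cand = []
        for lo, hi in zip(bounds, bounds[1:]):
            t, gain = best_split(y[lo:hi])
            if t is not None:
                cand.append((gain, lo + t))
        if not cand:
            break
        bounds.append(max(cand)[1])
        bounds.sort()
    return bounds[1:-1]
```

Each pass over a segment is linear, which is what makes such recursive procedures fast; the price, as noted above, is that greedy splits need not match the globally optimal segmentation.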
While such recursive methods can be extremely fast, in the order of O(np log(k)) when the single change-point detector is O(np), the quality of the segmentation is questionable when compared with global procedures [9].\nFor p = 1 (a single signal), an interesting alternative to these global and local procedures is to express the optimal segmentation as the solution of a convex optimization problem, using the (convex) total variation instead of the (non-convex) number of jumps to penalize a piecewise-constant function, in order to approximate the original signal [10, 11]. The resulting piecewise-constant approximation of the signal, defined as the global minimum of the objective function, benefits from theoretical guarantees in terms of correctly detecting change-points [12, 13], and can be implemented efficiently in O(nk) or O(n log(n)) [14, 12, 15].\nIn this paper we propose an extension of total-variation based methods for single signals to the multidimensional setting, in order to approximate a multidimensional signal by a piecewise-constant signal with multiple change-points. We define the approximation as the solution of a convex optimization problem, which involves a quadratic approximation error penalized by the ℓ1 norm of the increments of the function. The problem can be reformulated as a group LASSO problem, which we propose to solve approximately with a group LARS procedure [16]. Using the particular structure of the design matrix, we can find the first k change-points in O(npk), extending the method of [12] to the multidimensional setting.\nUnlike most previous theoretical investigations of change-point methods, we are not interested in the case where the dimension p is fixed and the length of the profiles n increases, but in the opposite situation where n is fixed and p increases. 
Indeed, this corresponds to the case in genomics where, for example, n would be the fixed number of probes used to measure a signal along the genome, and p the number of samples or patients analyzed. We want to design a method that benefits from increasing p in order to identify shared change-points, even though the signal-to-noise ratio may be very low within each signal. As a first step towards this question, we give conditions under which our method is able to consistently identify a single change-point as p increases. We also show by simulation that our method is able to consistently identify multiple change-points, as p → +∞, validating its relevance in practical settings. To conclude, we present possible applications of the method in the study of copy number variations in cancer.\n\n2 Notation\nFor any two integers u ≤ v, let [u, v] denote the interval {u, u + 1, . . . , v}. For any u × v matrix M we note M_{i,j} its (i, j)-th entry. ‖M‖ = ( Σ_{i=1}^{u} Σ_{j=1}^{v} M_{i,j}² )^{1/2} is its Frobenius norm (or Euclidean norm in the case of vectors). For any subsets of indices A = {a1, . . . , a|A|} ∈ [1, u]^{|A|} and B = (b1, . . . , b|B|) ∈ [1, v]^{|B|}, we denote by M_{A,B} the |A| × |B| matrix with entries M_{ai,bj} for (i, j) ∈ [1, |A|] × [1, |B|]. For simplicity we will use • instead of [1, u] or [1, v], i.e., A_{i,•} is the i-th row of A and A_{•,j} is the j-th column of A. We note 1_{u,v} the u × v matrix of ones, and I_p the p × p identity matrix.\n\n3 Formulation\nWe consider p profiles of length n, stored in an n × p matrix Y. The i-th profile Y_{•,i} = (Y_{1,i}, . . . , Y_{n,i}) is the i-th column of Y. We assume that each profile is a piecewise-constant signal corrupted by noise, and that change-point locations tend to be shared across profiles. 
Our goal is to detect these shared change-points, and benefit from the possibly large number p of profiles to increase the statistical power of change-point detection.\nWhen p = 1 (single profile), a popular method to find change-points in a signal is to approximate it by a piecewise-constant function using total variation (TV) denoising [10], i.e., to solve\n\nmin_{U∈R^n} ‖ Y − U ‖² + λ Σ_{i=1}^{n−1} | U_{i+1} − U_i | .   (1)\n\nFor a given λ > 0, the solution U ∈ R^n of (1) is piecewise-constant and its change-points are predicted to be those of Y. Adding penalties proportional to the ℓ1 or ℓ2 norm of U to (1) does not change the position of the change-points detected [11, 17], and the capacity of TV denoising to correctly identify change-points when n increases has been investigated in [12, 13].\nHere we propose to generalize TV denoising to multiple profiles by considering the following convex optimization problem, for Y ∈ R^{n×p}:\n\nmin_{U∈R^{n×p}} ‖ Y − U ‖² + λ Σ_{i=1}^{n−1} ‖ U_{i+1,•} − U_{i,•} ‖ .   (2)\n\nThe second term in (2) penalizes the sum of Euclidean norms of the increments of U, seen as a time-dependent multidimensional vector. Intuitively, this penalty will enforce many increments U_{i+1,•} − U_{i,•} to collapse to 0, just like the total variation in (1). As a result the solution of (2) provides an approximation of the profiles Y by an n × p matrix of piecewise-constant profiles U which share change-points. 
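As a concrete illustration, the objective in (2) can be written in a few lines of pure Python. This is a sketch under our own naming (`tv_objective`), with the n × p matrices Y and U stored as lists of rows:

```python
def tv_objective(Y, U, lam):
    """Objective of problem (2): squared Frobenius error ||Y - U||^2 plus
    lam times the sum over i of the Euclidean norms ||U_{i+1,.} - U_{i,.}||."""
    n, p = len(Y), len(Y[0])
    err = sum((Y[i][j] - U[i][j]) ** 2 for i in range(n) for j in range(p))
    tv = sum(
        sum((U[i + 1][j] - U[i][j]) ** 2 for j in range(p)) ** 0.5
        for i in range(n - 1)
    )
    return err + lam * tv
```

Setting p = 1 recovers the TV-denoising objective (1); for p > 1 the penalty is a group (ℓ1 of ℓ2) norm, so whole rows of increments vanish together and the fitted profiles share change-points.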
In the following, we propose a fast algorithm to approximately solve (2) (Section 4), discuss theoretically whether the solution correctly identifies the change-points (Section 5), and provide an empirical evaluation of the method (Section 6).\n\n4 Implementation\n\nAlthough (2) is a convex optimization problem that can in principle be solved by general-purpose solvers [18], we are often working in dimensions that can reach millions, making this approach impractical. Moreover, we would ideally like to obtain solutions for various values of λ, corresponding to various numbers of change-points, in order to be able to select the optimal number of change-points using various statistical criteria. In the single-profile case (p = 1), [14] proposed a fast coordinate descent-like method, [12] showed how to find the first k change-points iteratively in O(nk), and [15] proposed an O(n ln(n)) method to find all change-points. However, none of these methods is applicable directly to the p > 1 setting since they all rely on specific properties of the p = 1 case, such as the fact that the solution is piecewise-affine in λ and that the set of change-points is monotonically decreasing with λ.\nIn order to propose a fast method to solve (2) in the p > 1 setting, let us first reformulate it as a group LASSO regression problem [16]. To this end, we make the change of variables (β, γ) ∈ R^{(n−1)×p} × R^{1×p} given by:\n\nγ = U_{1,•} ,   β_{i,•} = U_{i+1,•} − U_{i,•}   for i = 1, . . . , n − 1 .\n\nIn other words β_{i,j} is the jump between the i-th and the (i + 1)-th positions of the j-th profile. We immediately get an expression of U as a function of β and γ:\n\nU_{1,•} = γ ,   U_{i,•} = γ + Σ_{j=1}^{i−1} β_{j,•}   for i = 2, . . . , n .\n\nThis can be rewritten in matrix form as\n\nU = 1_{n,1} γ + X β ,\n\nwhere X is the n × (n − 1) matrix with entries X_{i,j} = 1 for i > j and 0 otherwise. Making this change of variable, we can re-express (2) as follows:\n\nmin_{β∈R^{(n−1)×p}, γ∈R^{1×p}} ‖ Y − X β − 1_{n,1} γ ‖² + λ Σ_{i=1}^{n−1} ‖ β_{i,•} ‖ .   (3)\n\nFor any β ∈ R^{(n−1)×p}, the minimum in γ is reached for γ = 1_{1,n}(Y − X β)/n. Plugging this into (3), we get that the matrix of jumps β is the solution of\n\nmin_{β∈R^{(n−1)×p}} ‖ Ȳ − X̄ β ‖² + λ Σ_{i=1}^{n−1} ‖ β_{i,•} ‖ ,   (4)\n\nwhere Ȳ and X̄ are obtained from Y and X by centering each column.\nEquation (4) is a group LASSO problem, with a particular design matrix and particular groups of features. Since existing methods to exactly solve group LASSO regression problems remain difficult to apply here – in particular we do not want to store in memory the n × (n − 1) design matrix when n is in the millions – we propose to approximate instead the solution of (4) with the group LARS strategy, which was proposed by [16] as a good approximation to the regularization path of the group LASSO. More precisely, the group LARS approximates the solution path of (4) with a piecewise-affine set of solutions, and iteratively finds change-points. While the original group LARS method requires storing and manipulating the design matrix [16], which we cannot afford here, we can extend technical results of [12] to show that the particular structure of the design matrix X̄ allows efficient computation of matrix inverses and products.\n\nLemma 1. 
For any R ∈ R^{n×p}, we can compute C = X̄ᵀ R in O(np) time and memory.\n\nLemma 2. For any A = (a1, . . . , a|A|), a set of distinct indices with 1 ≤ a1 < . . . < a|A| ≤ n, the matrix X̄ᵀ_{•,A} X̄_{•,A} is invertible, and for any |A| × p matrix R, the matrix\n\nC = ( X̄ᵀ_{•,A} X̄_{•,A} )⁻¹ R\n\ncan be computed in O(|A| p) time and memory.\nProofs of these results can be found in the Supplementary Materials.\nAlgorithm 1 describes the fast group LARS method to approximately solve (4). At each subsequent iteration to find the next change-point, we follow steps 3–8, which have maximum complexity O(np), resulting in O(npk) complexity in time and O(np) in memory to find the first k change-points with the fast group LARS algorithm.\n\nAlgorithm 1 Fast group LARS algorithm\nRequire: centered data Ȳ, number of breakpoints k.\n1: Initialize r = Ȳ, A = ∅.\n2: for i = 1 to k do\n3:   Compute ĉ = X̄ᵀ r using Lemma 1.\n4:   If i = 1, find the first breakpoint: â = argmax_{j∈[1,n−1]} ‖ ĉ_{j,•} ‖, A = {â}.\n5:   Descent direction: compute w = ( X̄ᵀ_{•,A} X̄_{•,A} )⁻¹ ĉ_{A,•} using Lemma 2, then u_A = X̄_{•,A} w with cumulative sums, then a = X̄ᵀ u_A using Lemma 1.\n6:   Descent step: for each u ∈ [1, n−1]\\A, find, if it exists, the smallest positive solution α_u of the second-order polynomial in α:\n\n‖ ĉ_{u,•} − α a_{u,•} ‖² = ‖ ĉ_{v,•} − α a_{v,•} ‖² ,\n\nwhere v is any element of A.\n7:   Find the next breakpoint: û = argmin_{u∈[1,n−1]\\A} α_u.\n8:   Update A = A ∪ {û} and r = r − α_û u_A.\n9: end for\n\n5 Theoretical analysis\n\nIn this section, we study theoretically to 
what extent the estimator (2) recovers the correct change-points.\nThe vast majority of existing theoretical results for offline segmentation and change-point detection consider the setting where p is fixed (usually p = 1), and n increases. This typically corresponds to a setting where we can sample a continuous signal with increasing density, and wish to locate more precisely the underlying change-points as the density increases.\nHere we propose a radically different analysis, motivated by applications in genomics. Here, the length of profiles n is fixed for a given technology, but the number of profiles p can increase when more biological samples or patients are analyzed. The property we would like to study is then, for a given change-point detection method, whether increasing p for fixed n allows us to locate more precisely the change-points. While this simply translates our intuition that increasing the number of profiles should increase the statistical power of change-point detection, and while this property was empirically observed in [2], we are not aware of previous theoretical results in this setting.\n\n5.1 Consistent estimation of a single change-point\n\nAs a first step towards the analysis of this "fixed n, increasing p" setting, let us assume that the observed centered profiles Ȳ are obtained by adding noise to a set of profiles with a single shared change-point between positions u and u + 1, for some u ∈ [1, n − 1]. In other words, we assume that\n\nȲ = X̄ β∗ + W ,\n\nwhere β∗ is an (n−1) × p matrix of zeros except for the u-th row β∗_{u,•}, and W is a noise matrix whose entries are assumed to be independent and identically distributed with respect to a centered Gaussian distribution with variance σ². In this section we study the probability that the first breakpoint found by our procedure is the correct one, when p increases. We therefore consider an infinite sequence of jumps (β∗_{u,i})_{i≥1}, and letting β̄²_k = (1/k) Σ_{i=1}^{k} (β∗_{u,i})², we assume that β̄² = lim_{k→∞} β̄²_k exists and is finite. We first show that, as p increases, the first selected change-point is always given by the same formula.\nLemma 3. Assume, without loss of generality, that u ≥ n/2. When p → +∞, the first change-point selected is\n\nû = argmax_{i∈[1,u]} [ β̄² i² (n − u)² / n² + σ² i (n − i) / n ] ,   (5)\n\nwith probability tending to 1.\nFrom this we easily deduce under which condition the correct change-point is selected, i.e., when û = u:\nTheorem 4. Let α = u/n and\n\nσ̃²_α = n β̄² (1 − α)² (α − 1/(2n)) / (α − 1/2 − 1/(2n)) .   (6)\n\nWhen σ² < σ̃²_α, the probability that the first selected change-point is the correct one tends to 1 as p → +∞. When σ² > σ̃²_α, it is not the correct one with probability tending to 1.\nThis theorem, whose proof along with that of Lemma 3 can be found in the Supplementary Materials, deserves several comments.\n\n• To detect a change-point at position u = αn, the noise level σ² must not be larger than the critical value σ̃²_α given by (6), hence the method is not consistent for all positions. σ̃²_α decreases monotonically as α goes from 1/2 to 1, meaning that change-points near the boundary are more difficult to detect correctly than change-points near the center. 
The most difficult change-point is the last one (u = n − 1), which can only be detected consistently if σ² is smaller than\n\nσ̃²_{1−1/n} = 2 β̄² / n + o(n⁻¹) .\n\n• For a given level of noise σ², change-point detection is asymptotically correct for any α ∈ [ε, 1 − ε], where ε satisfies σ² = σ̃²_{1−ε}, i.e.,\n\nε = ( σ² / (2 n β̄²) )^{1/2} + o(n^{−1/2}) .\n\nThis shows in particular that increasing the profile length n increases the interval where change-points are correctly identified, and that we can get as close as possible to the boundary for n large enough.\n\n• When σ² < σ̃²_α, the correct change-point is found consistently when p increases, showing the benefit of the accumulation of many profiles.\n\n• It is possible to make the detection of the first change-point consistent uniformly over the full signal, by simply subtracting the term p σ² i (n − i)/n from ‖ ĉ_{i,•} ‖², which is maximized over i to select the first change-point. Then, a simple modification of Lemma 3 shows that, as p → +∞, any given change-point is a.s. found. However, this modification, easy to do for the first change-point, is not obvious to extend to successive change-points detected by group LARS. We consider it an interesting future challenge to develop variants of the group LARS iterative segmentation method whose performance does not depend on the position of the change-points.\n\n5.2 Consistent estimation of a single change-point with fluctuating position\n\nAn interesting variant of the problem of detecting a change-point common to many profiles is that of detecting a change-point with similar location in many profiles, allowing fluctuations in the precise location of the change-point. 
This can be modeled by assuming that the profiles are random, and that the i-th profile has a change-point of value β_i at position U_i, where (β_i, U_i)_{i=1,...,p} are independent and identically distributed according to a distribution P = P_β ⊗ P_U (i.e., we assume β_i independent from U_i). We denote β̄² = E_{P_β} β² and p_i = P_U (U = i) for i ∈ [1, n − 1]. Assuming that the support of P_U is [a, b] with 1 ≤ a ≤ b ≤ n − 1, the following result extends Theorem 4 by showing that, under a condition on the noise level, the first change-point discovered is indeed in the support of P_U:\nTheorem 5. Let α = U/n be the random position of the change-point on [0, 1], and let α_m = a/n and α_M = b/n be the positions of the left and right boundaries of the support of P_U scaled to [0, 1]. Let also\n\nσ̃²_{P_U} = n β̄² [ (1 − Eα)² + var(α) ] (α_m − 1/(2n)) / (α_m − 1/2 − 1/(2n)) .   (7)\n\nIf 1/2 ∈ (α_m, α_M), then for any σ² the probability that the first selected change-point is in the support of P_U tends to 1 as p → +∞. If 1/2 < α_m, then the probability that the first selected change-point is in the support of P_U tends to 1 when σ² < σ̃²_{P_U}; when σ² > σ̃²_{P_U}, it is outside the support with probability tending to 1.\nThis theorem, whose proof is postponed to the Supplementary Materials, illustrates the robustness of the method to fluctuations in the precise position of the change-point shared between the profiles. Although this situation rarely occurs when we are considering classical multidimensional signals such as financial time series or video signals, it is likely to be the rule when we consider profiles coming from different biological samples. 
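The constants in Theorems 4 and 5 lend themselves to quick numerical sanity checks. The sketch below is our own code, not the paper's: it recovers the critical value σ̃²_0.8 ≈ 10.78 quoted in the experiments of Section 6, checks the behavior of the argmax in (5) on either side of that threshold, and verifies that the moment factor (1 − Eα)² + var(α) appearing in (7) is simply E[(1 − α)²], the analogue of the factor (1 − α)² of Theorem 4 (the example distribution `alphas` is an arbitrary illustration):

```python
def score(i, n, u, bbar2, sig2):
    """Limit criterion (5): its argmax over [1, u] is the first selected change-point."""
    return bbar2 * i**2 * (n - u) ** 2 / n**2 + sig2 * i * (n - i) / n

def critical_sigma2(alpha, n, bbar2):
    """Critical noise level of Theorem 4, equation (6)."""
    return n * bbar2 * (1 - alpha) ** 2 * (alpha - 1 / (2 * n)) / (alpha - 0.5 - 1 / (2 * n))

# Theorem 4: for n = 100, u = 80, bbar2 = 1 the critical value is ~10.78;
# below it the argmax of (5) is the true change-point u, above it it is not.
n, u, bbar2 = 100, 80, 1.0
crit = critical_sigma2(u / n, n, bbar2)
below = max(range(1, u + 1), key=lambda i: score(i, n, u, bbar2, 10.0))
above = max(range(1, u + 1), key=lambda i: score(i, n, u, bbar2, 12.0))

# Theorem 5: (1 - E[alpha])^2 + var(alpha) equals E[(1 - alpha)^2].
alphas = [0.68, 0.69, 0.70, 0.71, 0.72]   # an arbitrary discrete P_U scaled to [0, 1]
E = sum(alphas) / len(alphas)
var = sum((a - E) ** 2 for a in alphas) / len(alphas)
moment = (1 - E) ** 2 + var
```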
Although the theorem only gives a condition on the noise level ensuring that the selected change-point lies in the support of the distribution of change-point locations, a precise estimate of the location of the selected change-point as a function of P_U, which generalizes Lemma 3, is given in the proof.\n\n5.3 The case of multiple change-points\n\nWhile the theoretical results presented above focus on the detection of a single change-point, the real interest of the method is to estimate multiple change-points. The extension of Theorem 4 to this setting is beyond the scope of this paper, and is postponed to future work. We nevertheless conjecture here that we can consistently estimate multiple change-points under conditions on the level of noise (not too large), the distance between change-points (not too small), and the correlations between their jumps (not too large). Indeed, following the ideas in the proof of Theorem 4, we must analyze the path of the vectors ĉ_{i,•} and check that, for some λ in (2), they reach their maximum norm precisely at the true change-points. The situation is more complicated than in the single change-point case since the vectors ĉ_{i,•} must hit a hypersphere at each correct change-point, and must remain strictly within the hypersphere between consecutive change-points. This can be ensured if the noise level is not too high (as in the single change-point case), and if the positions corresponding to successive change-points on the hypersphere are far enough from each other. In practice this translates to conditions that two successive change-points should not be too close to each other, and that profiles should have, if possible, independent jumps (direction, etc.). 
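The vectors ĉ_{i,•} driving this argument are cheap to compute thanks to the structure of X̄: when Ȳ has centered columns, the i-th row of X̄ᵀȲ is the suffix sum of the rows of Ȳ below position i, which is essentially the O(np) trick behind Lemma 1. A pure-Python sketch, under our own naming (the actual implementation details may differ):

```python
def correlations(Y):
    """Rows c_1, ..., c_{n-1} of Xbar^T Ybar, computed by suffix sums in O(np).
    Y is an n x p matrix given as a list of rows; columns are centered here."""
    n, p = len(Y), len(Y[0])
    means = [sum(Y[i][j] for i in range(n)) / n for j in range(p)]
    c, suffix = [], [0.0] * p
    for i in range(n - 1, 0, -1):              # accumulate centered rows i+1..n
        suffix = [suffix[j] + Y[i][j] - means[j] for j in range(p)]
        c.append(suffix)
    return c[::-1]                             # c[i-1] corresponds to breakpoint i

def first_change_point(Y):
    """Position i maximizing ||c_i||: the first breakpoint on the group LARS path."""
    c = correlations(Y)
    norms = [sum(x * x for x in row) for row in c]
    return 1 + norms.index(max(norms))
```

On two profiles of length 6 that jump together after position 3, `first_change_point` returns 3, the shared breakpoint.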
We provide experimental results below that confirm that, when the noise is not too large, we can indeed correctly identify several change-points, with a probability of success increasing to 1 as p increases.\n\n6 Experiments\n\nIn this section we give experimental evidence both for the theoretical O(npk) complexity and for Theorem 4. Figure 1 shows linearity in each of p, n and k respectively whilst fixing the other two variables, confirming the O(npk) complexity.\nTo test Theorem 4, we considered signals of length 100, each with a unique change-point located at position u. We fixed α = 0.8; assuming for simplicity that each signal jumps a height of 1 at the change-point, we get β̄² = 1, and it is then easy to calculate the critical value σ̃²_α = 10.78. We set the variance of the centered Gaussian noise added to each signal to σ̃²_α, and ran 1000 trials for each u. We expect that for 50 ≤ u < 80 there is convergence in accuracy to 1, and for u > 80, convergence in accuracy to zero. This is indeed what is seen in Figure 2 (left panel), with u = 80 the limit case between the two different modes of convergence.\n\nFigure 1: Speed trials. (a) CPU time for finding 50 change-points when there are 2000 probes and the number of profiles varies from 1 to 20. (b) CPU time when finding 50 change-points with the number of profiles fixed at 20 and the number of probes varying from 1000 to 10000 in intervals of 1000. (c) CPU time for 20 profiles and 2000 probes when selecting from 1 to 50 change-points.\n\nFigure 2: Single change-point accuracy. 
Accuracy as a function of the number of pro\ufb01les p when the\nchange-point is placed in a variety of positions from: u = 50 to u = 90 (left panel), or: u = 50 \u00b1 2 to\nu = 90 \u00b1 2 (right panel), for a signal of length 100.\n\nThe right-hand-side panel of Figure 2 shows results for the same trials except that change-point\nlocations can vary uniformly in the interval u \u00b1 2. As predicted by Theorem 5, we see that the\naccuracy of the method remains extremely robust against \ufb02uctuations in the exact change-point\nlocation.\nTo investigate the potential for extending the results of the article to the case of many shared\nchange-points, we further simulated pro\ufb01les of length 100 with a change-point at all of positions\n10, 20, . . . , 90. The jump at each change-point was drawn from a centered Gaussian with variance\n1. We then \ufb01xed various values of \u03c32 and looked at convergence in accuracy as the number of\nsignals increased. One thousand trials were performed for each \u03c32, and results are presented in\nFigure 3. Denoting \u03b1 the set of change-point locations {10, 20, . . . , 90} , it appears that a critical\nvalue \u02dc\u03c32\n\u03b1 exists and lies close to 0.27; below 0.27 we have convergence in accuracy to 1, and above,\nconvergence to zero.\nAn interesting application of the fast group LARS method is in the joint segmentation of copy-\nnumber pro\ufb01les. For a set of individuals with the same disease (e.g. a type of cancer), we expect\nthere to be regions of the genome which are frequently gained (potentially containing oncogenes) or\nlost (potentially containing tumor suppressor genes) in many or all of the patients. These regions are\nseparated by change-points. Figure 4 shows Chromosome 8 of three bladder cancer copy-number\npro\ufb01les. 
We see that in the region of probe 60, a copy-number change occurs on all three profiles. Though it is not in exactly the same place on all profiles, the sharing of information across profiles allows the approximate location to be found. The bottom right panel shows the smoothed profiles superimposed on the same axes. A promising use of these smoothed signals, beyond visualization of many profiles simultaneously, is to detect regions of frequent gain or loss by testing the average profile values on each segment for significant positive (gain) or negative (loss) values. Preliminary experiments on simulated and real data suggest that our method is more accurate and two orders of magnitude faster than the state-of-the-art H-HMM [19] method for that purpose.\n\nFigure 3: Multiple change-point accuracy. Accuracy as a function of the number of profiles p when change-points are placed at the nine positions {10, 20, . . . , 90} and the value of σ² is varied from 0.1 to 0.4. The profile length is 100.\n\nFigure 4: Segmented and smoothed bladder cancer copy-number profiles. Probes shown are located on Chromosome 8. A shared change-point hotspot is found in the region of probe 60.\n\n7 Conclusion\n\nWe have proposed a framework that extends total-variation based approximation to the multidimensional setting, developed a fast algorithm to approximately solve it, shown theoretically that the method can consistently estimate change-points, and validated the results experimentally. We have not discussed the problem of choosing the number of change-points, and suggest in practice to use existing criteria for this purpose [6, 7]. 
We observed both theoretically and empirically that increasing the number of profiles is highly beneficial for detecting shared change-points.\nAcknowledgements. We thank Zaid Harchaoui and Francis Bach for useful discussions. This work was supported by ANR grants ANR-07-BLAN-0311-03 and ANR-09-BLAN-0051-04.\n\nReferences\n[1] Z. Harchaoui, F. Vallet, A. Lung-Yut-Fong, and O. Cappe. A regularized kernel-based approach to unsupervised audio segmentation. In ICASSP '09: Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1665–1668, Washington, DC, USA, 2009. IEEE Computer Society.\n[2] N. R. Zhang, D. O. Siegmund, H. Ji, and J. Li. Detecting simultaneous change-points in multiple sequences. Biometrika, 97(3):631–645, 2010.\n[3] M. Basseville and N. Nikiforov. Detection of abrupt changes: theory and application. Information and System Sciences Series. Prentice Hall, 1993.\n[4] B. Brodsky and B. Darkhovsky. Nonparametric Methods in Change-Point Problems. Kluwer Academic Publishers, 1993.\n[5] Y. C. Yao. Estimating the number of change-points via Schwarz criterion. Stat. Probab. Lett., 6:181–189, 1988.\n[6] L. Birgé and P. Massart. Gaussian model selection. J. Eur. Math. Soc., 3:203–268, 2001.\n[7] M. Lavielle and G. Teyssière. Detection of multiple change-points in multivariate time series. Lithuanian Mathematical Journal, 46(3):287–306, 2006.\n[8] L. J. Vostrikova. Detection of disorder in multidimensional stochastic processes. Soviet Mathematics Doklady, 24:55–59, 1981.\n[9] M. Lavielle and G. Teyssière. 
Adaptive detection of multiple change-points in asset price volatility. In G. Teyssière and A. Kirman, editors, Long-Memory in Economics, pages 129–156. Springer Verlag, Berlin, 2005.\n[10] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.\n[11] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol., 67(1):91–108, 2005.\n[12] Z. Harchaoui and C. Levy-Leduc. Catching change-points with lasso. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 617–624. MIT Press, Cambridge, MA, 2008.\n[13] A. Rinaldo. Properties and refinements of the fused lasso. Ann. Stat., 37(5B):2922–2952, 2009.\n[14] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Ann. Appl. Statist., 1(1):302–332, 2007.\n[15] H. Hoefling. A path algorithm for the Fused Lasso Signal Approximator. Technical Report 0910.0526v1, arXiv, Oct. 2009.\n[16] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B, 68(1):49–67, 2006.\n[17] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res., 11:19–60, 2010.\n[18] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004.\n[19] S.P. Shah, W.L. Lam, R.T. Ng, and K.P. Murphy. Modeling recurrent DNA copy number alterations in array CGH data. Bioinformatics, 23(13):i450–i458, 2007.\n", "award": [], "sourceid": 1131, "authors": [{"given_name": "Jean-philippe", "family_name": "Vert", "institution": null}, {"given_name": "Kevin", "family_name": "Bleakley", "institution": null}]}