{"title": "Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential _1-Minimization", "book": "Advances in Neural Information Processing Systems", "page_first": 2627, "page_last": 2635, "abstract": "We consider the problem of recovering a sequence of vectors, $(x_k)_{k=0}^K$, for which the increments $x_k-x_{k-1}$ are $S_k$-sparse (with $S_k$ typically smaller than $S_1$), based on linear measurements $(y_k = A_k x_k + e_k)_{k=1}^K$, where $A_k$ and $e_k$ denote the measurement matrix and noise, respectively. Assuming each $A_k$ obeys the restricted isometry property (RIP) of a certain order---depending only on $S_k$---we show that in the absence of noise a convex program, which minimizes the weighted sum of the $\\ell_1$-norm of successive differences subject to the linear measurement constraints, recovers the sequence $(x_k)_{k=1}^K$ \\emph{exactly}. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. 
These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.", "full_text": "Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential ℓ1-Minimization\n\nDemba Ba1,2, Behtash Babadi1,2, Patrick Purdon2 and Emery Brown1,2\n1MIT Department of BCS, Cambridge, MA 02139\n2MGH Department of Anesthesia, Critical Care and Pain Medicine, 55 Fruit St, GRJ 4, Boston, MA 02114\ndemba@mit.edu, {behtash,patrickp}@nmr.mgh.harvard.edu, enb@neurostat.mit.edu\n\nAbstract\n\nWe consider the problem of recovering a sequence of vectors, (x_k)_{k=0}^K, for which the increments x_k - x_{k-1} are S_k-sparse (with S_k typically smaller than S_1), based on linear measurements (y_k = A_k x_k + e_k)_{k=1}^K, where A_k and e_k denote the measurement matrix and noise, respectively. Assuming each A_k obeys the restricted isometry property (RIP) of a certain order---depending only on S_k---we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norms of successive differences subject to the linear measurement constraints, recovers the sequence (x_k)_{k=1}^K exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. 
These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.\n\n1 Introduction\n\nIn the field of theoretical signal processing, compressive sensing (CS) has arguably been one of the major developments of the past decade. This claim is supported in part by the deluge of research efforts (see for example Rice University's CS repository [1]) which has followed the inception of this field [2, 3, 4]. CS considers the problem of acquiring and recovering signals that are sparse (or compressible) in a given basis using non-adaptive linear measurements, at a rate smaller than what the Shannon-Nyquist theorem would require. The work in [2, 4] derived conditions under which a sparse signal can be recovered exactly from a small set of non-adaptive linear measurements. In [3], the authors propose a recovery algorithm for the case of measurements contaminated by bounded noise, and show that this algorithm is stable, that is, its reconstruction error is within a constant factor of the noise tolerance. Recovery of these sparse or compressible signals is performed using convex optimization techniques.\n\nThe classic CS setting does not take into account the structure, e.g. temporal or spatial, of the underlying high-dimensional sparse signals of interest. In recent years, attention has shifted to formulations which incorporate the signal structure into the CS framework. A number of problems and applications of interest deal with time-varying signals which may not only be sparse at any given instant, but may also exhibit sparse changes from one instant to the next. For example, a video of a natural scene consists of a sequence of natural images (compressible signals) which exhibits sparse changes from one frame to the next. 
It is thus reasonable to hope that one could get away with far fewer measurements than prescribed by conventional CS theory when acquiring and recovering such time-varying signals as videos. The problem of recovering signals with time-varying sparsity has been referred to in the literature as dynamic CS. A number of empirically-motivated algorithms to solve the dynamic CS problem have been proposed, e.g. [5, 6]. To our knowledge, no recovery guarantees have been proved for these algorithms, which typically assume that the support of the signal and/or the amplitudes of the coefficients change smoothly with time. In [5], for instance, the authors propose message-passing algorithms for tracking and smoothing of signals with time-varying sparsity. Simulation results show the superiority of these algorithms over one based on applying conventional CS principles at each time instant. Dynamic CS algorithms have potential applications to video processing [7], estimation of sources of brain activity from MEG time-series [8], medical imaging [7], and estimation of time-varying networks [9].\n\nTo the best of our knowledge, the dynamic CS problem has not received rigorous theoretical scrutiny. In this paper, we develop rigorous results for dynamic CS both in the absence and in the presence of noise. More specifically, in the absence of noise, we show that one can exactly recover a sequence (x_k)_{k=0}^K of vectors, for which the increments x_k - x_{k-1} are S_k-sparse, based on linear measurements y_k = A_k x_k and under certain regularity conditions on (A_k)_{k=1}^K, by solving a convex program which minimizes the weighted sum of the ℓ1-norms of successive differences. In the presence of noise, we derive error bounds for a quadratically-constrained convex program for recovery of the sequence (x_k)_{k=0}^K.\n\nIn the following section, we formulate the problem of interest and introduce our notation. 
In Section 3, we present our main theoretical results, which we supplement with simulated experiments and an application to real video data in Section 4. In this latter section, we introduce probability-of-recovery surfaces for the dynamic CS problem, which generalize the traditional recovery curves of CS. We give concluding remarks in Section 5.\n\n2 Problem Formulation and Notation\n\nWe denote the support of a vector x ∈ R^p by supp(x) = {j : x_j ≠ 0}. We say that a vector x ∈ R^p is S-sparse if ||x||_0 ≤ S, where ||x||_0 := |supp(x)|. We consider the problem of recovering a sequence (x_k)_{k=0}^K of R^p vectors such that x_k - x_{k-1} is S_k-sparse, based on linear measurements of the form y_k = A_k x_k + e_k. Here, A_k ∈ R^{n_k × p}, e_k ∈ R^{n_k} and y_k ∈ R^{n_k} denote the measurement matrix, measurement noise, and the observation vector, respectively. Typically, S_k < n_k ≪ p, which accounts for the compressive nature of the measurements. For convenience, we let x_0 be the R^p vector of all zeros.\n\nFor the rest of our treatment, it will be useful to introduce some notation. We will be dealing with sequences (of sets, matrices, vectors); as such, we let the index k denote the kth element of any such sequence. Let J be the set of indices {1, 2, ..., p}. 
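As an illustrative aside (not part of the original analysis), the support and sparsity notation above is easy to make concrete in code; the helper names `supp` and `is_sparse` below are ours, not the paper's:

```python
import numpy as np

def supp(x, tol=1e-12):
    """Support of a vector: the set of indices j with x_j != 0 (up to a tolerance)."""
    return {int(j) for j in np.flatnonzero(np.abs(np.asarray(x, dtype=float)) > tol)}

def is_sparse(x, S):
    """True if ||x||_0 <= S, i.e. x is S-sparse."""
    return len(supp(x)) <= S

x_prev = np.array([0.0, 2.0, 0.0, -1.0, 0.0])
x_curr = np.array([0.0, 2.0, 0.0, -1.0, 3.0])  # one coordinate changed

print(supp(x_prev))                   # {1, 3}
print(is_sparse(x_curr - x_prev, 1))  # the increment is 1-sparse: True
```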
For each k, we denote by {a_{kj} : j ∈ J} the columns of the matrix A_k, and by H_k the Hilbert space spanned by these vectors.\nFor two matrices A_1 ∈ R^{n_1 × p} and A_2 ∈ R^{n_2 × p}, n_2 ≤ n_1, we say that A_2 ⊂ A_1 if the rows of A_2 are distinct and each row of A_2 coincides with a row of A_1.\nWe say that the matrix A ∈ R^{n × p} satisfies the restricted isometry property (RIP) of order S if, for all S-sparse x ∈ R^p, we have\n\n(1 - δ_S) ||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δ_S) ||x||_2^2,   (1)\n\nwhere δ_S ∈ (0, 1) is the smallest constant for which Equation (1) is satisfied [2].\n\nConsider the following convex optimization programs:\n\nmin_{x_1, x_2, ..., x_K} Σ_{k=1}^K ||x_k - x_{k-1}||_1 / √S_k   s.t.   y_k = A_k x_k,   k = 1, 2, ..., K.   (P1)\n\nmin_{x_1, x_2, ..., x_K} Σ_{k=1}^K ||x_k - x_{k-1}||_1 / √S_k   s.t.   ||y_k - A_k x_k||_2 ≤ ε_k,   k = 1, 2, ..., K.   (P2)\n\nWhat theoretical guarantees can we provide on the performance of the above programs for recovery of sequences of signals with sparse increments, respectively in the absence (P1) and in the presence (P2) of noise?\n\n3 Theoretical Results\n\nWe first present a lemma giving sufficient conditions for the uniqueness of sequences of vectors with sparse increments given linear measurements in the absence of noise. Then, we prove a theorem which shows that, by strengthening the conditions of this lemma, program (P1) can exactly recover every sequence of vectors with sparse increments. Finally, we derive error bounds for program (P2) in the context of recovery of sequences of vectors with sparse increments in the presence of noise.\n\nLemma 1 (Uniqueness of Sequences of Vectors with Sparse Increments). Suppose (S_k)_{k=0}^K is such that S_0 = 0 and, for each k ≥ 1, S_k ≥ 1. 
Let A_k satisfy the RIP of order 2S_k. Let x_k ∈ R^p supported on T_k ⊆ J be such that ||x_k - x_{k-1}||_0 ≤ S_k, for k = 1, 2, ..., K. Suppose T_0 = ∅ without loss of generality (w.l.o.g.). Then, given A_k and y_k = A_k x_k, the sequence of sets (T_k)_{k=1}^K, and consequently the sequence of coefficients (x_k)_{k=1}^K, can be reconstructed uniquely.\n\nProof. For brevity, and w.l.o.g., we prove the lemma for K = 2. We prove that there is a unique choice of x_1 and x_2 such that ||x_1 - x_0||_0 ≤ S_1 and ||x_2 - x_1||_0 ≤ S_2, obeying y_1 = A_1 x_1 and y_2 = A_2 x_2. We proceed by contradiction, and assume that there exist x'_1 ≠ x_1 and x'_2 ≠ x_2, supported on T'_1 and T'_2, respectively, such that y_1 = A_1 x_1 = A_1 x'_1, y_2 = A_2 x_2 = A_2 x'_2, ||x'_1 - x_0||_0 ≤ S_1, and ||x'_2 - x'_1||_0 ≤ S_2. Then ||A_1(x_1 - x'_1)||_2 = 0. Using the lower bound in the RIP of A_1 and the fact that δ_{2S_1} < 1 (note that x_1 - x'_1 is 2S_1-sparse), this leads to ||x_1 - x'_1||_2^2 = 0, i.e. x_1 = x'_1, thus contradicting our assumption that x_1 ≠ x'_1. Now consider the case of x_2 and x'_2. We have\n\n0 = A_2(x_2 - x'_2) = A_2(x_2 - x_1 + x_1 - x'_2) = A_2(x_2 - x_1 + x'_1 - x'_2).   (2)\n\nUsing the lower bound in the RIP of A_2 and the fact that δ_{2S_2} < 1, this leads to ||x_2 - x_1 + x'_1 - x'_2||_2^2 = 0, i.e. x_2 - x_1 = x'_2 - x'_1, which implies x'_2 = x_2, thus contradicting our assumption that x_2 ≠ x'_2.\n\nAs in Candès and Tao's work [2], this lemma only suggests what may be possible in terms of recovery of (x_k)_{k=1}^K through a combinatorial, brute-force approach. By imposing stricter conditions on (δ_{2S_k})_{k=1}^K, we can recover (x_k)_{k=1}^K by solving a convex program. 
This is summarized in the following theorem.\n\nTheorem 2 (Exact Recovery in the Absence of Noise). Let (x̄_k)_{k=1}^K be a sequence of R^p vectors such that, for each k, ||x̄_k - x̄_{k-1}||_0 ≤ S_k for some S_k < p/2. Suppose that the measurements y_k = A_k x̄_k ∈ R^{n_k} are given, such that n_k < p, A_1 ⊃ A_2, A_k = A_2 for k = 3, ..., K, and (A_k)_{k=1}^K satisfies δ_{S_k} + δ_{2S_k} + δ_{3S_k} < 1 for k = 1, 2, ..., K. Then, the sequence (x̄_k)_{k=1}^K is the unique minimizer of the program (P1).\n\nProof. As before, we consider the case K = 2. The proof easily generalizes to the case of arbitrary K. We can re-write the program as follows:\n\nmin_{x_1, x_2} ||x_1||_1 / √S_1 + ||x_2 - x_1||_1 / √S_2   s.t.   A_1 x_1 = A_1 x̄_1,  A_2(x_2 - x_1) = A_2(x̄_2 - x̄_1),   (3)\n\nwhere we have used the fact that A_2 ⊂ A_1: the constraint A_1 x_1 = A_1 x̄_1 implies A_2 x_1 = A_2 x̄_1, which together with A_2 x_2 = A_2 x̄_2 implies A_2(x_2 - x_1) = A_2(x̄_2 - x̄_1).\n\nLet x*_1 and x*_2 be the solutions to the above program. Let T_1 = supp(x̄_1) and ΔT_2 = supp(x̄_2 - x̄_1). Assume |T_1| ≤ S_1 and |ΔT_2| ≤ S_2.\n\nKey element of the proof: The key element of the proof is the existence of vectors u_1, u_2 satisfying the exact reconstruction property (ERP) [10, 11]. It has been shown in [10] that, given δ_{S_k} + δ_{2S_k} + δ_{3S_k} < 1 for k = 1, 2:\n\n1. ⟨u_1, a_{1j}⟩ = sgn(x̄_{1,j}) for all j ∈ T_1, and ⟨u_2, a_{2j}⟩ = sgn(x̄_{2,j} - x̄_{1,j}) for all j ∈ ΔT_2.\n2. 
|⟨u_1, a_{1j}⟩| < 1 for all j ∈ T_1^c, and |⟨u_2, a_{2j}⟩| < 1 for all j ∈ ΔT_2^c.\n\nSince x̄_1 and x̄_2 - x̄_1 are feasible, we have\n\n||x*_1||_1 / √S_1 + ||x*_2 - x*_1||_1 / √S_2 ≤ ||x̄_1||_1 / √S_1 + ||x̄_2 - x̄_1||_1 / √S_2.   (4)\n\nOn the other hand,\n\n||x*_1||_1 / √S_1 + ||x*_2 - x*_1||_1 / √S_2\n= (1/√S_1) Σ_{j∈T_1} |x̄_{1,j} + (x*_{1,j} - x̄_{1,j})| + (1/√S_1) Σ_{j∈T_1^c} |x*_{1,j}|\n+ (1/√S_2) Σ_{j∈ΔT_2} |x̄_{2,j} - x̄_{1,j} + (x*_{2,j} - x*_{1,j} - (x̄_{2,j} - x̄_{1,j}))| + (1/√S_2) Σ_{j∈ΔT_2^c} |x*_{2,j} - x*_{1,j}|\n≥ (1/√S_1) Σ_{j∈T_1} sgn(x̄_{1,j}) (x̄_{1,j} + (x*_{1,j} - x̄_{1,j})) + (1/√S_1) Σ_{j∈T_1^c} x*_{1,j} ⟨u_1, a_{1j}⟩\n+ (1/√S_2) Σ_{j∈ΔT_2} sgn(x̄_{2,j} - x̄_{1,j}) (x̄_{2,j} - x̄_{1,j} + (x*_{2,j} - x*_{1,j} - (x̄_{2,j} - x̄_{1,j}))) + (1/√S_2) Σ_{j∈ΔT_2^c} (x*_{2,j} - x*_{1,j}) ⟨u_2, a_{2j}⟩\n= (1/√S_1) Σ_{j∈T_1} |x̄_{1,j}| + (1/√S_1) ⟨u_1, Σ_{j∈J} x*_{1,j} a_{1j} - Σ_{j∈T_1} x̄_{1,j} a_{1j}⟩\n+ (1/√S_2) Σ_{j∈ΔT_2} |x̄_{2,j} - x̄_{1,j}| + (1/√S_2) ⟨u_2, Σ_{j∈J} (x*_{2,j} - x*_{1,j}) a_{2j} - Σ_{j∈ΔT_2} (x̄_{2,j} - x̄_{1,j}) a_{2j}⟩\n= ||x̄_1||_1 / √S_1 + ||x̄_2 - x̄_1||_1 / √S_2,   (5)\n\nwhere the second equality uses property 1 (sgn(x̄_{1,j}) = ⟨u_1, a_{1j}⟩ on T_1 and sgn(x̄_{2,j} - x̄_{1,j}) = ⟨u_2, a_{2j}⟩ on ΔT_2), and the last equality uses Σ_{j∈J} x*_{1,j} a_{1j} = A_1 x*_1 = A_1 x̄_1 = Σ_{j∈T_1} x̄_{1,j} a_{1j} and Σ_{j∈J} (x*_{2,j} - x*_{1,j}) a_{2j} = A_2(x*_2 - x*_1) = A_2(x̄_2 - x̄_1) = Σ_{j∈ΔT_2} (x̄_{2,j} - x̄_{1,j}) a_{2j}. Comparing with (4), this implies that all of the inequalities in the derivation above must in fact be equalities. 
In particular,\n\n(1/√S_1) Σ_{j∈T_1^c} |x*_{1,j}| + (1/√S_2) Σ_{j∈ΔT_2^c} |x*_{2,j} - x*_{1,j}|\n= (1/√S_1) Σ_{j∈T_1^c} x*_{1,j} ⟨u_1, a_{1j}⟩ + (1/√S_2) Σ_{j∈ΔT_2^c} (x*_{2,j} - x*_{1,j}) ⟨u_2, a_{2j}⟩\n≤ (1/√S_1) Σ_{j∈T_1^c} |x*_{1,j}| |⟨u_1, a_{1j}⟩| + (1/√S_2) Σ_{j∈ΔT_2^c} |x*_{2,j} - x*_{1,j}| |⟨u_2, a_{2j}⟩|,\n\nwith |⟨u_1, a_{1j}⟩| < 1 on T_1^c and |⟨u_2, a_{2j}⟩| < 1 on ΔT_2^c. Therefore, x*_{1,j} = 0 for all j ∈ T_1^c, and x*_{2,j} - x*_{1,j} = 0 for all j ∈ ΔT_2^c. Using the lower bounds in the RIP of A_1 and A_2 leads to\n\n0 = ||A_1(x*_1 - x̄_1)||_2^2 ≥ (1 - δ_{2S_1}) ||x*_1 - x̄_1||_2^2,   (6)\n0 = ||A_2(x*_2 - x*_1 - (x̄_2 - x̄_1))||_2^2 ≥ (1 - δ_{2S_2}) ||x*_2 - x*_1 - (x̄_2 - x̄_1)||_2^2,   (7)\n\nso that x*_1 = x̄_1 and x*_2 = x̄_2. Uniqueness follows from simple convexity arguments.\n\nA few remarks are in order. First, Theorem 2 effectively asserts that the program (P1) is equivalent to sequentially solving (i.e. for k = 1, 2, ..., K) the following program, starting with x*_0 the vector of all zeros in R^p:\n\nmin_{x_k} ||x_k - x*_{k-1}||_1   s.t.   y_k - A_k x*_{k-1} = A_k(x_k - x*_{k-1}),   k = 1, 2, ..., K.   (8)\n\nSecond, it is interesting and surprising that Theorem 2 holds at all, given what one would predict by naively applying standard CS principles to our problem. 
To see this, if we let w_k = x_k - x_{k-1}, then program (P1) becomes\n\nmin_{w_1, ..., w_K} Σ_{k=1}^K ||w_k||_1 / √S_k   s.t.   y = Aw,   (9)\n\nwhere w = (w'_1, ..., w'_K)' ∈ R^{Kp}, y = (y'_1, ..., y'_K)' ∈ R^{Σ_{k=1}^K n_k}, and A is given by\n\nA = [ A_1  0    ...  0   ]\n    [ A_2  A_2  ...  0   ]\n    [ ...  ...  ...  ... ]\n    [ A_K  A_K  ...  A_K ]\n\nAs K grows large, the columns of A become increasingly correlated, or coherent, which intuitively means that A would be far from satisfying the RIP of any order. Yet, we get exact recovery. This is an important reminder that the RIP is a sufficient, but not necessary, condition for recovery.\n\nThird, the assumption that A_1 ⊃ A_2 and A_k = A_2 for k = 3, ..., K makes practical sense, as it allows one to avoid the prohibitive storage and computational cost of generating several distinct measurement matrices. Note that if a random A_1 satisfies the RIP of some order and A_1 ⊃ A_2, then A_2 also satisfies the RIP (of a lower order).\n\nLastly, the key advantage of dynamic CS recovery (P1) is the smaller number of measurements required compared to the classical approach [2], which would solve K separate ℓ1-minimization problems. For each k = 1, ..., K, one would require n_k ≥ C S_k log(p/S_k) measurements for dynamic recovery, compared to n_k ≥ C S_1 log(p/S_1) for classical recovery. Since S_k ≤ S_1 ≪ p, i.e., the sparse increments are small, we conclude that fewer measurements are required for dynamic CS.\n\nWe now move to the case where the measurements are perturbed by bounded noise. 
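As an illustrative aside (not part of the original text), the equivalence between the sequential measurements and the aggregate formulation (9) can be checked numerically in a few lines; the dimensions below are hypothetical, and A_1 ⊃ A_2 is realized by taking the first n_2 rows of A_1:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n1, n2, K = 6, 4, 3, 3

A1 = rng.standard_normal((n1, p))
A2 = A1[:n2, :]            # A2 ⊂ A1: every row of A2 is a row of A1
As = [A1, A2, A2]          # A_k = A_2 for k >= 2, as assumed in Theorem 2

# A toy sequence with 1-sparse increments w_k = x_k - x_{k-1}, starting from x_0 = 0
xs = [np.zeros(p)]
for k in range(K):
    w = np.zeros(p)
    w[k] = 1.0
    xs.append(xs[-1] + w)
ys = [As[k] @ xs[k + 1] for k in range(K)]   # y_k = A_k x_k

# The block lower-triangular aggregate matrix A of Equation (9)
blocks = [[As[k] if j <= k else np.zeros((As[k].shape[0], p)) for j in range(K)]
          for k in range(K)]
A_agg = np.block(blocks)
w_agg = np.concatenate([xs[k + 1] - xs[k] for k in range(K)])
y_agg = np.concatenate(ys)

print(np.allclose(A_agg @ w_agg, y_agg))   # True: y = A w reproduces the measurements
```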
More specifically, we derive error bounds for a quadratically-constrained convex program for recovery of sequences of vectors with sparse increments in the presence of noise.\n\nTheorem 3 (Conditionally Stable Recovery in the Presence of Noise). Let (x̄_k)_{k=1}^K be as stated in Theorem 2, and x̄_0 be the vector of all zeros in R^p. Suppose that the measurements y_k = A_k x̄_k + e_k ∈ R^{n_k} are given such that ||e_k||_2 ≤ ε_k, and (A_k)_{k=1}^K satisfy δ_{3S_k} + 3δ_{4S_k} < 2 for each k. Let (x*_k)_{k=1}^K be the solution to the program (P2). Finally, let h_k := (x*_k - x*_{k-1}) - (x̄_k - x̄_{k-1}), for k = 1, 2, ..., K, with the convention that x̄_0 := x*_0 := 0 ∈ R^p. Then, we have:\n\nΣ_{k=1}^K ||h_k||_2 ≤ Σ_{k=1}^K 2 C_{S_k} ε_k + Σ_{k=2}^K C_{S_k} A_k Σ_ℓ
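(The statement of Theorem 3 is truncated at this point in this copy of the text.) As a numerical companion to Theorem 2 and the sequential program (8), the following is a small prototype of noiseless dynamic recovery, not taken from the paper: each ℓ1 step is solved as a linear program via the standard basis-pursuit split, using scipy (assumed available); the dimensions, sparsity levels, and supports are illustrative only:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """min ||x||_1 s.t. Ax = y, via the LP split x = u - v with u, v >= 0."""
    n, p = A.shape
    res = linprog(c=np.ones(2 * p),
                  A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * p), method="highs")
    return res.x[:p] - res.x[p:]

rng = np.random.default_rng(1)
p, n1, n2 = 40, 25, 12
A1 = rng.standard_normal((n1, p)) / np.sqrt(n1)
A2 = A1[:n2, :]                       # A2 ⊂ A1, fewer rows for the sparser increment

x1 = np.zeros(p)
x1[[3, 10, 17, 30]] = rng.standard_normal(4)   # x1 is 4-sparse
x2 = x1.copy()
x2[22] = 1.5                                   # the increment x2 - x1 is 1-sparse

# Program (8): recover x1, then recover the increment from the residual measurements
x1_hat = basis_pursuit(A1, A1 @ x1)
w2_hat = basis_pursuit(A2, A2 @ x2 - A2 @ x1_hat)
x2_hat = x1_hat + w2_hat

print(np.max(np.abs(x1_hat - x1)), np.max(np.abs(x2_hat - x2)))  # both ≈ 0 on exact recovery
```

Note that the second step uses only n2 = 12 measurements, far fewer than a fresh recovery of the 5-sparse x2 would call for, which is the measurement saving discussed after Theorem 2.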