{"title": "Universal low-rank matrix recovery from Pauli measurements", "book": "Advances in Neural Information Processing Systems", "page_first": 1638, "page_last": 1646, "abstract": "We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd polylog d) Pauli measurements.  This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed sensing:  recovering a sparse vector from a few of its Fourier coefficients.      We show that almost all sets of O(rd log^6 d) Pauli measurements satisfy the rank-r restricted isometry property (RIP).  This implies that M can be recovered from a fixed (\"universal\") set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly-optimal bounds on the error.  A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm.  Our proof uses Dudley's inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality.", "full_text": "Universal low-rank matrix recovery\n\nfrom Pauli measurements\n\nYi-Kai Liu\n\nApplied and Computational Mathematics Division\n\nNational Institute of Standards and Technology\n\nGaithersburg, MD, USA\n\nyi-kai.liu@nist.gov\n\nAbstract\n\nWe study the problem of reconstructing an unknown matrix M of rank r and di-\nmension d using O(rd poly log d) Pauli measurements. This has applications in\nquantum state tomography, and is a non-commutative analogue of a well-known\nproblem in compressed sensing: recovering a sparse vector from a few of its\nFourier coef\ufb01cients.\nWe show that almost all sets of O(rd log6 d) Pauli measurements satisfy the rank-\nr restricted isometry property (RIP). 
This implies that M can be recovered from a fixed ("universal") set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly-optimal bounds on the error. A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm. Our proof uses Dudley's inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality.

1 Introduction

Low-rank matrix recovery is the following problem: let M be some unknown matrix of dimension d and rank r ≪ d, and let A1, A2, ..., Am be a set of measurement matrices; can one reconstruct M from its inner products tr(M*A1), tr(M*A2), ..., tr(M*Am)? This problem has many applications in machine learning [1, 2], e.g., collaborative filtering (the Netflix problem). Remarkably, it turns out that for many useful choices of measurement matrices, low-rank matrix recovery is possible, and can even be done efficiently. For example, when the Ai are Gaussian random matrices, then it is known that m = O(rd) measurements are sufficient to uniquely determine M, and furthermore, M can be reconstructed by solving a convex program (minimizing the nuclear norm) [3, 4, 5]. Another example is the "matrix completion" problem, where the measurements return a random subset of matrix elements of M; in this case, m = O(rd polylog d) measurements suffice, provided that M satisfies some "incoherence" conditions [6, 7, 8, 9, 10].

The focus of this paper is on a different class of measurements, known as Pauli measurements. Here, the Ai are randomly chosen elements of the Pauli basis, a particular orthonormal basis of C^{d×d}.
The Pauli basis is a non-commutative analogue of the Fourier basis in C^d; thus, low-rank matrix recovery using Pauli measurements can be viewed as a generalization of the idea of compressed sensing of sparse vectors using their Fourier coefficients [11, 12]. In addition, this problem has applications in quantum state tomography, the task of learning an unknown quantum state by performing measurements [13]. This is because most quantum states of physical interest are accurately described by density matrices that have low rank; and Pauli measurements are especially easy to carry out in an experiment (due to the tensor product structure of the Pauli basis).

In this paper we show stronger results on low-rank matrix recovery from Pauli measurements. Previously [13, 8], it was known that, for every rank-r matrix M ∈ C^{d×d}, almost all choices of m = O(rd polylog d) random Pauli measurements will lead to successful recovery of M. Here we show a stronger statement: there is a fixed ("universal") set of m = O(rd polylog d) Pauli measurements, such that for all rank-r matrices M ∈ C^{d×d}, we have successful recovery.¹ We do this by showing that the random Pauli sampling operator obeys the "restricted isometry property" (RIP). Intuitively, RIP says that the sampling operator is an approximate isometry, acting on the set of all low-rank matrices. In geometric terms, it says that the sampling operator embeds the manifold of low-rank matrices into O(rd polylog d) dimensions, with low distortion in the 2-norm.

RIP for low-rank matrices is a very strong property, and prior to this work, it was only known to hold for very unstructured types of random measurements, such as Gaussian measurements [3], which are unsuitable for most applications. RIP was known to fail in the matrix completion case, and whether it held for Pauli measurements was an open question.
Once we have established RIP for Pauli measurements, we can use known results [3, 4, 5] to show low-rank matrix recovery from a universal set of Pauli measurements. In particular, using [5], we can get nearly-optimal universal bounds on the error of the reconstructed density matrix, when the data are noisy; and we can even get bounds on the recovery of arbitrary (not necessarily low-rank) matrices. These RIP-based bounds are qualitatively stronger than those obtained using "dual certificates" [14] (though the latter technique is applicable in some situations where RIP fails).

In the context of quantum state tomography, this implies that, given a quantum state that consists of a low-rank component Mr plus a residual full-rank component Mc, we can reconstruct Mr up to an error that is not much larger than Mc. In particular, let ‖·‖* denote the nuclear norm, and let ‖·‖F denote the Frobenius norm. Then the error can be bounded in the nuclear norm by O(‖Mc‖*) (assuming noiseless data), and it can be bounded in the Frobenius norm by O(‖Mc‖F polylog d) (which holds even with noisy data²). This shows that our reconstruction is nearly as good as the best rank-r approximation to M (which is given by the truncated SVD). In addition, a completely arbitrary quantum state can be reconstructed up to an error of O(1/√r) in Frobenius norm. Lastly, the RIP gives some insight into the optimal design of tomography experiments, in particular, the tradeoff between the number of measurement settings (which is essentially m), and the number of repetitions of the experiment at each setting (which determines the statistical noise that enters the data) [15].

These results can be generalized beyond the class of Pauli measurements.
Essentially, one can replace the Pauli basis with any orthonormal basis of C^{d×d} that is incoherent, i.e., whose elements have small operator norm (of order O(1/√d), say); a similar generalization was noted in the earlier results of [8]. Also, our proof shows that the RIP actually holds in a slightly stronger sense: it holds not just for all rank-r matrices, but for all matrices X that satisfy ‖X‖* ≤ √r ‖X‖F.

To prove this result, we combine a number of techniques that have appeared elsewhere. RIP results were previously known for Gaussian measurements and some of their close relatives [3]. Also, restricted strong convexity (RSC), a similar but somewhat weaker property, was recently shown in the context of the matrix completion problem (with additional "non-spikiness" conditions) [10]. These results follow from covering arguments (i.e., using a concentration inequality to upper-bound the failure probability on each individual low-rank matrix X, and then taking the union bound over all such X). Showing RIP for Pauli measurements seems to be more delicate, however. Pauli measurements have more structure and less randomness, so the concentration of measure phenomena are weaker, and the union bound no longer gives the desired result.

Instead, one must take into account the favorable correlations between the behavior of the sampling operator on different matrices: intuitively, if two low-rank matrices M and M′ have overlapping supports, then good behavior on M is positively correlated with good behavior on M′. This can be done by transforming the problem into a Gaussian process, and using Dudley's entropy bound. This is the same approach used in classical compressed sensing, to show RIP for Fourier measurements [12, 11]. The key difference is that in our case, the Gaussian process is indexed by low-rank matrices, rather than sparse vectors.
To bound the correlations in this process, one then needs to bound the covering numbers of the nuclear norm ball (of matrices), rather than the ℓ1 ball (of vectors). This requires a different technique, using entropy duality, which is due to Guédon et al [16]. (See also the related work in [17].)

¹ Note that in the universal result, m is slightly larger, by a factor of polylog d.
² However, this bound is not universal.

As a side note, we remark that matrix recovery can sometimes fail because there exist large sets of up to d Pauli matrices that all commute, i.e., they have a simultaneous eigenbasis φ1, ..., φd. (These φi are of interest in quantum information; they are called stabilizer states [18].) If one were to measure such a set of Paulis, one would gain complete knowledge about the diagonal elements of the unknown matrix M in the φi basis, but one would learn nothing about the off-diagonal elements. This is reminiscent of the difficulties that arise in matrix completion. However, in our case, these pathological cases turn out to be rare, since it is unlikely that a random subset of Pauli matrices will all commute.

Finally, we note that there is a large body of related work on estimating a low-rank matrix by solving a regularized convex program; see, e.g., [19, 20].

This paper is organized as follows. In section 2, we state our results precisely, and discuss some specific applications to quantum state tomography. In section 3 we prove the RIP for Pauli matrices, and in section 4 we discuss some directions for future work. Some technical details appear in sections A and B in the supplementary material [21].

Notation: For vectors, ‖·‖2 denotes the ℓ2 norm. For matrices, ‖·‖p denotes the Schatten p-norm, ‖X‖p = (Σ_i σi(X)^p)^{1/p}, where σi(X) are the singular values of X.
In particular, ‖·‖* = ‖·‖1 is the trace or nuclear norm, ‖·‖F = ‖·‖2 is the Frobenius norm, and ‖·‖ = ‖·‖∞ is the operator norm. Finally, for matrices, A* is the adjoint of A, and (·,·) is the Hilbert-Schmidt inner product, (A, B) = tr(A*B). Calligraphic letters denote superoperators acting on matrices. Also, |A)(A| is the superoperator that maps every matrix X ∈ C^{d×d} to the matrix A tr(A*X).

2 Our Results

We will consider the following approach to low-rank matrix recovery. Let M ∈ C^{d×d} be an unknown matrix of rank at most r. Let W1, ..., W_{d²} be an orthonormal basis for C^{d×d}, with respect to the inner product (A, B) = tr(A*B). We choose m basis elements, S1, ..., Sm, iid uniformly at random from {W1, ..., W_{d²}} ("sampling with replacement"). We then observe the coefficients (Si, M). From this data, we want to reconstruct M.

For this to be possible, the measurement matrices Wi must be "incoherent" with respect to M. Roughly speaking, this means that the inner products (Wi, M) must be small. Formally, we say that the basis W1, ..., W_{d²} is incoherent if the Wi all have small operator norm,

    ‖Wi‖ ≤ K/√d,    (1)

where K is a constant.³ (This assumption was also used in [8].)

Before proceeding further, let us sketch the connection between this problem and quantum state tomography. Consider a system of n qubits, with Hilbert space dimension d = 2^n. We want to learn the state of the system, which is described by a density matrix ρ ∈ C^{d×d}; ρ is positive semidefinite, has trace 1, and has rank r ≪ d when the state is nearly pure.
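As a concrete illustration, this sampling model can be sketched in a few lines of NumPy (a minimal sketch; the helper names pauli_basis and sample_coefficients are ours, not from the paper). It builds the scaled n-qubit Pauli observables used below, which form an orthonormal basis with operator norm 1/√d (i.e., K = 1 in (1)), and draws m coefficients with replacement:

```python
import itertools

import numpy as np

# Single-qubit Pauli matrices (the four possibilities in eq. (2)).
PAULIS = [
    np.eye(2, dtype=complex),                      # I
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
]

def pauli_basis(n):
    """All d^2 = 4^n observables P1 (x) ... (x) Pn, scaled by 1/sqrt(d) so
    they form an orthonormal basis {W_i} with ||W_i|| = 1/sqrt(d) (K = 1)."""
    d = 2 ** n
    basis = []
    for combo in itertools.product(PAULIS, repeat=n):
        W = np.array([[1.0 + 0j]])
        for P in combo:
            W = np.kron(W, P)
        basis.append(W / np.sqrt(d))
    return basis

def sample_coefficients(M, basis, m, rng):
    """Draw S_1, ..., S_m iid uniformly from the basis (sampling with
    replacement) and return the observed coefficients (S_i, M) = tr(S_i* M)."""
    idx = rng.integers(len(basis), size=m)
    coeffs = np.array([np.trace(basis[i].conj().T @ M) for i in idx])
    return idx, coeffs
```

For a density matrix M the sampled coefficients are real up to rounding, and summing (Wi, M) Wi over the full basis recovers M exactly, which is a quick check that the basis is orthonormal.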
There is a class of convenient (and experimentally feasible) measurements, which are described by Pauli matrices (also called Pauli observables). These are matrices of the form P1 ⊗ ··· ⊗ Pn, where ⊗ denotes the tensor product (Kronecker product), and each Pi is a 2×2 matrix chosen from the following four possibilities:

    I = [[1, 0], [0, 1]],  σx = [[0, 1], [1, 0]],  σy = [[0, −i], [i, 0]],  σz = [[1, 0], [0, −1]].    (2)

One can estimate expectation values of Pauli observables, which are given by (ρ, P1 ⊗ ··· ⊗ Pn). This is a special case of the above measurement model, where the measurement matrices Wi are the (scaled) Pauli observables (P1 ⊗ ··· ⊗ Pn)/√d, and they are incoherent with ‖Wi‖ ≤ K/√d, K = 1.

³ Note that ‖Wi‖ is the maximum inner product between Wi and any rank-1 matrix M (normalized so that ‖M‖F = 1).

Now we return to our discussion of the general problem. We choose S1, ..., Sm iid uniformly at random from {W1, ..., W_{d²}}, and we define the sampling operator A : C^{d×d} → C^m as

    (A(X))_i = (d/√m) tr(Si* X),    i = 1, ..., m.    (3)

The normalization is chosen so that E A*A = I. (Note that A*A = Σ_{j=1}^m |Sj)(Sj| · d²/m.)

We assume we are given the data y = A(M) + z, where z ∈ C^m is some (unknown) noise contribution. We will construct an estimator M̂ by minimizing the nuclear norm, subject to the constraints specified by y. (Note that one can view the nuclear norm as a convex relaxation of the rank function; thus these estimators can be computed efficiently.) One approach is the matrix Dantzig selector:
    M̂ = argmin_X ‖X‖*  such that  ‖A*(y − A(X))‖ ≤ λ.    (4)

Alternatively, one can solve a regularized least-squares problem, also called the matrix Lasso:

    M̂ = argmin_X (1/2) ‖A(X) − y‖2² + μ ‖X‖*.    (5)

Here, the parameters λ and μ are set according to the strength of the noise component z (we will discuss this later). We will be interested in bounding the error of these estimators. To do this, we will show that the sampling operator A satisfies the restricted isometry property (RIP).

2.1 RIP for Pauli Measurements

Fix some constant 0 ≤ δ < 1. Fix d, and some set U ⊂ C^{d×d}. We say that A satisfies the restricted isometry property (RIP) over U if, for all X ∈ U, we have

    (1 − δ) ‖X‖F ≤ ‖A(X)‖2 ≤ (1 + δ) ‖X‖F.    (6)

(Here, ‖A(X)‖2 denotes the ℓ2 norm of a vector, while ‖X‖F denotes the Frobenius norm of a matrix.) When U is the set of all X ∈ C^{d×d} with rank r, this is precisely the notion of RIP studied in [3, 5]. We will show that Pauli measurements satisfy the RIP over a slightly larger set (the set of all X ∈ C^{d×d} such that ‖X‖* ≤ √r ‖X‖F), provided the number of measurements m is at least Ω(rd polylog d). This result generalizes to measurements in any basis with small operator norm.

Theorem 2.1. Fix some constant 0 ≤ δ < 1. Let {W1, ..., W_{d²}} be an orthonormal basis for C^{d×d} that is incoherent in the sense of (1). Let m = C K² · rd log^6 d, for some constant C that depends only on δ, C = O(1/δ²). Let A be defined as in (3). Then, with high probability (over the choice of S1, ...
, Sm), A satisfies the RIP over the set of all X ∈ C^{d×d} such that ‖X‖* ≤ √r ‖X‖F. Furthermore, the failure probability is exponentially small in δ²C.

We will prove this theorem in section 3. In the remainder of this section, we discuss its applications to low-rank matrix recovery, and quantum state tomography in particular.

2.2 Applications

By combining Theorem 2.1 with previous results [3, 4, 5], we immediately obtain bounds on the accuracy of the matrix Dantzig selector (4) and the matrix Lasso (5). In particular, for the first time we can show universal recovery of low-rank matrices via Pauli measurements, and near-optimal bounds on the accuracy of the reconstruction when the data is noisy [5]. (Similar results hold for measurements in any incoherent operator basis.) These RIP-based results improve on the earlier results based on dual certificates [13, 8, 14]. See [3, 4, 5] for details.

Here, we will sketch a couple of these results that are of particular interest for quantum state tomography. In this setting, M is the density matrix describing the state of a quantum mechanical object, and A(M) is a vector of Pauli expectation values for the state M. (M has some additional properties: it is positive semidefinite, and has trace 1; thus A(M) is a real vector.) There are two main issues that arise. First, M is not precisely low-rank. In many situations, the ideal state has low rank (for instance, a pure state has rank 1); however, for the actual state observed in an experiment, the density matrix M is full-rank with decaying eigenvalues. Typically, we will be interested in obtaining a good low-rank approximation to M, ignoring the tail of the spectrum.

Secondly, the measurements of A(M) are inherently noisy.
We do not observe A(M) directly; rather, we estimate each entry (A(M))_i by preparing many copies of the state M, measuring the Pauli observable Si on each copy, and averaging the results. Thus, we observe yi = (A(M))_i + zi, where zi is binomially distributed. When the number of experiments being averaged is large, zi can be approximated by Gaussian noise. We will be interested in getting an estimate of M that is stable with respect to this noise. (We remark that one can also reduce the statistical noise by performing more repetitions of each experiment. This suggests the possibility of a tradeoff between the accuracy of estimating each parameter, and the number of parameters one chooses to measure overall. This will be discussed elsewhere [15].)

We would like to reconstruct M up to a small error in the nuclear or Frobenius norm. Let M̂ be our estimate. Bounding the error in nuclear norm implies that, for any measurement allowed by quantum mechanics, the probability of distinguishing the state M̂ from M is small. Bounding the error in Frobenius norm implies that the difference M̂ − M is highly "mixed" (and thus does not contribute to the coherent or "quantum" behavior of the system).

We now sketch a few results from [4, 5] that apply to this situation. Write M = Mr + Mc, where Mr is a rank-r approximation to M, corresponding to the r largest singular values of M, and Mc is the residual part of M (the "tail" of M). Ideally, our goal is to estimate M up to an error that is not much larger than Mc. First, we can bound the error in nuclear norm (assuming the data has no noise):

Proposition 2.2 (Theorem 5 from [4]). Let A : C^{d×d} → C^m be the random Pauli sampling operator, with m = C rd log^6 d, for some absolute constant C. Then, with high probability over the choice of A, the following holds: Let M be any matrix in C^{d×d}, and write M = Mr + Mc, as described above.
Say we observe y = A(M), with no noise. Let M̂ be the Dantzig selector (4) with λ = 0. Then

    ‖M̂ − M‖* ≤ C0′ ‖Mc‖*,    (7)

where C0′ is an absolute constant.

We can also bound the error in Frobenius norm, allowing for noisy data:

Proposition 2.3 (Lemma 3.2 from [5]). Assume the same set-up as above, but say we observe y = A(M) + z, where z ~ N(0, σ²I). Let M̂ be the Dantzig selector (4) with λ = 8√d σ, or the Lasso (5) with μ = 16√d σ. Then, with high probability over the noise z,

    ‖M̂ − M‖F ≤ C0 √(rd) σ + C1 ‖Mc‖*/√r,    (8)

where C0 and C1 are absolute constants.

This bounds the error of M̂ in terms of the noise strength σ and the size of the tail Mc. It is universal: one sampling operator A works for all matrices M. While this bound may seem unnatural because it mixes different norms, it can be quite useful. When M actually is low-rank (with rank r), then Mc = 0, and the bound (8) becomes particularly simple. The dependence on the noise strength σ is known to be nearly minimax-optimal [5]. Furthermore, when some of the singular values of M fall below the "noise level" √d σ, one can show a tighter bound, with a nearly-optimal bias-variance tradeoff; see Theorem 2.7 in [5] for details.

On the other hand, when M is full-rank, then the error of M̂ depends on the behavior of the tail Mc. We will consider a couple of cases. First, suppose we do not assume anything about M, besides the fact that it is a density matrix for a quantum state. Then ‖M‖* = 1, hence ‖Mc‖* ≤ 1 − r/d, and we can use (8) to get ‖M̂ − M‖F ≤ C0 √(rd) σ + C1/√r.
Thus, even for arbitrary (not necessarily low-rank) quantum states, the estimator M̂ gives nontrivial results. The O(1/√r) term can be interpreted as the penalty for only measuring an incomplete subset of the Pauli observables.

Finally, consider the case where M is full-rank, but we do know that the tail Mc is small. If we know that Mc is small in nuclear norm, then we can use equation (8). However, if we know that Mc is small in Frobenius norm, one can give a different bound, using ideas from [5], as follows.

Proposition 2.4. Let M be any matrix in C^{d×d}, with singular values σ1(M) ≥ ··· ≥ σd(M). Choose a random Pauli sampling operator A : C^{d×d} → C^m, with m = C rd log^6 d, for some absolute constant C. Say we observe y = A(M) + z, where z ~ N(0, σ²I). Let M̂ be the Dantzig selector (4) with λ = 16√d σ, or the Lasso (5) with μ = 32√d σ. Then, with high probability over the choice of A and the noise z,

    ‖M̂ − M‖F² ≤ C0 Σ_{i=1}^r min(σi²(M), dσ²) + C2 (log^6 d) Σ_{i=r+1}^d σi²(M),    (9)

where C0 and C2 are absolute constants.

This bound can be interpreted as follows. The first term expresses the bias-variance tradeoff for estimating Mr, while the second term depends on the Frobenius norm of Mc. (Note that the log^6 d factor may not be tight.) In particular, this implies: ‖M̂ − M‖F ≤ √C0 √(rd) σ + √C2 (log³ d) ‖Mc‖F. This can be compared with equation (8) (involving ‖Mc‖*). This bound will be better when ‖Mc‖F ≪ ‖Mc‖*, i.e., when the tail Mc has slowly-decaying eigenvalues (in physical terms, it is highly mixed).

Proposition 2.4 is an adaptation of Theorem 2.8 in [5].
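As an aside on computation: the estimators discussed above are convex programs, and the matrix Lasso (5) in particular can be attacked with proximal gradient descent, where the proximal step for the nuclear norm is singular-value thresholding. The following is a minimal sketch, assuming NumPy; svt and matrix_lasso are illustrative names (not from the paper), and the fixed step size is left to the user rather than tuned:

```python
import numpy as np

def svt(X, tau):
    """Singular-value thresholding: the proximal operator of tau*||.||_*."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh

def matrix_lasso(S, y, mu, step, n_iter=200):
    """Proximal-gradient (ISTA) sketch of the matrix Lasso (5):
        min_X  0.5 * ||A(X) - y||_2^2 + mu * ||X||_*,
    where (A(X))_i = (d/sqrt(m)) tr(S_i* X) as in eq. (3), and S is an
    (m, d, d) array stacking the sampled measurement matrices S_i."""
    m, d, _ = S.shape
    scale = d / np.sqrt(m)
    X = np.zeros((d, d), dtype=complex)
    for _ in range(n_iter):
        resid = scale * np.einsum('kij,ij->k', S.conj(), X) - y  # A(X) - y
        grad = scale * np.einsum('k,kij->ij', resid, S)          # A*(A(X) - y)
        X = svt(X - step * grad, step * mu)
    return X
```

As a sanity check, if S stacks an entire orthonormal basis (m = d², so A*A = I), the iteration converges to the closed-form solution svt(A*(y), μ); for a random subset of Paulis, a step size below 1/‖A*A‖ keeps the iteration stable.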
We sketch the proof in section B in [21]. Note that this bound is not universal: it shows that for all matrices M, a random choice of the sampling operator A is likely to work.

3 Proof of the RIP for Pauli Measurements

We now prove Theorem 2.1. The general approach involving Dudley's entropy bound is similar to [12], while the technical part of the proof (bounding certain covering numbers) uses ideas from [16]. We summarize the argument here; the details are given in section A in [21].

3.1 Overview

Let U2 = {X ∈ C^{d×d} | ‖X‖F ≤ 1, ‖X‖* ≤ √r ‖X‖F}. Let B be the set of all self-adjoint linear operators from C^{d×d} to C^{d×d}, and define the following norm on B:

    ‖M‖(r) = sup_{X∈U2} |(X, M X)|.    (10)

(Suppose r ≥ 2, which is sufficient for our purposes. It is straightforward to show that ‖·‖(r) is a norm, and that B is a Banach space with respect to this norm.) Then let us define

    εr(A) = ‖A*A − I‖(r).    (11)

By an elementary argument, in order to prove RIP, it suffices to show that εr(A) < 2δ − δ². We will proceed as follows: we will first bound E εr(A), then show that εr(A) is concentrated around its mean.

Using a standard symmetrization argument, we have that E εr(A) ≤ 2 E ‖Σ_{j=1}^m εj |Sj)(Sj| · d²/m‖(r), where the εj are Rademacher (iid ±1) random variables.
Here the round ket notation |Sj) means we view the matrix Sj as an element of the vector space C^{d²} with Hilbert-Schmidt inner product; the round bra (Sj| denotes the adjoint element in the (dual) vector space.

Now we use the following lemma, which we will prove later. This bounds the expected magnitude in (r)-norm of a Rademacher sum of a fixed collection of operators V1, ..., Vm that have small operator norm.

Lemma 3.1. Let m ≤ d². Fix some V1, ..., Vm ∈ C^{d×d} that have uniformly bounded operator norm, ‖Vi‖ ≤ K (for all i). Let ε1, ..., εm be iid uniform ±1 random variables. Then

    E_ε ‖Σ_{i=1}^m εi |Vi)(Vi|‖(r) ≤ C5 · ‖Σ_{i=1}^m |Vi)(Vi|‖(r)^{1/2},    (12)

where C5 = √r · C4 K log^{5/2} d log^{1/2} m and C4 is some universal constant.

After some algebra, one gets that E εr(A) ≤ 2(E εr(A) + 1)^{1/2} · C5 · √(d/m), where C5 = √r · C4 K log³ d. By finding the roots of this quadratic equation, we get the following bound on E εr(A). Let λ ≥ 1. Assume that m ≥ λ d (2C5)² = λ · 4C4² · dr · K² log^6 d. Then we have the desired result:

    E εr(A) ≤ 1/λ + 1/√λ.    (13)

It remains to show that εr(A) is concentrated around its expectation. For this we use a concentration inequality from [22] for sums of independent symmetric random variables that take values in some Banach space.
See section A in [21] for details.

3.2 Proof of Lemma 3.1 (bounding a Rademacher sum in (r)-norm)

Let L0 = E_ε ‖Σ_{i=1}^m εi |Vi)(Vi|‖(r); this is the quantity we want to bound. Using a standard comparison principle, we can replace the ±1 random variables εi with iid N(0, 1) Gaussian random variables gi; then we get

    L0 ≤ E_g sup_{X∈U2} √(π/2) |G(X)|,    G(X) = Σ_{i=1}^m gi |(Vi, X)|².    (14)

The random variables G(X) (indexed by X ∈ U2) form a Gaussian process, and L0 is upper-bounded by the expected supremum of this process. Using the fact that G(0) = 0 and G(·) is symmetric, and Dudley's inequality (Theorem 11.17 in [22]), we have

    L0 ≤ √(2π) E_g sup_{X∈U2} G(X) ≤ 24 √(2π) ∫_0^∞ log^{1/2} N(U2, dG, ε) dε,    (15)

where N(U2, dG, ε) is a covering number (the number of balls in C^{d×d} of radius ε in the metric dG that are needed to cover the set U2), and the metric dG is given by

    dG(X, Y) = (E[(G(X) − G(Y))²])^{1/2}.    (16)

Define a new norm (actually a semi-norm) ‖·‖X on C^{d×d}, as follows:

    ‖M‖X = max_{i=1,...,m} |(Vi, M)|.    (17)

We use this to upper-bound the metric dG. An elementary calculation shows that dG(X, Y) ≤ 2R ‖X − Y‖X, where R = ‖Σ_{i=1}^m |Vi)(Vi|‖(r)^{1/2}. This lets us upper-bound the covering numbers in dG with covering numbers in ‖·‖X:

    N(U2, dG, ε) ≤ N(U2, ‖·‖X, ε/(2R)) = N((1/√r) U2, ‖·‖X, ε/(2R√r)).    (18)

We will now bound these covering numbers.
First, we introduce some notation: let ‖·‖p denote the Schatten p-norm on C^{d×d}, and let Bp be the unit ball in this norm. Also, let BX be the unit ball in the ‖·‖X norm.

Observe that (1/√r) U2 ⊆ B1 ⊆ K · BX. (The second inclusion follows because ‖M‖X ≤ max_{i=1,...,m} ‖Vi‖ ‖M‖* ≤ K ‖M‖*.) This gives a simple bound on the covering numbers:

    N((1/√r) U2, ‖·‖X, ε) ≤ N(B1, ‖·‖X, ε) ≤ N(K · BX, ‖·‖X, ε).    (19)

This is 1 when ε ≥ K. So, in Dudley's inequality, we can restrict the integral to the interval [0, K]. When ε is small, we will use the following simple bound (equation (5.7) in [23]):

    N(K · BX, ‖·‖X, ε) ≤ (1 + 2K/ε)^{2d²}.    (20)

When ε is large, we will use a more sophisticated bound based on Maurey's empirical method and entropy duality, which is due to [16] (see also [17]):

    N(B1, ‖·‖X, ε) ≤ exp((C1² K²/ε²) log³ d log m),  for some constant C1.    (21)

We defer the proof of (21) to the next section.

Using (20) and (21), we can bound the integral in Dudley's inequality. We get

    L0 ≤ C4 R √r K log^{5/2} d log^{1/2} m,    (22)

where C4 is some universal constant. This proves the lemma.

3.3 Proof of Equation (21) (covering numbers of the nuclear-norm ball)

Our result will follow easily from a bound on covering numbers introduced in [16] (where it appears as Lemma 1):

Lemma 3.2. Let E be a Banach space, having modulus of convexity of power type 2 with constant λ(E). Let E* be the dual space, and let T2(E*) denote its type 2 constant. Let BE denote the unit ball in E. Let V1, ...
, Vm \u2208 E\u2217, such that (cid:107)Vj(cid:107)E\u2217 \u2264 K (for all j). De\ufb01ne the norm on E,\n\n(23)\n\n(24)\n\n(cid:107)M(cid:107)X = max\n\n|(Vj, M )|.\n\nj=1,...,m\n\nThen, for any \u03b5 > 0,\n\n\u03b5 log1/2 N (BE,(cid:107)\u00b7(cid:107)X , \u03b5) \u2264 C2\u03bb(E)2T2(E\u2217)K log1/2 m,\n\nwhere C2 is some universal constant.\n\nThe proof uses entropy duality to reduce the problem to bounding the \u201cdual\u201d covering number. The\np denote the complex vector space Cm with the (cid:96)p norm. Consider\nbasic idea is as follows. Let (cid:96)m\n1 \u2192 E\u2217 that takes the j\u2019th coordinate vector to Vj. Let N (S) denote the number of\nthe map S : (cid:96)m\nballs in E\u2217 needed to cover the image (under the map S) of the unit ball in (cid:96)m\n1 . We can bound N (S)\nusing Maurey\u2019s empirical method. Also de\ufb01ne the dual map S\u2217 : E \u2192 (cid:96)m\u221e, and the associated dual\ncovering number N (S\u2217). Then N (BE,(cid:107)\u00b7(cid:107)X , \u03b5) is related to N (S\u2217). Finally, N (S) and N (S\u2217) are\nrelated via entropy duality inequalities. See [16] for details.\nWe will apply this lemma as follows, using the same approach as [17]. Let Sp denote the Banach\nspace consisting of all matrices in Cd\u00d7d with the Schatten p-norm.\nIntuitively, we want to set\nE = S1 and E\u2217 = S\u221e, but this won\u2019t work because \u03bb(S1) is in\ufb01nite. Instead, we let E = Sp,\np = (log d)/(log d \u2212 1), and E\u2217 = Sq, q = log d. Note that (cid:107)M(cid:107)p \u2264 (cid:107)M(cid:107)\u2217, hence B1 \u2286 Bp and\n(25)\nAlso, we have \u03bb(E) \u2264 1/\nlog d \u2212 1 (see the\nAppendix in [17]). Note that (cid:107)M(cid:107)q \u2264 e(cid:107)M(cid:107), thus we have (cid:107)Vj(cid:107)q \u2264 eK (for all j). 
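The two Schatten-norm estimates just used can be checked numerically. The following sketch (not from the paper; the random complex matrix and the choice d = 64 are illustrative assumptions) verifies ‖M‖_p ≤ ‖M‖_* for p = (log d)/(log d − 1) and ‖M‖_q ≤ e‖M‖ for q = log d, the latter because ‖M‖_q ≤ d^{1/q}‖M‖ and d^{1/log d} = e:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64
q = np.log(d)        # q = log d
p = q / (q - 1.0)    # p = (log d)/(log d - 1), the conjugate exponent

M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
s = np.linalg.svd(M, compute_uv=False)   # singular values of M

def schatten(sv, r):
    # Schatten r-norm: (sum_i sigma_i^r)^(1/r)
    return np.sum(sv ** r) ** (1.0 / r)

nuclear = np.sum(s)     # ||M||_*  (Schatten 1-norm)
operator = np.max(s)    # ||M||    (operator norm, Schatten infinity-norm)

print(schatten(s, p) <= nuclear)          # ||M||_p <= ||M||_*, since p > 1
print(schatten(s, q) <= np.e * operator)  # ||M||_q <= d^(1/q) ||M|| = e ||M||
```

Both inequalities hold for every matrix (monotonicity of Schatten norms, and the dimension-dependent bound d^{1/q} with q = log d); the script merely exhibits them on a random instance.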
Then, using the lemma, we have

ε log^{1/2} N(B_p, ‖·‖_X, ε) ≤ C2 log^{3/2} d · (eK) · log^{1/2} m,   (26)

which proves the claim.

4 Outlook

We have shown that random Pauli measurements obey the restricted isometry property (RIP), which implies strong error bounds for low-rank matrix recovery. The key technical tool was a bound on covering numbers of the nuclear-norm ball, due to Guédon et al. [16].

An interesting question is whether this method can be applied to other problems, such as matrix completion, or constructing embeddings of low-dimensional manifolds into linear spaces with slightly higher dimension. For matrix completion, one can compare with the work of Negahban and Wainwright [10], where the sampling operator satisfies restricted strong convexity (RSC) over a certain set of "non-spiky" low-rank matrices. For manifold embeddings, one could try to generalize the results of [24], which use the sparse-vector RIP to construct Johnson-Lindenstrauss metric embeddings.

There are also many questions pertaining to low-rank quantum state tomography. For example, how does the matrix Lasso compare to the traditional approach using maximum likelihood estimation? Also, there are several variations on the basic tomography problem, and alternative notions of sparsity (e.g., elementwise sparsity in a known basis) [25], which have not been fully explored.

Acknowledgements: Thanks to David Gross, Yaniv Plan, Emmanuel Candès, Stephen Jordan, and the anonymous reviewers, for helpful suggestions. Parts of this work were done at the University of California, Berkeley, and supported by NIST grant number 60NANB10D262.
This paper is a contribution of the National Institute of Standards and Technology, and is not subject to U.S. copyright.

References

[1] M. Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford, 2002.
[2] N. Srebro. Learning with Matrix Factorizations. PhD thesis, MIT, 2004.
[3] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
[4] M. Fazel, E. Candès, B. Recht, and P. Parrilo. Compressed sensing and robust recovery of low rank matrices. In 42nd Asilomar Conference on Signals, Systems and Computers, pages 1043–1047, 2008.
[5] E. J. Candès and Y. Plan. Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. 2009.
[6] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, 2009.
[7] E. J. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform. Theory, 56(5):2053–2080, 2010.
[8] D. Gross. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, to appear. arXiv:0910.1879, 2010.
[9] B. Recht. A simpler approach to matrix completion. J. Machine Learning Research (to appear), 2010.
[10] S. Negahban and M. J. Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv:1009.2118, 2010.
[11] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: universal encoding strategies. IEEE Trans. Inform. Theory, 52:5406–5425, 2006.
[12] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements. Commun. Pure and Applied Math., 61:1025–1045, 2008.
[13] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert. Quantum state tomography via compressed sensing. Phys. Rev. Lett., 105(15):150401, Oct 2010. arXiv:0909.3304.
[14] E. J. Candès and Y. Plan. Matrix completion with noise. Proc. IEEE, 98(6):925–936, 2010.
[15] B. Brown, S. Flammia, D. Gross, and Y.-K. Liu. In preparation, 2011.
[16] O. Guédon, S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Majorizing measures and proportional subsets of bounded orthonormal systems. Rev. Mat. Iberoamericana, 24(3):1075–1095, 2008.
[17] G. Aubrun. On almost randomizing channels with a short Kraus decomposition. Commun. Math. Phys., 288:1103–1116, 2009.
[18] M. A. Nielsen and I. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2001.
[19] A. Rohde and A. Tsybakov. Estimation of high-dimensional low-rank matrices. arXiv:0912.5338, 2009.
[20] V. Koltchinskii, K. Lounici, and A. B. Tsybakov. Nuclear norm penalization and optimal rates for noisy low rank matrix completion. arXiv:1011.6256, 2010.
[21] Y.-K. Liu. Universal low-rank matrix recovery from Pauli measurements. arXiv:1103.2816, 2011.
[22] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer, 1991.
[23] G. Pisier. The Volume of Convex Bodies and Banach Space Geometry. Cambridge, 1999.
[24] F. Krahmer and R. Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. SIAM J. Math. Anal., 43(3):1269–1281, 2011.
[25] A. Shabani, R. L. Kosut, M. Mohseni, H. Rabitz, M. A. Broome, M. P. Almeida, A. Fedrizzi, and A. G. White. Efficient measurement of quantum dynamics via compressive sensing. Phys. Rev. Lett., 106(10):100401, 2011.
[26] P. Wojtaszczyk. Stability and instance optimality for Gaussian measurements in compressed sensing. Found. Comput. Math., 10(1):1–13, 2009.