{"title": "Online Learning of Quantum States", "book": "Advances in Neural Information Processing Systems", "page_first": 8962, "page_last": 8972, "abstract": "Suppose we have many copies of an unknown n-qubit state $\\rho$. We measure some copies of $\\rho$ using a known two-outcome measurement $E_1$, then other copies using a measurement $E_2$, and so on. At each stage $t$, we generate a current hypothesis $\\omega_t$ about the state $\\rho$, using the outcomes of the previous measurements. We show that it is possible to do this in a way that guarantees that $|\\mathrm{Tr}(E_i \\omega_t) - \\mathrm{Tr}(E_i \\rho)|$, the error in our prediction for the next measurement, is at least $\\varepsilon$ at most $O(n/\\varepsilon^2)$ times. Even in the non-realizable setting---where there could be arbitrary noise in the measurement outcomes---we show how to output hypothesis states that incur at most $O(\\sqrt{Tn})$ excess loss over the best possible state on the first $T$ measurements. These results generalize a 2007 theorem by Aaronson on the PAC-learnability of quantum states, to the online and regret-minimization settings. We give three different ways to prove our results---using convex optimization, quantum postselection, and sequential fat-shattering dimension---which have different advantages in terms of parameters and portability.", "full_text": "Online Learning of Quantum States

Scott Aaronson
UT Austin∗
aaronson@cs.utexas.edu

Xinyi Chen
Google AI Princeton†
xinyic@google.com

Elad Hazan
Princeton University and Google AI Princeton
ehazan@cs.princeton.edu

Satyen Kale
Google AI, New York
satyenkale@google.com

Ashwin Nayak
University of Waterloo‡
ashwin.nayak@uwaterloo.ca

Abstract

Suppose we have many copies of an unknown n-qubit state ρ. We measure some copies of ρ using a known two-outcome measurement E1, then other copies using a measurement E2, and so on. 
At each stage t, we generate a current hypothesis ωt about the state ρ, using the outcomes of the previous measurements. We show that it is possible to do this in a way that guarantees that |Tr(Etωt) − Tr(Etρ)|, the error in our prediction for the next measurement, is at least ε at most O(n/ε²) times. Even in the "non-realizable" setting—where there could be arbitrary noise in the measurement outcomes—we show how to output hypothesis states that incur at most O(√(Tn)) excess loss over the best possible state on the first T measurements. These results generalize a 2007 theorem by Aaronson on the PAC-learnability of quantum states, to the online and regret-minimization settings. We give three different ways to prove our results—using convex optimization, quantum postselection, and sequential fat-shattering dimension—which have different advantages in terms of parameters and portability.

1 Introduction

State tomography is a fundamental task in quantum computing of great practical and theoretical importance. In a typical scenario, we have access to an apparatus that is capable of producing many copies of a quantum state, and we wish to obtain a description of the state via suitable measurements. Such a description would allow us, for example, to check the accuracy with which the apparatus constructs a specific target state.
How many single-copy measurements are needed to "learn" an unknown n-qubit quantum state ρ? Suppose we wish to reconstruct the full 2^n × 2^n density matrix, even approximately, to within ε in trace distance. If we make no assumptions about ρ, then it is straightforward to show that the number of measurements needed grows exponentially with n. In fact, even when we allow joint
In fact, even when we allow joint\nmeasurement of multiple copies of the state, an exponential number of copies of \u21e2 are required (see,\n\u21e4Supported by a Vannevar Bush Faculty Fellowship from the US Department of Defense. Part of this work\n\nwas done while the author was supported by an NSF Alan T. Waterman Award.\n\n\u2020Part of this work was done when the author was a research assistant at Princeton University.\n\u2021Research supported in part by NSERC Canada.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fe.g., O\u2019Donnell and Wright [2016], Haah et al. [2017]). (A \u201cjoint measurement\u201d of two or more\nstates on disjoint sequences of qubits is a single measurement of all the qubits together.)\nSuppose, on the other hand, that there is some probability distribution D over possible yes/no\nmeasurements, where we identify the measurements with 2n \u21e5 2n Hermitian matrices E with\neigenvalues in [0, 1]. Further suppose we are only concerned about learning the state \u21e2 well\nenough to predict the outcomes of most measurements E drawn from D\u2014where \u201cpredict\u201d means\napproximately calculating the probability, Tr(E\u21e2), of a \u201cyes\u201d result. Then for how many (known)\nsample measurements Ei, drawn independently from D, do we need to know the approximate value\nof Tr(Ei\u21e2), before we have enough data to achieve this?\nAaronson [2007] proved that the number of sample measurements needed, m, grows only linearly\nwith the number of qubits n. What makes this surprising is that it represents an exponential reduction\ncompared to full quantum state tomography. Furthermore, the prediction strategy is extremely simple.\nInformally, we merely need to \ufb01nd any \u201chypothesis state\u201d ! that satis\ufb01es Tr(Ei!) \u21e1 Tr(Ei\u21e2) for\nall the sample measurements E1, . . . , Em. 
Then with high probability over the choice of sample measurements, that hypothesis ω necessarily "generalizes", in the sense that Tr(Eω) ≈ Tr(Eρ) for most additional E's drawn from D. The learning theorem led to followup work including a full characterization of quantum advice (Aaronson and Drucker [2014]); efficient learning for stabilizer states (Rocchetto [2017]); the "shadow tomography" protocol (Aaronson [2018]); and recently, the first experimental demonstration of quantum state PAC-learning (Rocchetto et al. [2017]).
A major drawback of the learning theorem due to Aaronson is the assumption that the sample measurements are drawn independently from D—and moreover, that the same distribution D governs both the training samples and the measurements on which the learner's performance is later tested. It has long been understood, in computational learning theory, that these assumptions are often unrealistic: they fail to account for adversarial environments, or environments that change over time. This is precisely the state of affairs in current experimental implementations of quantum information processing. Not all measurements of quantum states may be available or feasible in a specific implementation; which measurements are feasible is dictated by Nature; and as we develop more control over the experimental set-up, more sophisticated measurements become available. 
The task of learning a state prepared in the laboratory thus takes the form of a game, with the theorist on one side, and the experimentalist and Nature on the other: the theorist is repeatedly challenged to predict the behaviour of the state with respect to the next measurement that Nature allows the experimentalist to realize, with the opportunity to refine the hypothesis as more measurement data become available.
It is thus desirable to design learning algorithms that work in the more stringent online learning model. Here the learner is presented a sequence of input points, say x1, x2, . . ., one at a time. Crucially, there is no assumption whatsoever about the xt's: the sequence could be chosen adversarially, and even adaptively, which means that the choice of xt might depend on the learner's behavior on x1, . . . , xt−1. The learner is trying to learn some unknown function f(x), about which it initially knows only that f belongs to some hypothesis class H—or perhaps not even that; we also consider the scenario where the learner simply tries to compete with the best predictor in H, which might or might not be a good predictor. The learning proceeds as follows: for each t, the learner first guesses a value yt for f(xt), and is then told the true value f(xt), or perhaps only an approximation of this value. Our goal is to design a learning algorithm with the following guarantee: regardless of the sequence of xt's, the learner's guess yt will be far from the true value f(xt) at most k times (where k, of course, is as small as possible). The xt's on which the learner errs could be spaced arbitrarily; all we require is that they be bounded in number.
This leads to the following question: can the learning theorem established by Aaronson [2007] be generalized to the online learning setting? In other words: is it true that, given a sequence E1, E2, . .
. of yes/no measurements, where each Et is followed shortly afterward by an approximation of Tr(Etρ), there is a way to anticipate the Tr(Etρ) values by guesses yt ∈ [0, 1], in such a way that |yt − Tr(Etρ)| > ε at most, say, O(n) times (where ε > 0 is some constant, and n again is the number of qubits)? The purpose of this paper is to provide an affirmative answer.
Throughout the paper, we consider only two-outcome measurements of an n-qubit mixed state ρ, and we specify such a measurement by a 2^n × 2^n Hermitian matrix E with eigenvalues in [0, 1]. We say that E "accepts" ρ with probability Tr(Eρ) and "rejects" ρ with probability 1 − Tr(Eρ). We prove that:

Theorem 1. Let ρ be an n-qubit mixed state, and let E1, E2, . . . be a sequence of 2-outcome measurements that are revealed to the learner one by one, each followed by a value bt ∈ [0, 1] such that |Tr(Etρ) − bt| ≤ ε/3. Then there is an explicit strategy for outputting hypothesis states ω1, ω2, . . . such that |Tr(Etωt) − Tr(Etρ)| > ε for at most O(n/ε²) values of t.

We also prove a theorem for the so-called regret minimization model (i.e., the "non-realizable case"), where we make no assumption about the input data arising from an actual quantum state, and our goal is simply to do not much worse than the best hypothesis state that could be found with perfect foresight. In this model, the measurements E1, E2, . . . are presented to a learner one-by-one. In iteration t, after seeing Et, the learner is challenged to output a hypothesis state ωt, and then suffers a "loss" equal to ℓt(Tr(Etωt)) where ℓt is a real function that is revealed to the learner. Important examples of loss functions are L1 loss, when ℓt(z) := |z − bt|, and L2 loss, when ℓt(z) := (z − bt)², where bt ∈ [0, 1]. 
The number bt may be an approximation of Tr(Etρ) for some fixed but unknown quantum state ρ, but is allowed to be arbitrary in general. In particular, the pairs (Et, bt) may not be consistent with any quantum state. Define the regret RT, after T iterations, to be the amount by which the actual loss of the learner exceeds the loss of the best single hypothesis:

RT := Σ_{t=1}^T ℓt(Tr(Etωt)) − min_φ Σ_{t=1}^T ℓt(Tr(Etφ)) .

The learner's objective is to minimize regret. We show that:

Theorem 2. Let E1, E2, . . . be a sequence of two-outcome measurements on an n-qubit state presented to the learner, and ℓ1, ℓ2, . . . be the corresponding loss functions revealed in successive iterations in the regret minimization model. Suppose ℓt is convex and L-Lipschitz; in particular, for every x ∈ R, there is a sub-derivative ℓ′t(x) such that |ℓ′t(x)| ≤ L. Then there is an explicit learning strategy that guarantees regret RT = O(L√(Tn)) for all T. This is so even assuming the measurement Et and loss function ℓt are chosen adaptively, in response to the learner's previous behavior.

Specifically, the algorithm applies to L1 loss and L2 loss, and achieves regret O(√(Tn)) for both.
The online strategies we present enjoy several advantages over full state tomography, and even over "state certification", in which we wish to test whether a given quantum state is close to a desired state or far from it. Optimal algorithms for state tomography (O'Donnell and Wright [2016], Haah et al. [2017]) or certification (Bădescu et al. [2017]) require joint measurements of an exponential number of copies of the quantum state, and assume the ability to perform noiseless, universal quantum computation. On the other hand, the algorithms implicit in Theorems 1 and 2 involve only single-copy measurements, allow for noisy measurements, and capture ground reality more closely. 
They produce a hypothesis state that mimics the unknown state with respect to measurements that can be performed in a given experimental set-up, and the accuracy of prediction improves as the set of available measurements grows. For example, in the realizable case, i.e., when the data arise from an actual quantum state, the average L1 loss per iteration is O(√(n/T)). This tends to zero as the number of measurements becomes large. Note that L1 loss may be as large as 1/2 per iteration in the worst case, but this occurs at most O(√(nT)) times. Finally, the algorithms have run time exponential in the number of qubits in each iteration, but are entirely classical. Exponential run time is unavoidable, as the measurements are presented explicitly as 2^n × 2^n matrices, where n is the number of qubits. If we were required to output the hypothesis states, the length of the output—also exponential in the number of qubits—would again entail exponential run time.
It is natural to wonder whether Theorems 1 and 2 leave any room for improvement. Theorem 1 is asymptotically optimal in its mistake bound of O(n/ε²); this follows from the property that n-qubit quantum states, considered as a hypothesis class, have ε-fat-shattering dimension Θ(n/ε²) (see, for example, Aaronson [2007]). On the other hand, there is room to improve Theorem 2. The bounds of which we are aware are Ω(√(Tn)) for the L1 loss (see, e.g., [Arora et al., 2012, Theorem 4.1]) in the non-realizable case and Ω(n) for the L2 loss in the realizable case, when the feedback consists of the measurement outcomes. (The latter bound, as well as an Ω(√(Tn)) bound for L1 loss in the same setting, come from considering quantum mixed states that consist of n independent classical coins, each of which could land heads with probability either 1/2 or 1/2 + ε. 
The parameter ε is set to √(n/T).)

We mention an application of Theorem 1 that appears in simultaneous work. Aaronson [2018] has given an algorithm for the so-called shadow tomography problem. Here we have an unknown D-dimensional pure state ρ, as well as known two-outcome measurements E1, . . . , EM. Our goal is to approximate Tr(Eiρ), for every i, to within additive error ε. We would like to do this by measuring ρ⊗k, where k is as small as possible. Surprisingly, Aaronson [2018] showed that this can be achieved with k = Õ((log M)⁴ (log D)/ε⁵), that is, a number of copies of ρ that is only polylogarithmic in both D and M. One component of his algorithm is essentially tantamount to online learning with Õ(n/ε³) mistakes—i.e., the learning algorithm we present in Section 4 of this paper. However, by using Theorem 1 from this paper in a black-box manner, we can improve the sample complexity of shadow tomography to Õ((log M)⁴ (log D)/ε⁴). Details appear in (Aaronson [2018]).
To maximize insight, in this paper we give three very different approaches to proving Theorems 1 and 2 (although we do not prove every statement with all three approaches). Our first approach is to adapt techniques from online convex optimization to the setting of density matrices, which in general may be over a complex Hilbert space. This requires extending standard techniques to cope with convexity and Taylor approximations, which are widely used for functions over the real domain, but not over the complex domain. 
We also give an efficient iterative algorithm to produce predictions. This approach connects our problem to the modern mainstream of online learning algorithms, and achieves the best parameters (as stated in Theorems 1 and 2).
Our second approach is via a postselection-based learning procedure, which starts with the maximally mixed state as a hypothesis and then repeatedly refines it by simulating postselected measurements. This approach builds on earlier work due to Aaronson [2005], specifically the proof of BQP/qpoly ⊆ PP/poly. The advantage is that it is almost entirely self-contained, requiring no "power tools" from convex optimization or learning theory. On the other hand, the approach does not give optimal parameters, and we do not know how to prove Theorem 2 with it.
Our third approach is via an upper bound on the so-called sequential fat-shattering dimension of quantum states, considered as a hypothesis class (see, e.g., Rakhlin et al. [2015]). In the original quantum PAC-learning theorem by Aaronson, the key step was to upper-bound the so-called ε-fat-shattering dimension of quantum states considered as a hypothesis class. Fat-shattering dimension is a real-valued generalization of VC dimension. One can then appeal to known results to get a sample-efficient learning algorithm. For online learning, however, bounding the fat-shattering dimension no longer suffices; one instead needs to consider a possibly-larger quantity called sequential fat-shattering dimension. However, by appealing to a lower bound due to Nayak [1999], Ambainis et al. [2002] for a variant of quantum random access codes, we are able to upper-bound the sequential fat-shattering dimension of quantum states. Using known results—in particular, those due to Rakhlin et al. 
[2015]—this implies the regret bound in Theorem 2, up to a multiplicative factor of log^{3/2} T. The statement that the hypothesis class of n-qubit states has ε-sequential fat-shattering dimension O(n/ε²) might be of independent interest: among other things, it implies that any online learning algorithm that works given bounded sequential fat-shattering dimension will work for online learning of quantum states. We also give an alternative proof for the lower bound due to Nayak for quantum random access codes, and extend it to codes that are decoded by what we call measurement decision trees. We expect these also to be of independent interest.

1.1 Structure of the paper

We start by describing background and the technical learning setting as well as notation used throughout (Section 2). In Section 3 we give the algorithms and main theorems derived using convexity arguments and online convex optimization. In Section 4 we state the main theorem using a postselection algorithm. In Section 5 we give a sequential fat-shattering dimension bound for quantum states and its implication for online learning of quantum states. Proofs of the theorems and related claims are presented in the appendices.

2 Preliminaries and definitions

We define the trace norm of a matrix M as ‖M‖Tr := Tr√(MM†), where M† is the adjoint of M. We denote the ith eigenvalue of a Hermitian matrix X by λi(X), its minimum eigenvalue by λmin(X), and its maximum eigenvalue by λmax(X). We sometimes use the notation X • Y to denote the trace inner-product Tr(X†Y) between two complex matrices of the same dimensions. 
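As a quick numerical illustration of these definitions (our own sketch, not part of the paper), the trace norm and the trace inner product can be computed directly with numpy; the helper names below are ours:

```python
import numpy as np

def trace_norm(M):
    # ||M||_Tr = Tr sqrt(M^dagger M), i.e. the sum of the singular values of M.
    return np.linalg.svd(M, compute_uv=False).sum()

def trace_inner(X, Y):
    # X . Y = Tr(X^dagger Y), the trace inner product.
    return np.trace(X.conj().T @ Y)

M = np.array([[1.0, 0.0], [0.0, -2.0]])
tn = trace_norm(M)                  # singular values 2 and 1, so 3
ip = trace_inner(np.eye(2), M)      # Tr(M) = -1
```

For a Hermitian matrix the singular values are the absolute values of the eigenvalues, which is why the trace norm of the diagonal example above is 1 + 2 = 3.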
By 'log' we denote the natural logarithm, unless the base is explicitly mentioned.
An n-qubit quantum state ρ is an element of Cn, where Cn is the set of all trace-1 positive semi-definite (PSD) complex matrices of dimension 2^n:

Cn = {M ∈ C^{2^n × 2^n} : M = M†, M ⪰ 0, Tr(M) = 1} .

Note that Cn is a convex set. A two-outcome measurement of an n-qubit state is defined by a 2^n × 2^n Hermitian matrix E with eigenvalues in [0, 1]. The measurement E "accepts" ρ with probability Tr(Eρ), and "rejects" with probability 1 − Tr(Eρ). For the algorithms we present in this article, we assume that a two-outcome measurement is specified via a classical description of its defining matrix E. In the rest of the article, unless mentioned otherwise, a "measurement" refers to a "two-outcome measurement". We refer the reader to the book by Watrous [2018] for a more thorough introduction to the relevant concepts from quantum information.

Online learning and regret. In online learning of quantum states, we have a sequence of iterations t = 1, 2, 3, . . . of the following form. First, the learner constructs a state ωt ∈ Cn; we say that the learner "predicts" ωt. It then suffers a "loss" ℓt(Tr(Etωt)) that depends on a measurement Et, both of which are presented by an adversary. Commonly used loss functions are L2 loss (also called "mean square error"), given by

ℓt(z) := (z − bt)² ,

and L1 loss (also called "absolute loss"), given by

ℓt(z) := |z − bt| ,

where bt ∈ [0, 1]. The parameter bt may be an approximation of Tr(Etρ) for some fixed quantum state ρ not known to the learner, obtained by measuring multiple copies of ρ. 
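The objects just defined are easy to instantiate numerically. The following sketch (our own illustration; the state, measurement, and noisy feedback are randomly generated) builds a state in Cn and a two-outcome measurement E, then evaluates the acceptance probability and both losses:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3                       # number of qubits; the dimension is 2**n
d = 2 ** n

# A random density matrix rho in C_n: Hermitian, PSD, trace 1.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

# A random two-outcome measurement E: Hermitian with eigenvalues in [0, 1],
# obtained by rescaling the spectrum of a random Hermitian matrix.
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (B + B.conj().T) / 2
w = np.linalg.eigvalsh(H)
E = (H - w.min() * np.eye(d)) / (w.max() - w.min())

p = np.trace(E @ rho).real                       # acceptance probability Tr(E rho)
b = float(np.clip(p + rng.normal(scale=0.01), 0.0, 1.0))  # noisy feedback b_t

l1 = abs(p - b)            # L1 ("absolute") loss
l2 = (p - b) ** 2          # L2 ("mean square") loss
```

Since 0 ⪯ E ⪯ I and ρ has trace 1, the acceptance probability p always lands in [0, 1], which is what makes the losses above well defined on that interval.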
However, in general, the parameter is allowed to be arbitrary.
The learner then "observes" feedback from the measurement Et; the feedback is also provided by the adversary. The simplest feedback is the realization of a binary random variable Yt such that

Yt = 1 with probability Tr(Etρ), and
Yt = 0 with probability 1 − Tr(Etρ).

Another common feedback is a number bt as described above, especially in case that the learner suffers L1 or L2 loss.
We would like to design a strategy for updating ωt based on the loss, measurements, and feedback in all the iterations so far, so that the learner's total loss is minimized in the following sense. We would like that over T iterations (for a number T known in advance), the learner's total loss is not much more than that of the hypothetical strategy of outputting the same quantum state φ at every time step, where φ minimizes the total loss with perfect hindsight. Formally this is captured by the notion of regret RT, defined as

RT := Σ_{t=1}^T ℓt(Tr(Etωt)) − min_{φ∈Cn} Σ_{t=1}^T ℓt(Tr(Etφ)) .

The sequence of measurements Et can be arbitrary, even adversarial, based on the learner's previous actions. Note that if the loss function is given by a fixed state ρ (as in the case of mean square error), the minimum total loss would be 0. This is called the "realizable" case. However, in general, the loss function presented by the adversary need not be consistent with any quantum state. This is called the "non-realizable" case.
A special case of the online learning setting is called agnostic learning; here the measurements Et are drawn from a fixed and unknown distribution D. 
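The binary feedback Yt above is straightforward to simulate. A toy sketch (our own, with a hypothetical one-qubit diagonal state) showing that the empirical frequency of Yt = 1 concentrates around Tr(Etρ):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical 1-qubit example: rho = diag(0.7, 0.3), E = |0><0|.
rho = np.diag([0.7, 0.3])
E = np.diag([1.0, 0.0])
q = np.trace(E @ rho).real          # acceptance probability Tr(E rho) = 0.7

# Binary feedback Y_t: 1 with probability Tr(E rho), 0 otherwise.
T = 20000
Y = (rng.random(T) < q).astype(float)
gap = abs(Y.mean() - q)             # shrinks like O(1/sqrt(T))
```

This is the realizable case: repeated noisy feedback from a fixed state ρ, which is exactly the regime where the minimum total loss in the regret definition is (close to) zero.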
The setting is called "agnostic" because we still do not assume that the losses correspond to any actual state ρ (i.e., the setting may be non-realizable).

Online mistake bounds. In some online learning scenarios the quantity of interest is not the mean square error, or some other convex loss, but rather simply the total number of "mistakes" made. For example, we may be interested in the number of iterations in which the predicted probability of acceptance Tr(Etωt) is more than ε-far from the actual value Tr(Etρ), where ρ is again a fixed state not known to the learner. More formally, let

ℓt(Tr(Etωt)) := |Tr(Etωt) − Tr(Etρ)|

be the absolute loss function. Then the goal is to bound the number of iterations in which ℓt(Tr(Etωt)) > ε, regardless of the sequence of measurements Et presented by the adversary. We assume that in this setting, the adversary provides as feedback an approximation bt ∈ [0, 1] that satisfies |Tr(Etρ) − bt| ≤ ε/3.

3 Online learning of quantum states

In this section, we use techniques from online convex optimization to minimize regret. The same algorithms may be adapted to also minimize the number of mistakes made.

3.1 Regularized Follow-the-Leader

We first follow the template of the Regularized Follow-the-Leader algorithm (RFTL; see, for example, [Hazan, 2015, Chapter 5]). The algorithm below makes use of von Neumann entropy, which relates to the Matrix Exponentiated Gradient algorithm (Tsuda et al. [2005]).

Algorithm 1 RFTL for Quantum Tomography
1: Input: T, K := Cn, η < 1/2
2: Set ω1 := I/2^n.
3: for t = 1, . . . , T do
4: Predict ωt. Consider the convex and L-Lipschitz loss function ℓt : R → R given by measurement Et: ℓt(Tr(Etφ)). Let ℓ′t(x) be a sub-derivative of ℓt with respect to x. 
Define

∇t := ℓ′t(Tr(Etωt)) Et .

5: Update decision according to the RFTL rule with von Neumann entropy:

ωt+1 := argmin_{φ∈K} ( η Σ_{s=1}^t Tr(∇s φ) + Σ_{i=1}^{2^n} λi(φ) log λi(φ) ) .   (1)

6: end for

Remark 1: The mathematical program in Eq. (1) is convex, and thus can be solved in polynomial time in the dimension, which is 2^n.

Theorem 3. Setting η = √((log 2) n / (2TL²)), the regret of Algorithm 1 is bounded by 2L√((2 log 2) Tn).

Remark 2: In the case where the feedback is an independent random variable Yt, where Yt = 0 with probability 1 − Tr(Etρ) and Yt = 1 with probability Tr(Etρ) for a fixed but unknown state ρ, we define ∇t in Algorithm 1 as ∇t := 2(Tr(Etωt) − Yt)Et. Then E[∇t] is the gradient of the L2 loss function where we receive precise feedback Tr(Etρ) instead of Yt. It follows from the proof of Theorem 3 that the expected L2 regret of Algorithm 1, namely

E[ Σ_{t=1}^T (Tr(Etωt) − Tr(Etρ))² ] ,

is bounded by O(√(Tn)).
The proof of Theorem 3 appears in Appendix B. The proof is along the lines of [Hazan, 2015, Theorem 5.2], except that the loss function does not take a raw state as input, and our domain for optimization is complex. Therefore, the mean value theorem does not hold, which means we need to approximate the Bregman divergence instead of replacing it by a norm as in the original proof. Another subtlety is that convexity needs to be carefully defined with respect to the complex domain.

3.2 Matrix Multiplicative Weights

The Matrix Multiplicative Weights (MMW) algorithm [Arora and Kale, 2016] provides an alternative means of proving Theorem 2. 
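For the von Neumann entropy regularizer, the program in Eq. (1) admits a closed-form solution: a normalized matrix exponential of the accumulated gradients, which is the multiplicative-weights update given as Eq. (2) below. Here is a minimal numerical sketch of an online learner built on that update (our own illustration, specialized to the L1 loss, with the matrix exponential computed via an eigendecomposition since the accumulated gradient is Hermitian):

```python
import numpy as np

def mmw_state(grad_sum, eta, L=1.0):
    # omega = exp(-(eta/L) * sum_s grad_s) / Tr(exp(-(eta/L) * sum_s grad_s)).
    # grad_sum is Hermitian, so we may exponentiate via eigendecomposition.
    # With grad_sum = 0 this returns the maximally mixed state I / 2^n.
    w, V = np.linalg.eigh(-(eta / L) * grad_sum)
    X = (V * np.exp(w)) @ V.conj().T
    return X / np.trace(X).real

def online_l1(measurements, feedback, eta):
    # Online learner for the L1 loss l_t(z) = |z - b_t| (Lipschitz constant 1);
    # a sub-gradient of the loss at omega_t is sign(Tr(E_t omega_t) - b_t) * E_t.
    d = measurements[0].shape[0]
    grad_sum = np.zeros((d, d), dtype=complex)
    preds = []
    for E, b in zip(measurements, feedback):
        omega = mmw_state(grad_sum, eta)
        p = np.trace(E @ omega).real
        preds.append(p)
        grad_sum += np.sign(p - b) * E
    return preds
```

With the tuning of Theorem 4, η = √((log 2)n/(4T)), this matches the regret bound stated below; to get the mistake bound of Theorem 1 one would additionally skip the gradient update whenever the observed loss is at most 2ε/3, as described in Section 3.3.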
The algorithm follows the template of Algorithm 1 with step 5 replaced by the following update rule:

ωt+1 := exp(−(η/L) Σ_{τ=1}^t ∇τ) / Tr( exp(−(η/L) Σ_{τ=1}^t ∇τ) ) .   (2)

In the notation of Arora and Kale [2016], this algorithm is derived using the loss matrices Mt = (1/L)∇t = (1/L) ℓ′t(Tr(Etωt))Et. Since ‖Et‖ ≤ 1 and |ℓ′t(Tr(Etωt))| ≤ L, we have ‖Mt‖ ≤ 1, as required in the analysis of the Matrix Multiplicative Weights algorithm. We have the following regret bound for the algorithm (proved in Appendix C):

Theorem 4. Setting η = √((log 2) n / (4T)), the regret of the algorithm based on the update rule (2) is bounded by 2L√((log 2) Tn).

3.3 Proof of Theorem 1

Consider either the RFTL or MMW based online learning algorithm described in the previous subsections, with the 1-Lipschitz convex absolute loss function ℓt(x) = |x − bt|. We run the algorithm in a sub-sequence of the iterations, using only the measurements presented in those iterations. The subsequence of iterations is determined as follows. Let ωt denote the hypothesis maintained by the algorithm in iteration t. We run the algorithm in iteration t if ℓt(Tr(Etωt)) > 2ε/3. Note that whenever |Tr(Etωt) − Tr(Etρ)| > ε, we have ℓt(Tr(Etωt)) > 2ε/3, so we update the hypothesis according to the RFTL/MMW rule in that iteration.
As we explain next, the algorithm makes at most O(n/ε²) updates regardless of the number of measurements presented (i.e., regardless of the number of iterations), giving the required mistake bound. For the true quantum state ρ, we have ℓt(Tr(Etρ)) < ε/3 for all t. 
Thus if the algorithm makes T updates (i.e., we run the algorithm in T of the iterations), the regret bound implies that (2ε/3)T ≤ (ε/3)T + O(√(Tn)). Simplifying, we get the bound T = O(n/ε²), as required.

4 Learning Using Postselection

In this section, we give a direct route to proving a slightly weaker version of Theorem 1: one that does not need the tools of convex optimization, but only tools intrinsic to quantum information.
In the following, by a "register" we mean a designated sequence of qubits. Given a two-outcome measurement E on n-qubit states, we define an operator M that "postselects" on acceptance by E. (While a measurement results in a random outcome distributed according to the probability of acceptance or rejection, postselection is a hypothetical operation that produces an outcome of one's choice with certainty.) Let U be any unitary operation on n + 1 qubits that maps states of the form |ψ⟩|0⟩ to √E |ψ⟩|0⟩ + √(I − E) |ψ⟩|1⟩. Such a unitary operation always exists (see, e.g., [Watrous, 2018, Theorem 2.42]). Denote the (n + 1)th qubit by register B. Let Π := I ⊗ |0⟩⟨0| be the orthogonal projection onto states that equal |0⟩ in register B. Then we define the operator M as

M(φ) := (1/Tr(Eφ)) · TrB[ U⁻¹ΠU (φ ⊗ |0⟩⟨0|) U⁻¹ΠU ] ,   (3)

if Tr(Eφ) ≠ 0, and M(φ) := 0 otherwise. Here, TrB is the partial trace operator over qubit B [Watrous, 2018, Section 1.1]. This operator M has the effect of mapping the quantum state φ to the (normalized) post-measurement state when we perform the measurement E and get outcome "yes" (i.e., the measurement "accepts"). We emphasize that we use a fresh ancilla qubit initialized to state |0⟩ as register B in every application of the operator M. 
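Since M maps φ to the normalized post-measurement state on outcome "yes", it can be simulated without building the dilation U explicitly: conjugating by the Kraus operator √E and renormalizing gives the same state, with the ancilla register already traced out. A small numerical sketch of this simplification (our own, with hypothetical helper names):

```python
import numpy as np

def mat_sqrt(E):
    # Square root of a PSD Hermitian matrix via its eigendecomposition.
    w, V = np.linalg.eigh(E)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def postselect(E, phi):
    # Post-measurement state on outcome "yes": sqrt(E) phi sqrt(E) / Tr(E phi),
    # returned together with the success probability Tr(E phi).
    p = np.trace(E @ phi).real
    if p == 0.0:
        return np.zeros_like(phi), 0.0
    R = mat_sqrt(E)
    out = R @ phi @ R.conj().T
    return out / np.trace(out).real, p

# Example: measuring E = diag(0.9, 0.1) on the maximally mixed qubit state
# succeeds with probability 0.5 and tilts the state toward |0><0|.
phi = np.eye(2) / 2
E = np.diag([0.9, 0.1])
out, p = postselect(E, phi)
```

Note that, as in the text, a fresh postselection is applied at each step; repeatedly applying `postselect` with different measurements composes the operators M1, M2, . . . of Theorem 5 below.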
We say that the postselection succeeds with probability Tr(Eφ).
We need a slight variant of a well-known result, which Aaronson called the "Quantum Union Bound" (see, for example, Aaronson [2006, 2016], Wilde [2013]).

Theorem 5 (variant of Quantum Union Bound; Gao [2015]). Suppose we have a sequence of two-outcome measurements E1, . . . , Ek, such that each Ei accepts a certain mixed state φ with probability at least 1 − ε. Consider the corresponding operators M1, M2, . . . , Mk that postselect on acceptance by the respective measurements E1, E2, . . . , Ek. Let φ̃ denote the state (Mk Mk−1 ··· M1)(φ) obtained by applying each of the k postselection operations in succession. Then the probability that all the postselection operations succeed, i.e., the k measurements all accept φ, is at least 1 − 2√(kε). Moreover, ‖φ̃ − φ‖Tr ≤ 4√(kε).

We may infer the above theorem by applying Theorem 1 from (Gao [2015]) to the state φ augmented with k ancillary qubits B1, B2, . . . , Bk initialized to 0, and considering k orthogonal projection operators Ui⁻¹ΠiUi, where the unitary operator Ui and the projection operator Πi are as defined for the postselection operation Mi for Ei. The ith projection operator Ui⁻¹ΠiUi acts on the registers holding φ and the ith ancillary qubit Bi.
We prove the main result of this section using suitably defined postselection operators in an online learning algorithm (proof in Appendix D):

Theorem 6. Let ρ be an unknown n-qubit mixed state, let E1, E2, . . . be a sequence of two-outcome measurements, and let ε > 0. There exists a strategy for outputting hypothesis states ω0, ω1, . . ., where ωt depends only on E1, . . . , Et and real numbers b1, . . . , bt in [0, 1], such that as long as |bt − Tr(Etρ)| ≤ ε/3 for every t, we have

|Tr(Et+1 ωt) − Tr(Et+1 ρ)| > ε

for at most O((n/ε³) log(n/ε)) values of t. Here the Et's and bt's can otherwise be chosen adversarially.

5 Learning Using Sequential Fat-Shattering Dimension

In this section, we prove regret bounds using the notion of sequential fat-shattering dimension. Let S be a set of functions f : U → [0, 1], and ε > 0. Then, following Rakhlin et al. [2015], let the ε-sequential fat-shattering dimension of S, or sfatε(S), be the largest k for which we can construct a complete binary tree T of depth k, such that

• each internal vertex v ∈ T has associated with it a point xv ∈ U and a real av ∈ [0, 1], and
• for each leaf vertex v ∈ T there exists an f ∈ S that causes us to reach v if we traverse T from the root such that at any internal node w we traverse the left subtree if f(xw) ≤ aw − ε and the right subtree if f(xw) ≥ aw + ε. If we view the leaf v as a k-bit string, the function f is such that for all ancestors u of v, we have f(xu) ≤ au − ε if vi = 0, and f(xu) ≥ au + ε if vi = 1, when u is at depth i − 1 from the root.

An n-qubit state ρ induces a function f on the set of two-outcome measurements E defined as f(E) := Tr(Eρ). With this correspondence in mind, we establish a bound on the sequential fat-shattering dimension of the set of n-qubit quantum states. The bound is based on a generalization of "random access coding" (Nayak [1999], Ambainis et al. [2002]) called "serial encoding". We derive the following bound on the length of serial encoding. Let H(x) := −x log₂ x − (1 − x) log₂(1 − x) be the binary entropy function.

Corollary 7. Let k and n be positive integers. For each k-bit string y := y1 ··· yk, let ρy be an n-qubit mixed state such that for each i ∈ {1, 2, . . .
for each $i \in \{1, 2, \ldots, k\}$, there is a two-outcome measurement $E'$ that depends only on $i$ and the prefix $v := y_1 y_2 \cdots y_{i-1}$, and has the following properties:

(iii) if $y_i = 0$ then $\mathrm{Tr}(E'\rho_y) \le a_v - \varepsilon$, and
(iv) if $y_i = 1$ then $\mathrm{Tr}(E'\rho_y) \ge a_v + \varepsilon$,

where $\varepsilon \in (0, 1/2]$ and $a_v \in [0,1]$ is a "pivot point" associated with the prefix $v$. Then

$$n \ge \left(1 - H\!\left(\frac{1-\varepsilon}{2}\right)\right) k.$$

In particular, $k = O(n/\varepsilon^2)$.

(The proof is presented in Appendix E.)
Corollary 7 immediately implies the following theorem:

Theorem 8. Let $U$ be the set of two-outcome measurements $E$ on an $n$-qubit state, and let $S$ be the set of all functions $f : U \to [0,1]$ that have the form $f(E) := \mathrm{Tr}(E\rho)$ for some $\rho$. Then for all $\varepsilon > 0$, we have $\mathrm{sfat}_\varepsilon(S) = O(n/\varepsilon^2)$.

Theorem 8 strengthens an earlier result due to Aaronson [2007], which proved the same upper bound for the "ordinary" (non-sequential) fat-shattering dimension of quantum states considered as a hypothesis class.
Now we may use existing results from the literature, which relate sequential fat-shattering dimension to online learnability. In particular, in the non-realizable case, Rakhlin et al. [2015] recently showed the following:

Theorem 9 (Rakhlin et al. [2015]). Let $S$ be a set of functions $f : U \to [0,1]$ and for every integer $t \ge 1$, let $\ell_t : [0,1] \to \mathbb{R}$ be a convex, $L$-Lipschitz loss function. Suppose we are sequentially presented elements $x_1, x_2, \ldots \in U$, with each $x_t$ followed by the loss function $\ell_t$. Then there exists a learning strategy that lets us output a sequence of hypotheses $f_1, f_2, \ldots \in S$, such that the regret is upper-bounded as:

$$\sum_{t=1}^{T} \ell_t(f_t(x_t)) \le \min_{f \in S} \sum_{t=1}^{T} \ell_t(f(x_t)) + 2LT \inf_{\alpha} \left\{ 4\alpha + \frac{12}{\sqrt{T}} \int_{\alpha}^{1} \sqrt{\mathrm{sfat}_\beta(S) \log\!\left(\frac{2eT}{\beta}\right)}\, d\beta \right\}.$$

This follows from Theorem 8 in (Rakhlin et al.
[2015]) as in the proof of Proposition 9 in the same article.
Combining Theorem 8 with Theorem 9 gives us the following:

Corollary 10. Suppose we are presented with a sequence of two-outcome measurements $E_1, E_2, \ldots$ of an $n$-qubit state, with each $E_t$ followed by a loss function $\ell_t$ as in Theorem 9. Then there exists a learning strategy that lets us output a sequence of hypothesis states $\omega_1, \omega_2, \ldots$ such that the regret after the first $T$ iterations is upper-bounded as:

$$\sum_{t=1}^{T} \ell_t(\mathrm{Tr}(E_t\,\omega_t)) \le \min_{\omega \in \mathcal{C}_n} \sum_{t=1}^{T} \ell_t(\mathrm{Tr}(E_t\,\omega)) + O\!\left(L\sqrt{nT}\,\log^{3/2} T\right).$$

Note that the result due to Rakhlin et al. [2015] is non-explicit. In other words, by following this approach, we do not derive any specific online learning algorithm for quantum states that has the stated upper bound on regret; we only prove non-constructively that such an algorithm exists.
We expect that the approach in this section, based on sequential fat-shattering dimension, could also be used to prove a mistake bound for the realizable case, but we leave that to future work.

6 Open Problems

We conclude with some questions arising from this work. The regret bound established in Theorem 2 for $L_1$ loss is tight. Can we similarly achieve optimal regret for other loss functions of interest, for example for $L_2$ loss? It would also be interesting to obtain regret bounds in terms of the loss of the best quantum state in hindsight, as opposed to $T$ (the number of iterations), using the techniques in this article. Such a bound has been shown by [Tsuda et al., 2005, Lemma 3.2] for $L_2$ loss using the Matrix Exponentiated Gradient method.
In what cases can one do online learning of quantum states, not only with few samples, but also with a polynomial amount of computation? What is the tight generalization of our results to measurements with $d$ outcomes?
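The computational question above can be made concrete: even a conceptually simple update rule such as the Matrix Exponentiated Gradient (MEG) method of Tsuda et al. [2005], $\omega_{t+1} \propto \exp(\log \omega_t - \eta \nabla \ell_t(\omega_t))$, requires a full eigendecomposition of a $2^n \times 2^n$ matrix at every iteration. The single-qubit sketch below (the measurement sequence, target values, and step size are our own illustrative choices, not from the paper) runs MEG with $L_2$ loss $\ell_t(\omega) = (\mathrm{Tr}(E_t\omega) - b_t)^2$:

```python
import numpy as np

def herm_fn(A, f):
    # Apply a scalar function to a Hermitian matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def meg_step(omega, E, b, eta=0.2):
    # Matrix Exponentiated Gradient update for the L2 loss (Tr(E omega) - b)^2:
    # omega <- exp(log omega - eta * grad) / normalization.
    grad = 2.0 * (np.trace(E @ omega).real - b) * E
    X = herm_fn(herm_fn(omega, np.log) - eta * grad, np.exp)
    return X / np.trace(X).real

# Cycle through the "accept" projectors of Pauli Z, X, Y; the targets b are
# the statistics of a hypothetical true state rho = diag(0.7, 0.3).
Ez = np.diag([1.0, 0.0]).astype(complex)
Ex = np.full((2, 2), 0.5).astype(complex)
Ey = np.array([[0.5, -0.5j], [0.5j, 0.5]])
meas = [(Ez, 0.7), (Ex, 0.5), (Ey, 0.5)]

omega = np.eye(2, dtype=complex) / 2   # start from the maximally mixed state
for t in range(600):
    E, b = meas[t % 3]
    omega = meg_step(omega, E, b)
```

After these iterations, $\omega$ agrees with the target statistics on all three observables to within a few percent while remaining a valid density matrix; the point is that each `meg_step` costs an eigendecomposition of a $2^n \times 2^n$ matrix, which is exponential in the number of qubits.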
Is it the case, in online learning of quantum states, that any algorithm works, so long as it produces hypothesis states that are approximately consistent with all the data seen so far? Note that none of our three proof techniques seem to imply this general conclusion.

References

S. Aaronson. Limitations of quantum advice and one-way communication. Theory of Computing, 1:1–28, 2005. Earlier version in CCC'2004. quant-ph/0402095.
S. Aaronson. QMA/qpoly is contained in PSPACE/poly: de-Merlinizing quantum protocols. In Proc. Conference on Computational Complexity, pages 261–273, 2006. quant-ph/0510230.
S. Aaronson. The learnability of quantum states. Proc. Roy. Soc. London, A463(2088):3089–3114, 2007. quant-ph/0608142.
S. Aaronson. The complexity of quantum states and transformations: From quantum money to black holes, February 2016. Lecture Notes for the 28th McGill Invitational Workshop on Computational Complexity, Holetown, Barbados. With guest lectures by A. Bouland and L. Schaeffer. www.scottaaronson.com/barbados-2016.pdf.
S. Aaronson. Shadow tomography of quantum states. In Proc. ACM STOC, STOC 2018, pages 325–338, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-5559-9. arXiv:1711.01053.
S. Aaronson and A. Drucker. A full characterization of quantum advice. SIAM J. Comput., 43(3):1131–1183, 2014. Earlier version in STOC'2010. arXiv:1004.0377.
A. Ambainis, A. Nayak, A. Ta-Shma, and U. V. Vazirani. Quantum dense coding and quantum finite automata. J. of the ACM, 49:496–511, 2002. Combination of an earlier version in STOC'1999, pp. 376–383, arXiv:quant-ph/9804043 and (Nayak [1999]).
S. Arora and S. Kale. A combinatorial, primal-dual approach to semidefinite programs. J. ACM, 63(2):12:1–12:35, 2016.
S. Arora, E. Hazan, and S. Kale. The multiplicative weights update method: a meta-algorithm and applications.
Theory of Computing, 8(1):121–164, 2012.
K. M. R. Audenaert and J. Eisert. Continuity bounds on the quantum relative entropy. Journal of Mathematical Physics, 46(10):102104, 2005. arXiv:quant-ph/0503218.
C. Bădescu, R. O'Donnell, and J. Wright. Quantum state certification. Technical Report arXiv:1708.06002 [quant-ph], arXiv.org, 2017.
R. Bhatia. Matrix Analysis, volume 169 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1997.
E. A. Carlen and E. H. Lieb. Remainder terms for some quantum entropy inequalities. Journal of Mathematical Physics, 55(4), 2014. arXiv:1402.3840.
J. Gao. Quantum union bounds for sequential projective measurements. Phys. Rev. A, 92:052331, Nov 2015.
J. Haah, A. W. Harrow, Z. Ji, X. Wu, and N. Yu. Sample-optimal tomography of quantum states. IEEE Transactions on Information Theory, 63(9):5628–5641, Sept 2017.
E. Hazan. Introduction to Online Convex Optimization, volume 2 of Foundations and Trends in Optimization. 2015.
A. Nayak. Optimal lower bounds for quantum automata and random access codes. In Proc. IEEE FOCS, pages 369–376, 1999. quant-ph/9904093.
R. O'Donnell and J. Wright. Efficient quantum tomography. In Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing, STOC '16, pages 899–912, New York, NY, USA, 2016. ACM.
A. Rakhlin, K. Sridharan, and A. Tewari. Online learning via sequential complexities. The Journal of Machine Learning Research, 16(1):155–186, 2015.
A. Rocchetto. Stabiliser states are efficiently PAC-learnable. arXiv:1705.00345, 2017.
A. Rocchetto, S. Aaronson, S. Severini, G. Carvacho, D. Poderini, I. Agresti, M. Bentivegna, and F. Sciarrino. Experimental learning of quantum states. arXiv:1712.00127, 2017.
K. Tsuda, G. Rätsch, and M. K. Warmuth. Matrix exponentiated gradient updates for on-line learning and Bregman projection.
Journal of Machine Learning Research, 6:995–1018, 2005.
J. Watrous. Theory of Quantum Information. Cambridge University Press, May 2018.
M. Wilde. Sequential decoding of a general classical-quantum channel. Proc. Roy. Soc. London, A469(2157):20130259, 2013. arXiv:1303.0808.