{"title": "Quantum Wasserstein Generative Adversarial Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 6781, "page_last": 6792, "abstract": "The study of quantum generative models is well-motivated, not only because of its importance in quantum machine learning and quantum chemistry but also because of the perspective of its implementation on near-term quantum machines. Inspired by previous studies on the adversarial training of classical and quantum generative models, we propose the first design of quantum Wasserstein Generative Adversarial Networks (WGANs), which has been shown to improve the robustness and the scalability of the adversarial training of quantum generative models even on noisy quantum hardware. Specifically, we propose a definition of the Wasserstein semimetric between quantum data, which inherits a few key theoretical merits of its classical counterpart. We also demonstrate how to turn the quantum Wasserstein semimetric into a concrete design of quantum WGANs that can be efficiently implemented on quantum machines. Our numerical study, via classical simulation of quantum systems, shows the more robust and scalable numerical performance of our quantum WGANs over other quantum GAN proposals. 
As a surprising application, our quantum WGAN has been used to generate a 3-qubit quantum circuit of ~50 gates that well approximates a 3-qubit 1-d Hamiltonian simulation circuit that requires over 10k gates using standard techniques.", "full_text": "Quantum Wasserstein GANs\n\nShouvanik Chakrabarti1,2,4,\u21e4, Yiming Huang3,1,5,\u21e4, Tongyang Li1,2,4\n\nSoheil Feizi2,4, Xiaodi Wu1,2,4\n\n1 Joint Center for Quantum Information and Computer Science, University of Maryland\n\n2 Department of Computer Science, University of Maryland\n\n3 School of Information and Software Engineering\n\nUniversity of Electronic Science and Technology of China\n\n4 {shouv,tongyang,sfeizi,xwu}@cs.umd.edu\n\n5 yiminghwang@gmail.com\n\nAbstract\n\nThe study of quantum generative models is well motivated, not only because\nof its importance in quantum machine learning and quantum chemistry but also\nbecause of the perspective of its implementation on near-term quantum machines.\nInspired by previous studies on the adversarial training of classical and quantum\ngenerative models, we propose the \ufb01rst design of quantum Wasserstein Generative\nAdversarial Networks (WGANs), which has been shown to improve the robustness\nand the scalability of the adversarial training of quantum generative models even on\nnoisy quantum hardware. Speci\ufb01cally, we propose a de\ufb01nition of the Wasserstein\nsemimetric between quantum data, which inherits a few key theoretical merits of\nits classical counterpart. We also demonstrate how to turn the quantum Wasserstein\nsemimetric into a concrete design of quantum WGANs that can be ef\ufb01ciently\nimplemented on quantum machines. Our numerical study, via classical simulation\nof quantum systems, shows the more robust and scalable numerical performance\nof our quantum WGANs over other quantum GAN proposals. 
As a surprising application, our quantum WGAN has been used to generate a 3-qubit quantum circuit of ≈50 gates that well approximates a 3-qubit 1-d Hamiltonian simulation circuit that requires over 10k gates using standard techniques.

1 Introduction

Generative adversarial networks (GANs) [19] represent a powerful tool for training deep generative models, which have had a profound impact on machine learning. In GANs, a generator tries to generate fake samples resembling the true data, while a discriminator tries to discriminate between the true and the fake data. The learning process of the generator and the discriminator can be viewed as an adversarial game that converges to some equilibrium point under reasonable assumptions.

Inspired by the success of GANs and classical generative models, developing their quantum counterparts is a natural and important topic in the emerging field of quantum machine learning [5, 37]. There are at least two appealing reasons why quantum GANs are extremely interesting. First, quantum GANs could provide potential quantum speedups, because quantum generators and discriminators (i.e., parameterized quantum circuits) cannot be efficiently simulated by classical generators/discriminators. In other words, there might exist distributions that can be efficiently generated by quantum GANs but not by any classical GAN. Second, simple prototypes of quantum GANs (i.e., executing simple parameterized quantum circuits), similar to those of the variational methods (e.g., [16, 27, 30]), are likely to be implementable on near-term noisy intermediate-scale quantum (NISQ) machines [33].
Since the seminal work of [25], there have been quite a few proposals (e.g., [4, 13, 23, 34, 39, 46, 49]) for constructing quantum GANs and for encoding quantum or classical data into this framework. Furthermore, [23, 49] also demonstrated proof-of-principle implementations of small-scale quantum GANs on actual quantum machines.

* Equal contribution.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Many existing quantum GANs focus on using quantum generators to generate classical distributions. For truly quantum applications, such as the investigation of quantum systems in condensed matter physics or quantum chemistry, the ability to generate quantum data is also important. In contrast to the case of classical distributions, where the loss function measuring the difference between the real and the fake distributions can be borrowed directly from classical GANs, the design of the loss function between real and fake quantum data, as well as the efficient training of the corresponding GAN, is much more challenging. The only existing results on quantum data either have a unique design specific to the 1-qubit case [13, 23] or suffer from the robustness issues in training discussed below [4].

More importantly, classical GANs are well known for being delicate and somewhat unstable in training. In particular, it is known [1] that the choice of the metric between real and fake distributions is critical to the stability of training. A few widely used ones, such as the Kullback-Leibler (KL) divergence, the Jensen-Shannon (JS) divergence, and the total variation (or statistical) distance, are not sensible for learning distributions supported on low-dimensional generative models. The shortcomings of these metrics will likely carry over to their quantum counterparts, and hence quantum GANs based on these metrics will likely suffer from the same weaknesses in training.
This training issue was not significant in the existing numerical studies of quantum GANs in the 1-qubit case [13, 23]. However, as observed by [4] and by us, the training issue becomes much more significant when the quantum system scales up, even just to a few qubits.

To tackle the training issue of classical GANs, a lot of research has been conducted on the convergence of GAN training in classical machine learning. A seminal work [1] used the Wasserstein distance (or optimal transport distance) [43] as a metric for measuring the distance between real and fake distributions. Compared with other measures (such as KL and JS), the Wasserstein distance is more appealing from an optimization perspective because of its continuity and smoothness. As a result, the corresponding Wasserstein GAN (WGAN) is promising for improving the training stability of GANs. There are many subsequent studies on various modifications of the WGAN, such as GANs with regularized Wasserstein distance [35], WGANs with entropic regularizers [12, 38], WGANs with gradient penalty [20, 31], the relaxed WGAN [21], etc. It is known [26] that the WGAN and its variants, such as [20], demonstrate improved training stability compared with the original GAN formulation.

Contributions. Inspired by the success of classical Wasserstein GANs and the need for smooth, robust, and scalable training methods for quantum GANs on quantum data, we propose the first design of quantum Wasserstein GANs (qWGANs). To this end, our technical contributions are multifold.

In Section 3, we propose a quantum counterpart of the Wasserstein distance, denoted by qW(P, Q) between quantum data P and Q, inspired by [1, 43]. We prove that qW(·,·) is a semimetric (i.e., a metric without the triangle inequality) over quantum data and inherits nice properties such as the continuity and smoothness of the classical Wasserstein distance.
We will discuss a few other proposals of quantum Wasserstein distances, such as [6, 8-10, 18, 29, 32, 45], and in particular why most of them are not suitable for the purpose of generating quantum data in GANs. We will also discuss the limitations of our proposed quantum Wasserstein semimetric, and we hope that its successful application in quantum GANs provides another perspective and motivation to study this topic.

In Section 4, we show how to add quantum entropic regularization to qW(·,·) to further smooth the loss function, in the spirit of the classical case (e.g., [35]). We then show the construction of our regularized quantum Wasserstein GAN (qWGAN) in Figure 1 and describe the configuration and the parameterization of both the generator and the discriminator. Most importantly, we show that the evaluation of the loss function and of its gradient can in principle be implemented efficiently on quantum machines. This enables direct application of classical GAN training methods, such as alternating gradient-based optimization, to the quantum setting. It is widely believed that classical computation cannot efficiently simulate quantum machines, in our case the evaluation of the loss function and its gradient. Hence, the ability to evaluate them efficiently on quantum machines is critical for scalability.

In Section 5, we supplement our theoretical results with experimental validation via classical simulation of the qWGAN. Specifically, we demonstrate the numerical performance of our qWGAN for quantum systems of up to 8 qubits for pure states and up to 3 qubits for mixed states (i.e., mixtures of pure states). Compared with existing results [4, 13, 23], our numerical performance is more favorable in both the system size and its numerical stability.
To give a rough sense of the scale, a single step in the classical simulation of the 8-qubit system involves multiple multiplications of 2^8 × 2^8 matrices. Learning a mixed state is much harder than learning pure states (a reasonable classical analogue of the difference is that between learning a Gaussian distribution and learning a mixture of Gaussian distributions [2]). We present the only result for learning a true mixed state of up to 3 qubits.

Furthermore, following a specific 4-qubit generator that was recently implemented on an ion-trap quantum machine [48] and a reasonable noise model for the same machine [47], we simulate the performance of our qWGAN with noisy quantum operations. Our results suggest that the qWGAN can tolerate a reasonable amount of noise in quantum systems and still converge. This shows the possibility of implementing the qWGAN on near-term (NISQ) machines [33].

Finally, we demonstrate a real-world application of the qWGAN: approximating useful quantum applications with large circuits by small ones. The qWGAN can be used to approximate a potentially complicated unknown quantum state by a simple one when using a reasonably simple generator. We leverage this property and the Choi-Jamiołkowski isomorphism [28] between quantum operations and quantum states to generate a simple state that approximates the Choi-Jamiołkowski state corresponding to a potentially complicated circuit in a real quantum application. The closeness of two Choi-Jamiołkowski states of quantum circuits translates to the closeness of the average outputs of the two quantum circuits over random input states. Specifically, we show that the quantum Hamiltonian simulation circuit for the 1-d 3-qubit Heisenberg model in [11] can be approximated by a circuit of 52 gates with an average output fidelity over 0.9999 and a worst-case error of 0.15.
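The Choi-Jamiołkowski correspondence invoked here is straightforward to simulate classically for small systems. Below is a minimal numpy sketch (the helper name `choi_state` and the example gates are our own illustration, not code from the paper): for unitaries U and V, the overlap of their Choi states equals |Tr(U†V)|²/d², which is precisely the quantity governing the average-case closeness of the two circuits over random inputs.

```python
import numpy as np

def choi_state(U):
    """Choi-Jamiolkowski pure state of a unitary U: (I ⊗ U)|Φ+⟩,
    where |Φ+⟩ is the maximally entangled state over two copies."""
    d = U.shape[0]
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # amplitudes of |Φ+⟩
    return np.kron(np.eye(d), U) @ phi

# Two nearby single-qubit gates (illustrative choices).
U = np.diag([1, 1j])                              # the S gate
V = np.diag([1, np.exp(1j * np.pi / 2.1)])        # a slightly perturbed S
overlap = abs(np.vdot(choi_state(U), choi_state(V))) ** 2
# For unitaries this overlap equals |Tr(U†V)|² / d².
print(overlap, abs(np.trace(U.conj().T @ V)) ** 2 / 4)
```

Closeness of the Choi states thus certifies closeness of the circuits' outputs averaged over random input states, which is how the 52-gate approximation above is judged.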
The circuit based on the second-order product formula would need ≈11900 gates, though with a worst-case error of 0.001.

Related results. All existing quantum GANs [4, 13, 23, 25, 34, 39, 46, 49], no matter whether they deal with classical or quantum data, have not investigated the possibility of using the Wasserstein distance. The works most relevant to ours are [4, 13, 23], with specific GANs dealing with quantum data. As discussed above, [13, 23] only discussed the 1-qubit case (both pure and mixed), and [4] discussed the pure state case (up to 6 qubits) but with the loss function being the quantum counterpart of the total variation distance. Moreover, the mixed-state case in [13] is a labeled one: in addition to observing the mixture, one also gets a label of which pure state each sample is drawn from.

[Figure 1, left: the qWGAN data flow from the initial state e_0 through the generator {(p_i, U_i)} to the measurements ψ, φ, ξ_R and the loss L.] (1) {(p_i, U_i)} refers to the generator with initial state e_0 and its parameterization; (2) ψ, φ, ξ_R refer to the discriminator; (3) the figure shows how to evaluate the loss function L by measuring ψ, φ, ξ_R on the generated state and the real state Q, with post-processing.

[Figure 1, right: a circuit with gates R_1(θ_1), ..., R_5(θ_5).] An example of a parameterized 3-qubit quantum circuit for U_i in the generator. R_i(θ_i) = exp(−i θ_i σ_i / 2) denotes a Pauli rotation with angle θ_i. It could be a 1-qubit or 2-qubit gate depending on the specific Pauli matrix σ_i. The circuit consists of many such gates.

Figure 1: The Architecture of Quantum Wasserstein GAN.

2 Classical Wasserstein Distance & Wasserstein GANs

Let us first review the definition of the Wasserstein distance and how it is used in classical WGANs.

Wasserstein distance Consider two probability distributions p and q given by corresponding density functions p : X → R, q : Y → R. Given a cost function c : X × Y →
R, the optimal transport cost between p and q, known as Kantorovich's formulation [43], is defined as

d_c(p, q) := min_{π∈Π(p,q)} ∫_X ∫_Y π(x, y) c(x, y) dx dy,   (2.1)

where Π(p, q) is the set of joint distributions π having marginal distributions p and q, i.e., ∫_Y π(x, y) dy = p(x) and ∫_X π(x, y) dx = q(y).

Wasserstein GAN The Wasserstein distance d_c(p, q) can be used as an objective for learning a real distribution q by a parameterized function G_θ that acts on a base distribution p. The objective then becomes learning parameters θ such that d_c(G_θ(p), q) is minimized:

min_θ min_{π∈Π(P,Q)} ∫_X ∫_Y π(x, y) c(G_θ(x), y) dx dy.   (2.2)

In [1], Arjovsky et al. propose using the dual of (2.2) to turn the original min-min problem into a min-max problem, i.e., a generative adversarial network, of the following form:

min_θ max_{α,β}  E_{x∼P}[φ_α(x)] − E_{y∼Q}[ψ_β(y)]   (2.3)
s.t.  φ_α(G_θ(x)) − ψ_β(y) ≤ c(G_θ(x), y),  ∀x, y,   (2.4)

where φ_α, ψ_β are functions parameterized by α, β, respectively. This is advantageous because it is usually easier to parameterize functions than joint distributions. The constraint (2.4) is usually enforced by a regularizer term in actual implementations. Out of the many choices of regularizers, the most relevant one to ours is the entropy regularizer in [35]. In the case that c(x, y) = ‖x − y‖_2 and ψ = φ in (2.3), the constraint is that φ must be a 1-Lipschitz function. This is often enforced by the gradient penalty method in the neural network used to parameterize φ.

3 Quantum Wasserstein Semimetric

Mathematical formulation of quantum data We refer curious readers to Supplemental Materials A for a more comprehensive introduction.
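For finite X and Y, the Kantorovich problem (2.1) is simply a linear program over couplings. A minimal sketch using scipy (the helper name `wasserstein_lp` is ours; the marginal constraints are encoded row by row):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(p, q, C):
    """Kantorovich optimal transport cost between discrete distributions
    p (length n) and q (length m) with cost matrix C (n x m), as in (2.1)."""
    n, m = C.shape
    # Marginal constraints: sum_y pi(x,y) = p(x), sum_x pi(x,y) = q(y).
    A_eq = []
    for i in range(n):
        row = np.zeros((n, m)); row[i, :] = 1
        A_eq.append(row.ravel())
    for j in range(m):
        col = np.zeros((n, m)); col[:, j] = 1
        A_eq.append(col.ravel())
    b_eq = np.concatenate([p, q])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun

p = np.array([0.5, 0.5]); q = np.array([1.0, 0.0])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
print(wasserstein_lp(p, q, C))  # 0.5: half the mass moves at cost 1
```

Here half the mass must move from the second point to the first at cost 1, so the optimal transport cost is 0.5.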
Any quantum data (or quantum states) over a space X (e.g., X = C^d) are mathematically described by a density operator ρ that is a positive semidefinite matrix (i.e., ρ ⪰ 0) with trace one (i.e., Tr(ρ) = 1), and the set of which is denoted by D(X).

A quantum state ρ is pure if rank(ρ) = 1; otherwise it is a mixed state. A pure state ρ can be represented by the outer product of a unit vector v ∈ C^d, i.e., ρ = v v†, where † refers to the conjugate transpose. We can also use v to directly represent pure states. Mixed states are classical mixtures of pure states, e.g., ρ = Σ_i p_i v_i v_i†, where the p_i form a classical distribution and the v_i are all unit vectors. Quantum states in a composed system of X and Y are represented by density operators ρ over the Kronecker-product space X ⊗ Y with dimension dim(X) dim(Y). 1-qubit systems refer to X = C^2. A 2-qubit system has dimension 4 (X^⊗2) and an n-qubit system has dimension 2^n. The partial trace operation Tr_X(·) (resp. Tr_Y(·)) is a linear mapping from ρ to its marginal state on Y (resp. X).

From classical to quantum data Classical distributions p, q in (2.1) can be viewed as special mixed states P ∈ D(X), Q ∈ D(Y), where P, Q are diagonal and p, q (viewed as density vectors) form the diagonals of P, Q, respectively. Note that this is different from the conventional meaning of samples from classical distributions, which are random variables with the corresponding distributions. This distinction is important for understanding quantum data, as the former (i.e., density operators) rather than the latter (i.e., samples) actually represents the entity of quantum data. This is because, for one fixed density operator, there are multiple ways (different quantum measurements) to read out classical samples from the quantum data.
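These objects are easy to manipulate numerically. The sketch below (helper names are our own) builds a mixed state as a classical mixture of pure states, checks the density-operator conditions, and computes a partial trace by reshaping the density matrix into a 4-index tensor and tracing out one pair of axes:

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Marginal of a density operator rho on a bipartite space X ⊗ Y.
    dims = (dim X, dim Y); keep = 0 returns Tr_Y(rho) on X, keep = 1
    returns Tr_X(rho) on Y."""
    dX, dY = dims
    r = rho.reshape(dX, dY, dX, dY)   # axes: (iX, iY, jX, jY)
    if keep == 0:
        return np.trace(r, axis1=1, axis2=3)
    return np.trace(r, axis1=0, axis2=2)

# A mixed 1-qubit state as a classical mixture of two pure states.
v0 = np.array([1, 0], dtype=complex)
v1 = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = 0.5 * np.outer(v0, v0.conj()) + 0.5 * np.outer(v1, v1.conj())
assert np.isclose(np.trace(rho), 1)                      # trace one
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)         # PSD

# Marginals of a product state recover its factors.
sigma = np.kron(rho, np.outer(v0, v0.conj()))
assert np.allclose(partial_trace(sigma, (2, 2), 0), rho)
```

The same reshape-and-trace trick is what the classical simulations in Section 5 rely on.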
Mathematically, this is because density operators in general can have off-diagonal terms, and quantum measurements can happen along arbitrary bases.

Consider X and Y from (2.1) being finite sets. We can express the classical Wasserstein distance (2.1) as a special case of the matrix formulation of quantum data. Precisely, we can replace the integral in (2.1) by a summation, which can then be expressed as the trace of πC, where C is a diagonal matrix with c(x, y) on the diagonal, and π is also a diagonal matrix expressing the coupling distribution π(x, y) of p, q. Namely, π's diagonal is π(x, y), and π satisfies the coupling marginal conditions Tr_Y(π) = P and Tr_X(π) = Q, where P, Q are diagonal matrices with the distributions p, q on the diagonal, respectively. As a result, Kantorovich's optimal transport (2.1) can be reformulated as

d_c(p, q) := min_π  Tr(πC)   (3.1)
s.t.  Tr_Y(π) = diag{p(x)},  Tr_X(π) = diag{q(y)},  π ∈ D(X ⊗ Y),

where C = diag{c(x, y)}. Note that (3.1) is effectively a linear program.

Quantum Wasserstein semimetric Our matrix reformulation of the classical Wasserstein distance (2.1) suggests a naive extension to the quantum setting as follows. Let qW(P, Q) denote the quantum Wasserstein semimetric between P ∈ D(X), Q ∈ D(Y), defined by

qW(P, Q) := min_π  Tr(πC)   (3.2)
s.t.  Tr_Y(π) = P,  Tr_X(π) = Q,  π ∈ D(X ⊗ Y),

where C is a matrix over X ⊗ Y that should refer to some cost-type function. The choice of C is hence critical to making sense of the definition. First, the matrix C needs to be Hermitian (i.e., C = C†) to make sure that qW(·,·) is real. A natural attempt is to use C = diag{c(x, y)} from (3.1), which turns out to be significantly wrong.
This is because qW(v v†, v v†) will be strictly greater than zero for a random choice of unit vector v in that case. This demonstrates a crucial difference between classical and quantum data: while classical information is always stored in the diagonal (or computational basis) of the space, quantum information can be stored off-diagonally (or in an arbitrary basis of the space). Thus, choosing a diagonal C fails to detect the off-diagonal information in quantum data.

Our proposal is to leverage the concept of the symmetric subspace in quantum information [22] to make sure that qW(P, P) = 0 for any P. The projection onto the symmetric subspace is defined by

Π_sym := (1/2)(I_{X⊗Y} + SWAP),   (3.3)

where I_{X⊗Y} is the identity operator over X ⊗ Y and SWAP is the operator such that SWAP(x ⊗ y) = (y ⊗ x), ∀x ∈ X, y ∈ Y.² It is well known that Π_sym(u ⊗ u) = u ⊗ u for all unit vectors u. With this property, and by choosing C to be the complement of Π_sym, i.e.,

C := I_{X⊗Y} − Π_sym = (1/2)(I_{X⊗Y} − SWAP),   (3.4)

we can show qW(P, P) = 0 for any P. This is achieved by choosing π = Σ_i λ_i (v_i v_i† ⊗ v_i v_i†) given P's spectral decomposition P = Σ_i λ_i v_i v_i†. Moreover, we can show

Theorem 3.1 (Proof in Supplemental Materials B). qW(·,·) forms a semimetric over D(X) for any space X, i.e., for any P, Q ∈ D(X),
1. qW(P, Q) ≥ 0,
2. qW(P, Q) = qW(Q, P),
3. qW(P, Q) = 0 iff P = Q.

Even though our definition of qW(·,·), especially the choice of C, does not directly come from a cost function c(x, y) over X and Y, it nevertheless still encodes some geometry of the space of quantum states.
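The construction (3.3)-(3.4) can be checked in a few lines of numpy (illustrative code of our own, not from the paper's repository): the coupling π = Σ_i λ_i v_i v_i† ⊗ v_i v_i† has zero cost against C, and for pure states, where the marginal constraints force π = P ⊗ Q, one recovers the closed form qW(P, Q) = (1 − |u†v|²)/2.

```python
import numpy as np

d = 2
I2 = np.eye(d * d)
# SWAP on C^d ⊗ C^d: SWAP (x ⊗ y) = y ⊗ x.
SWAP = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        SWAP[j * d + i, i * d + j] = 1
P_sym = (I2 + SWAP) / 2          # projector onto the symmetric subspace (3.3)
C = I2 - P_sym                   # cost matrix (3.4), equal to (I - SWAP)/2

rng = np.random.default_rng(0)
v = rng.normal(size=d) + 1j * rng.normal(size=d)
v /= np.linalg.norm(v)
P = np.outer(v, v.conj())

# The coupling pi = v v† ⊗ v v† is feasible for qW(P, P) and has zero cost,
# since Pi_sym fixes u ⊗ u for every unit vector u.
pi = np.kron(P, P)
print(np.trace(pi @ C).real)     # 0 up to rounding

# For pure states the only feasible coupling is P ⊗ Q, recovering
# qW(P, Q) = (1 - |u†v|^2) / 2.
u = rng.normal(size=d) + 1j * rng.normal(size=d)
u /= np.linalg.norm(u)
Q = np.outer(u, u.conj())
qw = np.trace(np.kron(P, Q) @ C).real
print(np.isclose(qw, 0.5 * (1 - abs(u.conj() @ v) ** 2)))  # True
```

The identity Tr((A ⊗ B) SWAP) = Tr(AB) is what makes both checks reduce to overlaps of the states involved.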
For example, let P = v v† and Q = u u†; then qW(P, Q) becomes 0.5 (1 − |u†v|²), where |u†v| depends on the angle between u and v, the unit vectors representing the (pure) quantum states.

The dual form of qW(·,·) The formulation of qW(·,·) in (3.2) is given by a semidefinite program (SDP), as opposed to the classical form in (3.1), which is given by a linear program. Its dual form is as follows:

max_{φ,ψ}  Tr(Qψ) − Tr(Pφ)   (3.5)
s.t.  I_X ⊗ ψ − φ ⊗ I_Y ⪯ C,  φ ∈ H(X),  ψ ∈ H(Y),

where H(X), H(Y) denote the sets of Hermitian matrices over the spaces X and Y. We further show strong duality for this SDP in Supplemental Materials B. Thus, both the primal (3.2) and the dual (3.5) can be used as the definition of qW(·,·).

Comparison with other quantum Wasserstein metrics There have been a few different proposals that introduce matrices into the original definition of the classical Wasserstein distance. We will compare these definitions with ours and discuss whether they are appropriate in our context of quantum GANs.

A few of these proposals (e.g., [7, 9, 10]) extend the dynamical formulation of Benamou and Brenier [3] in optimal transport to the matrix/quantum setting. In this formulation, couplings are defined not in terms of joint density measures, but in terms of smooth paths t → ρ(x, t) in the space of densities that satisfy a continuity equation with some time-dependent vector field v(x, t) inspired by physics. A pair {ρ(·,·), v(·,·)} is said to couple P and Q, the set of which is denoted C(P, Q), if ρ(x, t) is a smooth path with ρ(·, 0) = P and ρ(·, 1) = Q. The 2-Wasserstein distance is

W_2(P, Q) = inf_{{ρ(·,·),v(·,·)}∈C(P,Q)} (1/2) ∫_0^1 ∫_{R^n} |v(x, t)|² ρ(x, t) dx dt.   (3.6)

The above formulation seems difficult to manipulate in the context of GANs.
It is unclear (a) whether the above definition has a favorable duality that admits adversarial training, and (b) whether the physics-inspired quantities like v(x, t) are suitable for the purpose of generating fake quantum data.

²One needs X to be isometric to Y to well define Π_sym. However, this is without loss of generality, by choosing appropriate and potentially larger spaces X and Y to describe the quantum data.

A few other proposals (e.g., [29, 32]) introduce a matrix-valued mass defined by a function μ : X → C^{n×n} over a domain X, where μ(x) is positive semidefinite and satisfies Tr(∫_X μ(x) dx) = 1. Instead of transporting probability mass from X to Y, one considers transporting a matrix-valued mass μ_0(x) on X to another matrix-valued mass μ_1(y) on Y. One can similarly define the Kantorovich coupling π(x, y) of μ_0(x) and μ_1(y), and define the Wasserstein distance based on a slightly different combination of π(x, y) and c(x, y) compared with (2.1). This definition, however, fails to yield a new metric between two matrices, because the defined Wasserstein distance still measures the distance between X and Y based on an induced measure, the Frobenius norm ‖·‖_F, on the space of dimension-n matrices. This is most evident when X = {P} and Y = {Q}: the Wasserstein distance reduces to c(x, y) + ‖P − Q‖²_F, where the Frobenius norm is used directly in the definition.

The proposals in [6, 18] are very similar to ours in the sense that they define the same coupling in the Kantorovich formulation as we do. However, their definitions of the Wasserstein distance, motivated by physics, are induced by unbounded operators applied on continuous spaces, e.g., ∇_x, div_x.
This makes their definitions applicable only to continuous spaces, rather than to the qubits in our setting.

The closest result to ours is [45], although the authors do not propose a concrete quantum Wasserstein metric. Instead, they formulate a general form of reasonable quantum Wasserstein metrics between finite-dimensional quantum states and prove that the Kantorovich-Rubinstein theorem does not hold under this general form. Namely, they show that the trace distance between quantum states cannot be determined by any quantum Wasserstein metric of their general form.

Limitation of our qW(·,·) Although we have successfully implemented the qWGAN based on our qW(·,·) and observed improved numerical performance, there are a few aspects of qW(·,·) worth further investigation. First, numerical study reveals that qW(·,·) does not satisfy the triangle inequality. Second, our qW(·,·) does not come from an explicit cost function, even though it encodes some geometry of the quantum state space. We conjecture that there could be a concrete underlying cost function, and that our qW(·,·) (or a related form) could emerge as the 2-Wasserstein metric of that cost function. We hope our work provides an important motivation to further study this topic.

4 Quantum Wasserstein GAN

We describe the specific architecture of our qWGAN (Figure 1) and its training. Similar to (2.2), with the fake state P produced by a parameterized quantum generator G, consider

min_G min_π  Tr(πC)   (4.1)
s.t.  Tr_Y(π) = P,  Tr_X(π) = Q,  π ∈ D(X ⊗ Y),

or, similar to (2.3), by taking the dual from (3.5),

min_G max_{φ,ψ}  Tr(Qψ) − Tr(Pφ) = E_Q[ψ] − E_P[φ]   (4.2)
s.t.  I_X ⊗ ψ − φ ⊗ I_Y ⪯ C,  φ ∈ H(X),  ψ ∈ H(Y),

where we abuse notation and write E_Q[ψ] := Tr(Qψ), which refers to the expectation of the outcome of measuring the Hermitian ψ on the quantum state Q. We hence refer to ψ, φ as the discriminator.

Regularized Quantum Wasserstein GAN
The dual form (4.2) is inconvenient to optimize directly due to the constraint I_X ⊗ ψ − φ ⊗ I_Y ⪯ C. Inspired by the entropy regularizer in the classical setting (e.g., [35]), we add a quantum-relative-entropy-based regularizer between π and P ⊗ Q, with a tunable parameter λ, to (4.1), obtaining

min_G min_π  Tr(πC) + λ Tr(π log(π) − π log(P ⊗ Q))   (4.3)
s.t.  Tr_Y(π) = P,  Tr_X(π) = Q,  π ∈ D(X ⊗ Y).

Using duality and the Golden-Thompson inequality [17, 40], we can approximate (4.3) by

min_G max_{φ,ψ}  E_Q[ψ] − E_P[φ] − E_{P⊗Q}[ξ_R]   (4.4)
s.t.  φ ∈ H(X),  ψ ∈ H(Y),

where ξ_R refers to the regularizing Hermitian

ξ_R = (λ/e) exp((−C − φ ⊗ I_Y + I_X ⊗ ψ)/λ).   (4.5)

Similar to [35], we prove that this entropic regularization ensures that the objective of the outer minimization problem (4.4) is differentiable in P. (Proofs are given in Supplemental Materials B.2.)

Parameterization of the Generator and the Discriminator
Generator G is a quantum operation that generates P from a fixed initial state ρ_0 (e.g., the classical all-zero state e_0). Specifically, the generator G can be described by an ensemble {(p_1, U_1), ..., (p_r, U_r)}, which means applying the unitary U_i with probability p_i. The distribution {p_1, ..., p_r} can be parameterized directly or through some classical generative network. The rank of the generated state is r (r = 1 for pure states and r > 1 for mixed states). Our experiments include the cases r = 1, 2. Each unitary U_i refers to a quantum circuit consisting of simple parameterized 1-qubit and 2-qubit Pauli-rotation quantum gates (see the right of Figure 1). These Pauli gates can be implemented on near-term machines (e.g., [48]) and also form a universal gate set for quantum computation.
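As a concrete illustration of such a generator (a toy ansatz of our own choosing, not the exact circuit layout used in the paper), the following numpy/scipy sketch applies parameterized Pauli rotations exp(−i θ σ / 2) to the initial state e_0:

```python
import numpy as np
from scipy.linalg import expm

# Single-qubit Pauli matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot(sigma, theta):
    """Pauli rotation exp(-i theta sigma / 2); sigma may be a tensor
    product of Paulis, giving a 1-qubit or 2-qubit gate."""
    return expm(-0.5j * theta * sigma)

def generator_state(thetas, n=2):
    """State U(theta)|0...0> for a toy 2-qubit circuit alternating
    single-qubit Z rotations with an entangling XX rotation
    (the layout is illustrative only)."""
    psi = np.zeros(2 ** n, dtype=complex); psi[0] = 1.0   # |e_0>
    U = np.kron(rot(Z, thetas[0]), rot(Z, thetas[1]))     # 1-qubit gates
    U = rot(np.kron(X, X), thetas[2]) @ U                 # 2-qubit XX gate
    return U @ psi

psi = generator_state([0.3, 1.2, 0.7])
print(np.linalg.norm(psi))  # 1.0: the circuit is unitary
```

A mixed-state generator (r > 1) would additionally sample U_i with probability p_i and average the resulting density matrices Σ_i p_i U_i ρ_0 U_i†.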
Hence, this generator construction is widely used in existing quantum GANs. The jth gate in U_i contains an angle θ_{i,j} as its parameter. All the variables p_i, θ_{i,j} constitute the set of parameters of the generator.

Discriminator ψ, φ can be parameterized in at least two ways. The first approach is to represent ψ, φ as linear combinations of tensor products of Pauli matrices, which form a basis of the matrix space (details on Pauli matrices and measurements can be found in Supplemental Materials A). Let φ = Σ_k α_k A_k and ψ = Σ_l β_l B_l, where the A_k, B_l are tensor products of Pauli matrices. To evaluate E_P[φ] (and similarly E_Q[ψ]), by linearity it suffices to collect the E_P[A_k], which are simply Pauli measurements on the quantum state P and are amenable to experiments. Hence, the α_k and β_l can be used as the parameters of the discriminator. The second approach is to represent ψ, φ as parameterized quantum circuits (similar to G) with a measurement in the computational basis. The set of parameters of φ (respectively ψ) could be the parameters of the circuit and the values associated with each measurement outcome. Our implementation mostly uses the first representation.

Training the Regularized Quantum Wasserstein GAN
For the training of the regularized quantum Wasserstein GAN to be scalable, one must be able to evaluate the loss function L = E_Q[ψ] − E_P[φ] − E_{P⊗Q}[ξ_R], or its gradient, efficiently on a quantum computer. Ideally, one would hope to approximate gradients directly on quantum computers to facilitate the training of the qWGAN, e.g., by using the alternating gradient descent method. We show that this is indeed possible, and we outline the key steps; more details are in Supplemental Materials C.

Computing the loss function: Each unitary operation U_i that refers to an actual quantum circuit can be evaluated efficiently on quantum machines in terms of the circuit size.
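The first parameterization is easy to emulate classically. In the sketch below (the coefficients and Pauli terms are arbitrary illustrative choices), the expectation E_P[φ] of φ = Σ_k α_k A_k decomposes by linearity into the Pauli expectations Tr(P A_k) that a quantum device would estimate by repeated measurement:

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Discriminator phi = sum_k alpha_k A_k with A_k tensor products of Paulis.
alphas = [0.7, -0.2, 0.5]
paulis = [np.kron(Z, I), np.kron(I, Z), np.kron(X, X)]
phi = sum(a * A for a, A in zip(alphas, paulis))

# E_P[phi] = Tr(P phi) equals the alpha-weighted sum of Pauli expectations.
v = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # a Bell state
P = np.outer(v, v.conj())
e_direct = np.trace(P @ phi).real
e_pauli = sum(a * np.trace(P @ A).real for a, A in zip(alphas, paulis))
print(np.isclose(e_direct, e_pauli))  # True
```

On hardware, each Tr(P A_k) is a sampled average of ±1 measurement outcomes, so the number of shots controls the statistical error of the loss estimate.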
It can be shown that L is a linear function of P and can be computed by evaluating each L_i = E_Q[ψ] − E_{U_i ρ_0 U_i†}[φ] − E_{U_i ρ_0 U_i† ⊗ Q}[ξ_R], where U_i ρ_0 U_i† refers to the state after applying U_i to ρ_0. Similarly, one can show that L is a linear function of the Hermitian matrices φ, ψ, ξ_R. Our parameterization of φ and ψ readily allows the use of efficient Pauli measurements to evaluate E_P[φ] and E_Q[ψ]. To handle the tricky part E_{P⊗Q}[ξ_R], we relax ξ_R and use a Taylor series to approximate E_{P⊗Q}[ξ_R]; the resulting form can again be evaluated by Pauli measurements composed with simple SWAP operations. As the major computations (e.g., circuit evaluation and Pauli measurements) are efficient on quantum machines, the overall implementation is efficient, with a possible overhead from sampling trials.

Computing the gradients: The parameters of the qWGAN are {p_i} ∪ {θ_{i,j}} ∪ {α_k} ∪ {β_l}. L is a linear function of the p_i, α_k, β_l. Thus it can be shown that the partial derivatives w.r.t. p_i can be computed by evaluating the loss function on a generated state U_i ρ_0 U_i†, and the partial derivatives w.r.t. α_k, β_l can be computed by evaluating the loss function with φ, ψ replaced by A_k, B_l, respectively. The partial derivatives w.r.t. θ_{i,j} can be evaluated using techniques due to [36], via a simple yet elegant modification of the quantum circuits used to evaluate the loss function. The complexity analysis is similar to the above; the only new ingredient is the quantum circuits that evaluate the partial derivatives w.r.t. θ_{i,j} due to [36], which are again efficient on quantum machines.

Summary of the training complexity: The rough complexity analysis above suggests that one step of the evaluation of the loss function (or the gradients) of our qWGAN can be implemented efficiently on quantum machines. (A careful analysis is in Supplemental Materials C.5.)
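For gates of the Pauli-rotation form exp(−i θ σ / 2), one well-known way to realize such circuit-modified derivative evaluations is the parameter-shift rule, which recovers the exact derivative from two shifted evaluations of the same circuit. A minimal single-qubit sketch (our illustration; we assume here, as in Figure 1, that all gates are Pauli rotations):

```python
import numpy as np
from scipy.linalg import expm

Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expectation(theta):
    """<0| R_y(theta)† Z R_y(theta) |0> = cos(theta),
    with R_y(theta) = exp(-i theta Y / 2)."""
    psi = expm(-0.5j * theta * Y) @ np.array([1, 0], dtype=complex)
    return (psi.conj() @ Z @ psi).real

def parameter_shift(theta):
    """Exact gradient from two shifted circuit evaluations."""
    return 0.5 * (expectation(theta + np.pi / 2) - expectation(theta - np.pi / 2))

theta = 0.4
print(parameter_shift(theta), -np.sin(theta))  # both -0.389...
```

Here ⟨Z⟩ = cos θ, and the two shifted evaluations reproduce the analytic derivative −sin θ exactly, with no finite-difference error; each shifted evaluation is itself just a circuit run plus a Pauli measurement.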
Given this ability, the rest of the training of qWGAN is similar to the classical case and shares the same complexity. It is worth mentioning that quantum circuit evaluation and Pauli measurements are not known to be efficiently computable by classical machines; the best known classical approach costs exponential time.

5 Experimental Results

We supplement our theoretical findings with numerical results from classical simulation of quantum WGANs learning pure states (up to 8 qubits) and mixed states (up to 3 qubits), as well as their performance on noisy quantum machines. We use the quantum fidelity between the generated and target states to track the progress of our quantum WGAN. If the training is successful, the fidelity approaches 1. Our quantum WGAN is trained using the alternating gradient descent method.

In most cases, the target state is generated by a circuit sharing the same structure as the generator but with randomly chosen parameters. We also demonstrate a special target state corresponding to useful quantum unitaries via the Choi-Jamiołkowski isomorphism. More details of the following experiments (e.g., parameter choices) can be found in Supplemental Materials D.

Most of the simulations were run on a dual-core Intel i5 processor with 8 GB memory. The 8-qubit pure-state case was run on a dual Intel Xeon E5-2697 v2 @ 2.70 GHz processor with 128 GB memory. All source code is publicly available at https://github.com/yiminghwang/qWGAN.

Pure states. We demonstrate a typical performance of quantum WGAN learning 1-, 2-, 4-, and 8-qubit pure states in Figure 2.

Figure 2: A typical performance of learning pure states (1, 2, 4, and 8 qubits); panels show fidelity vs. training epochs and the training loss.
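The fidelity used to track training simplifies when the target state is pure: F = ⟨ψ|ρ|ψ⟩ for generated state ρ and target |ψ⟩. A minimal numpy sketch (the states below are illustrative, not the experiments' actual states):

```python
import numpy as np

def fidelity_pure_target(rho, psi):
    """Fidelity F = <psi| rho |psi> when the target state |psi> is pure.
    Equals 1 exactly when the generated state rho is |psi><psi|."""
    return np.real(psi.conj() @ rho @ psi)

# Target: |+> = (|0> + |1>)/sqrt(2); generated state: a slightly noisy mixture.
plus = np.array([1.0, 1.0]) / np.sqrt(2)
target_dm = np.outer(plus, plus.conj())
rho = 0.9 * target_dm + 0.1 * np.eye(2) / 2  # 10% depolarized copy of the target

print(fidelity_pure_target(rho, plus))  # 0.9 * 1 + 0.1 * 0.5 = 0.95
```

For mixed targets one would use the general Uhlmann fidelity instead; the pure-target form above is enough for the pure-state experiments.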
We also plot the average fidelity over 10 runs with random initializations in Figure 3, which shows the numerical stability of qWGAN.

Figure 3: Average performance of learning pure states (1, 2, 4, and 8 qubits), where the black line is the average fidelity over multiple runs with random initializations and the shaded area is the range of the fidelity.

Figure 4: Average performance of learning mixed states (1, 2, and 3 qubits), where the black line is the average fidelity over multiple runs with random initializations and the shaded area is the range of the fidelity.

Mixed states. We also demonstrate a typical learning of mixed quantum states of rank 2 with 1, 2, and 3 qubits in Figure 4. The generator now consists of 2 unitary operators and 2 real probability parameters p_1, p_2, which are normalized to form a probability distribution using a softmax layer.

Learning pure states with noise. To investigate the possibility of implementing our quantum WGAN on near-term machines, we perform a numerical test on a practically implementable 4-qubit generator on the ion-trap machine [48] with an approximate noise model [47]. We deem this the closest example we can simulate to an actual physical experiment. In particular, we add Gaussian sampling noise with standard deviation σ = 0.2, 0.15, 0.1, 0.05 to the measurement outcomes of the quantum system. Our results (Figure 5) show that the quantum WGAN can still learn a 4-qubit pure state in the presence of this kind of noise.

Figure 5: Learning 4-qubit pure states with noisy quantum operations.
Figure 6: Learning to approximate the 3-qubit Hamiltonian simulation circuit of the 1-d Heisenberg model.
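Gaussian sampling noise of this kind can be modeled classically by perturbing each exact expectation value before it is fed to the optimizer. A minimal sketch (assuming numpy; this is a simplified stand-in, not the exact ion-trap noise model of [47]):

```python
import numpy as np

def noisy_expectation(exact_value, sigma, rng):
    """Model finite-sampling/readout noise by adding N(0, sigma^2) to an
    exact expectation value, then clipping to the physical range [-1, 1]."""
    return float(np.clip(exact_value + rng.normal(0.0, sigma), -1.0, 1.0))

rng = np.random.default_rng(0)
exact = 0.6  # e.g., an exact Pauli expectation from the simulator
samples = [noisy_expectation(exact, sigma=0.1, rng=rng) for _ in range(10000)]
print(abs(np.mean(samples) - exact) < 0.01)  # unbiased: the noise averages out
```

Because the noise is zero-mean, gradient estimates remain unbiased; larger σ mainly slows convergence, consistent with the behavior reported below.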
As expected, noise of a higher degree (higher σ) increases the number of epochs before the state is learned successfully.

Comparison with existing experimental results. We compare to quantum GANs with quantum data [4, 13, 23]. Unfortunately, there is neither a precise figure nor public data in their papers, which makes a precise comparison infeasible. However, we manage to give a rough comparison as follows. Ref. [13] studies the pure-state and the labeled mixed-state case for 1 qubit. It can be inferred from the plots of their results (Figure 8.b in [13]) that the relative entropy for both labels converges to 10^{-10} after ∼5000 iterations, and it takes more than 1000 iterations for the relative entropy to decrease significantly from 1. Ref. [23] performs experiments to learn 1-qubit pure and mixed states using a quantum GAN on a superconducting quantum circuit. However, the specific design of their GAN is very particular to the 1-qubit case. They observe that the fidelity between the fake state and the real state approaches 1 after 220 iterations for the pure state and 120 iterations for the mixed state. From our figures, qWGAN quickly converges for 1-qubit pure states after 150–160 iterations and for a 1-qubit mixed state after ∼120 iterations.

Ref. [4] studies only pure states but provides numerical results up to 6 qubits. In particular, they demonstrate (in Figure 6 from [4]), in the 6-qubit case, that the normal gradient descent approach, like the one we use here, makes hardly any progress even after 600 iterations; hence they introduce a new training method.
This is in sharp contrast to our Figure 2, where we demonstrate smooth convergence to fidelity 1 with simple gradient descent for 8-qubit pure states within 900 iterations.

Application: approximating quantum circuits. To approximate any quantum circuit U_0 over the n-qubit space X, consider the Choi-Jamiołkowski state σ_0 over X ⊗ X defined as (U_0 ⊗ I_X)|Φ⟩, where |Φ⟩ = (1/√(2^n)) ∑_{i=0}^{2^n−1} e_i ⊗ e_i is the maximally entangled state and {e_i}_{i=0}^{2^n−1} forms an orthonormal basis of X. The generator is the normal generator circuit U_1 on the first X and the identity on the second X, i.e., U_1 ⊗ I. To learn the 1-d 3-qubit Heisenberg-model circuit (treated as U_0) from [11], we simply run our qWGAN to learn the 6-qubit Choi-Jamiołkowski state σ_0 (Figure 6) and obtain the generator (i.e., U_1). We use the gate set of single- and 2-qubit Pauli rotation gates. Then U_1 has only 52 gates, while using the best product formula (2nd order) U_0 has ∼11900 gates. It is worth noting that U_1 achieves an average output fidelity over 0.9999 and a worst-case error of 0.15, whereas U_0 has a worst-case error of 0.001. However, the worst-case input of U_1 is not realistic in current experiments, and hence the high average fidelity implies a very reasonable approximation in practice.

6 Conclusion & Open Questions

We provide the first design of quantum Wasserstein GANs, a performance analysis on realistic quantum hardware through classical simulation, and a real-world application. At the technical level, we propose a counterpart of the Wasserstein metric between quantum data. We believe that our result opens the possibility of quite a few future directions, for example:
• Can we implement our quantum WGAN on an actual quantum computer? Our noisy simulation suggests the possibility, at least on an ion-trap machine.
• Can we apply our quantum WGAN to even larger and noisy quantum systems? In particular, can we approximate more useful quantum circuits using small ones by using quantum WGAN? It seems very likely but requires more careful numerical analysis.
• Can we better understand and build a rich theory of quantum Wasserstein metrics in light of [43]?

Acknowledgement

We thank anonymous reviewers for many constructive comments and Yuan Su for helpful discussions about the reference [11]. SC, TL, and XW received support from the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Quantum Algorithms Teams program. SF received support from Capital One and NSF CDS&E-1854532. TL also received support from an IBM Ph.D. Fellowship and an NSF QISE-NET Triplet Award (DMR-1747426). XW also received support from NSF CCF-1755800 and CCF-1816695.

References

[1] Martin Arjovsky, Soumith Chintala, and Léon Bottou, Wasserstein generative adversarial networks, Proceedings of the 34th International Conference on Machine Learning, pp. 214–223, 2017, arXiv:1701.07875.

[2] Yogesh Balaji, Rama Chellappa, and Soheil Feizi, Normalized Wasserstein distance for mixture distributions with applications in adversarial learning and domain adaptation, 2019, arXiv:1902.00415.

[3] Jean-David Benamou and Yann Brenier, A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem, Numerische Mathematik 84 (2000), no. 3, 375–393.

[4] Marcello Benedetti, Edward Grant, Leonard Wossnig, and Simone Severini, Adversarial quantum circuit learning for pure state approximation, New Journal of Physics 21 (2019), no. 4, 043023, arXiv:1806.00463.

[5] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd, Quantum machine learning, Nature 549 (2017), no.
7671, 195, arXiv:1611.09347.\n\n[6] Emanuele Caglioti, Fran\u00e7ois Golse, and Thierry Paul, Quantum optimal transport is cheaper,\n\n2019, arXiv:1908.01829.\n\n[7] Eric A. Carlen and Jan Maas, An analog of the 2-Wasserstein metric in non-commutative\nprobability under which the Fermionic Fokker\u2013Planck equation is gradient \ufb02ow for the entropy,\nCommunications in Mathematical Physics 331 (2014), no. 3, 887\u2013926, arXiv:1203.5377.\n\n[8] Eric A. Carlen and Jan Maas, Gradient \ufb02ow and entropy inequalities for quantum Markov\nsemigroups with detailed balance, Journal of Functional Analysis 273 (2017), no. 5, 1810 \u2013\n1869, arXiv:1609.01254.\n\n[9] Yongxin Chen, Tryphon T. Georgiou, and Allen Tannenbaum, Matrix optimal mass transport:\na quantum mechanical approach, IEEE Transactions on Automatic Control 63 (2018), no. 8,\n2612\u20132619, arXiv:1610.03041.\n\n[10] Yongxin Chen, Tryphon T. Georgiou, and Allen Tannenbaum, Wasserstein geometry of quantum\nstates and optimal transport of matrix-valued measures, Emerging Applications of Control and\nSystems Theory, Springer, 2018, pp. 139\u2013150.\n\n[11] Andrew M. Childs, Dmitri Maslov, Yunseong Nam, Neil J. Ross, and Yuan Su, Toward the \ufb01rst\nquantum simulation with quantum speedup, Proceedings of the National Academy of Sciences\n115 (2018), no. 38, 9456\u20139461, arXiv:1711.10980.\n\n[12] Marco Cuturi, Sinkhorn distances: Lightspeed computation of optimal transport, Advances in\n\nNeural Information Processing Systems, pp. 2292\u20132300, 2013, arXiv:1306.0895.\n\n[13] Pierre-Luc Dallaire-Demers and Nathan Killoran, Quantum generative adversarial networks,\n\nPhysical Review A 98 (2018), 012324, arXiv:1804.08641.\n\n[14] John M. Danskin, The theory of max-min, with applications, SIAM Journal on Applied Mathe-\n\nmatics 14 (1966), no. 4, 641\u2013664.\n\n[15] Christopher M. Dawson and Michael A. 
Nielsen, The Solovay-Kitaev algorithm, 2005, arXiv:quant-ph/0505030.

[16] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann, A quantum approximate optimization algorithm, 2014, arXiv:1411.4028.

[17] Sidney Golden, Lower bounds for the Helmholtz function, Physical Review 137 (1965), no. 4B, B1127.

[18] François Golse, Clément Mouhot, and Thierry Paul, On the mean field and classical limits of quantum mechanics, Communications in Mathematical Physics 343 (2016), no. 1, 165–205, arXiv:1502.06143.

[19] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, Generative adversarial nets, Advances in Neural Information Processing Systems 27, pp. 2672–2680, 2014, arXiv:1406.2661.

[20] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville, Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems, pp. 5767–5777, 2017, arXiv:1704.00028.

[21] Xin Guo, Johnny Hong, Tianyi Lin, and Nan Yang, Relaxed Wasserstein with applications to GANs, 2017, arXiv:1705.07164.

[22] Aram W. Harrow, The church of the symmetric subspace, 2013, arXiv:1308.6595.

[23] Ling Hu, Shu-Hao Wu, Weizhou Cai, Yuwei Ma, Xianghao Mu, Yuan Xu, Haiyan Wang, Yipu Song, Dong-Ling Deng, Chang-Ling Zou, and Luyan Sun, Quantum generative adversarial learning in a superconducting quantum circuit, Science Advances 5 (2019), no. 1, eaav2761, arXiv:1808.02893.

[24] Solomon Kullback and Richard A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics 22 (1951), no.
1, 79\u201386.\n\n[25] Seth Lloyd and Christian Weedbrook, Quantum generative adversarial learning, Physical\n\nReview Letters 121 (2018), 040502, arXiv:1804.09139.\n\n[26] Lars Mescheder, Andreas Geiger, and Sebastian Nowozin, Which training methods for GANs\ndo actually converge?, Proceedings of the 35th International Conference on Machine Learning,\nvol. 80, pp. 3481\u20133490, 2018, arXiv:1801.04406.\n\n[27] Nikolaj Moll, Panagiotis Barkoutsos, Lev S. Bishop, Jerry M. Chow, Andrew Cross, Daniel J.\nEgger, Stefan Filipp, Andreas Fuhrer, Jay M. Gambetta, Marc Ganzhorn, Abhinav Kandala,\nAntonio Mezzacapo, Peter Muller, Walter Riess, Gian Salis, John Smolin, Ivano Tavernelli, and\nKristan Temme, Quantum optimization using variational algorithms on near-term quantum\ndevices, Quantum Science and Technology 3 (2018), no. 3, 030503, arXiv:1710.01022.\n\n[28] Michael A. Nielsen and Isaac L. Chuang, Quantum computation and quantum information,\n\nCambridge University Press, 2000.\n\n[29] Lipeng Ning, Tryphon T. Georgiou, and Allen Tannenbaum, On matrix-valued Monge\u2013\nKantorovich optimal mass transport, IEEE Transactions on Automatic Control 60 (2014),\nno. 2, 373\u2013382, arXiv:1304.3931.\n\n[30] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J.\nLove, Al\u00e1n Aspuru-Guzik, and Jeremy L. 
O\u2019brien, A variational eigenvalue solver on a photonic\nquantum processor, Nature Communications 5 (2014), 4213, arXiv:1304.3061.\n\n[31] Henning Petzka, Asja Fischer, and Denis Lukovnicov, On the regularization of Wasserstein\n\nGANs, 2017, arXiv:1709.08894.\n\n[32] Gabriel Peyre, Lenaic Chizat, Francois-Xavier Vialard, and Justin Solomon, Quantum optimal\n\ntransport for tensor \ufb01eld processing, 2016, arXiv:1612.08731.\n\n[33] John Preskill, Quantum computing in the NISQ era and beyond, Quantum 2 (2018), 79,\n\narXiv:1801.00862.\n\n11\n\n\f[34] Jonathan Romero and Alan Aspuru-Guzik, Variational quantum generators: Generative adver-\n\nsarial quantum machine learning for continuous distributions, 2019, arXiv:1901.00848.\n\n[35] Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, and Jason D. Lee, On the convergence and\nrobustness of training GANs with regularized optimal transport, Advances in Neural Information\nProcessing Systems 31, pp. 7091\u20137101, Curran Associates, Inc., 2018, arXiv:1802.08249.\n\n[36] Maria Schuld, Ville Bergholm, Christian Gogolin, Josh Izaac, and Nathan Killoran, Evaluat-\ning analytic gradients on quantum hardware, Physical Review A 99 (2019), no. 3, 032331,\narXiv:1811.11184.\n\n[37] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione, An introduction to quantum machine\n\nlearning, Contemporary Physics 56 (2015), no. 2, 172\u2013185, arXiv:1409.3097.\n\n[38] Vivien Seguy, Bharath Bhushan Damodaran, R\u00e9mi Flamary, Nicolas Courty, Antoine Ro-\nlet, and Mathieu Blondel, Large-scale optimal transport and mapping estimation, 2017,\narXiv:1711.02283.\n\n[39] Haozhen Situ, Zhimin He, Lvzhou Li, and Shenggen Zheng, Quantum generative adversarial\n\nnetwork for generating discrete data, 2018, arXiv:1807.01235.\n\n[40] Colin J. Thompson, Inequality with applications in statistical mechanics, Journal of Mathemati-\n\ncal Physics 6 (1965), no. 11, 1812\u20131813.\n\n[41] Hale F. 
Trotter, On the product of semi-groups of operators, Proceedings of the American Mathematical Society 10 (1959), no. 4, 545–551.

[42] Koji Tsuda, Gunnar Rätsch, and Manfred K. Warmuth, Matrix exponentiated gradient updates for on-line learning and Bregman projection, Journal of Machine Learning Research 6 (2005), 995–1018.

[43] Cédric Villani, Optimal transport: old and new, vol. 338, Springer Science & Business Media, 2008.

[44] John Watrous, Simpler semidefinite programs for completely bounded norms, Chicago Journal of Theoretical Computer Science 8 (2013), 1–19, arXiv:1207.5726.

[45] Nengkun Yu, Li Zhou, Shenggang Ying, and Mingsheng Ying, Quantum earth mover's distance, no-go quantum Kantorovich-Rubinstein theorem, and quantum marginal problem, 2018, arXiv:1803.02673.

[46] Jinfeng Zeng, Yufeng Wu, Jin-Guo Liu, Lei Wang, and Jiangping Hu, Learning and inference on generative adversarial quantum circuits, 2018, arXiv:1808.03425.

[47] Daiwei Zhu, Personal Communication, Feb. 2019.

[48] Daiwei Zhu, Norbert M. Linke, Marcello Benedetti, Kevin A. Landsman, Nhung H. Nguyen, C. Huerta Alderete, Alejandro Perdomo-Ortiz, Nathan Korda, A. Garfoot, Charles Brecque, Laird Egan, Oscar Perdomo, and Christopher Monroe, Training of quantum circuits on a hybrid quantum computer, Science Advances 5 (2019), no.
10, eaaw9918, arXiv:1812.08862.

[49] Christa Zoufal, Aurélien Lucchi, and Stefan Woerner, Quantum generative adversarial networks for learning and loading random distributions, 2019, arXiv:1904.00043.