{"title": "Deep Generative Models with Learnable Knowledge Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 10501, "page_last": 10512, "abstract": "The broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a principled framework to impose structured constraints on probabilistic models, but has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified {\\it a priori}, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish mathematical correspondence between PR and reinforcement learning (RL), and, based on the connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic to apply to any DGMs, and is flexible to adapt arbitrary constraints with the model jointly. Experiments on human image generation and templated sentence generation show models with learned knowledge constraints by our algorithm greatly improve over base generative models.", "full_text": "Deep Generative Models with Learnable Knowledge Constraints\n\nZhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, Eric P. Xing\n\n{zhitingh,zichaoy,rsalakhu,xiaodan1}@cs.cmu.edu, eric.xing@petuum.com\n\nCarnegie Mellon University, Petuum Inc.\n\nAbstract\n\nThe broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge with the end-to-end DGMs. 
Posterior regularization (PR) offers a principled framework to impose structured constraints on probabilistic models, but has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish mathematical correspondence between PR and reinforcement learning (RL), and, based on the connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic to apply to any DGMs, and is flexible to adapt arbitrary constraints with the model jointly. Experiments on human image generation and templated sentence generation show models with learned knowledge constraints by our algorithm greatly improve over base generative models.\n\n1 Introduction\n\nGenerative models provide a powerful mechanism for learning data distributions and simulating samples. Recent years have seen remarkable advances, especially in deep approaches [16, 25] such as Generative Adversarial Networks (GANs) [15], Variational Autoencoders (VAEs) [27], auto-regressive networks [29, 42], and so forth. However, it is usually difficult to exploit rich problem structures and domain knowledge (e.g., the human body structure in image generation, Figure 1) in these various deep generative models. Oftentimes we have to hope that the deep networks will discover the structures from massive data by themselves, leaving much valuable domain knowledge unused. Recent efforts to design specialized network architectures or learn disentangled representations [5, 23] are usually applicable only to specific knowledge, models, or tasks. 
It is therefore highly desirable to have a general means of incorporating arbitrary structured knowledge into any type of deep generative model in a principled way.\n\nOn the other hand, posterior regularization (PR) [13] is a principled framework to impose knowledge constraints on posterior distributions of probabilistic models, and has shown effectiveness in regulating the learning of models in different contexts. For example, [21] extends PR to incorporate structured logic rules into neural classifiers. However, these previous approaches are not directly applicable to the general case of deep generative models, as many of the models (e.g., GANs, many auto-regressive networks) are not straightforwardly formulated within the probabilistic Bayesian framework and do not possess a posterior distribution or even meaningful latent variables. Moreover, PR has required constraints fixed a priori. That means users have to fully specify the constraints beforehand, which can be impractical due to heavy engineering, or suboptimal without adaptivity to the data and models. To extend the scope of applicable knowledge and reduce the engineering burden, it is necessary to allow users to specify only partial or fuzzy structures, while learning the remaining parts of the constraints jointly with the regulated model.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nFigure 1: Two example applications of imposing learnable knowledge constraints on generative models. Left: Given a person image and a target pose (defined by key points), the goal is to generate an image of the person under the new pose. The constraint forces the human parts (e.g., head) of the generated image to match those of the true target image. Right: Given a text template, the goal is to generate a complete sentence following the template. The constraint forces the infilling content of the generated sentence to match the true content. (See sec 5 for more details.)\n\nTo this end, we establish formal connections between the PR framework and a broad set of algorithms in the control and reinforcement learning (RL) domains, and, based on the connections, transfer well-developed RL techniques to constraint learning in PR. In particular, although PR and RL are apparently distinct paradigms applied in different contexts, we show a mathematical correspondence between the model and constraints in PR and the policy and reward, respectively, in entropy-regularized policy optimization [43, 45, 1]. This naturally suggests leveraging a relevant approach from the RL domain (specifically, maximum entropy inverse RL [56, 11]) to learn the PR constraints from data (i.e., demonstrations in RL).\n\nBased on the unified perspective, we derive a practical algorithm with efficient estimations and moderate approximations. The algorithm is efficient in regularizing large target spaces with arbitrary constraints, flexible in coupling constraint adaptation with model learning, and model-agnostic, applying to diverse deep generative models, including implicit models where the generative density cannot be evaluated [40, 15]. We demonstrate the effectiveness of the proposed approach in both image and text generation (Figure 1). Leveraging domain knowledge of structure-preserving constraints, the resulting models improve over base generative models.\n\n2 Related Work\n\nIt is of increasing interest to incorporate problem structures and domain knowledge into machine learning approaches [49, 13, 21]. The added structure helps to facilitate learning, enhance generalization, and improve interpretability. 
For deep neural models, one common way is to design specialized network architectures or features for specific tasks (e.g., [2, 34, 28, 33]). Such a method typically has a limited scope of applicable tasks, models, or knowledge. On the other hand, for structured probabilistic models, posterior regularization (PR) and related frameworks [13, 32, 4] provide a general means to impose knowledge constraints during model estimation. [21] develops iterative knowledge distillation based on PR to regularize neural networks with any logic rules. However, the application of PR to the broad class of deep generative models has been hindered, as many of the models do not even possess meaningful latent variables or explicit density evaluation (i.e., implicit models). Previous attempts are thus limited to applying simple max-margin constraints [31]. The requirement of constraints fixed a priori has also made PR impractical for complex, uncertain knowledge. Previous efforts to alleviate the issue either require additional manual supervision [39] or are limited to regularizing a small label space [22]. This paper develops a practical algorithm that is generally applicable to any deep generative model and any learnable constraint on an arbitrary (large) target space.\n\nOur work builds connections between the Bayesian PR framework and reinforcement learning. A relevant, broad research topic of formalizing RL as a probabilistic inference problem has been explored in the RL literature [6, 7, 41, 30, 1, 48], where rich approximate inference tools are used to improve the modeling and reasoning for various RL algorithms. 
The link between RL and PR has not been previously studied. We establish the mathematical correspondence and, differing from the RL literature, in turn transfer tools from RL to expand the probabilistic PR framework.\n\nComponents | PR | Entropy-Reg RL | MaxEnt IRL | (Energy) GANs\nx | data/generations | action-state samples | demonstrations | data/generations\np(x) | generative model p\u03b8 | (old) policy p\u03c0 | - | generator\nf(x)/R(x) | constraint f\u03c6 | reward R | reward R\u03c6 | discriminator\nq(x) | variational distr. q, Eq.3 | (new) policy q\u03c0 | policy q\u03c6 | -\n\nTable 1: Unified perspective of the different approaches, showing mathematical correspondence of PR with the entropy-regularized RL (sec 3.2.1) and maximum entropy IRL (sec 3.2.2), and its (conceptual) relations to (energy-based) GANs (sec 4).\n\nInverse reinforcement learning (IRL) seeks to learn a reward function from expert demonstrations. Recent approaches based on maximum-entropy IRL [56] have been developed to learn both the reward and the policy [11, 10, 12]. 
We adopt the maximum-entropy IRL formulation to derive the constraint learning objective in our algorithm and, unlike these previous approaches, leverage the unique structure of PR for efficient importance sampling estimation.\n\n3 Connecting Posterior Regularization to Reinforcement Learning\n\n3.1 PR for Deep Generative Models\n\nPR [13] was originally proposed to provide a principled framework for incorporating constraints on posterior distributions of probabilistic models with latent variables. The formulation is not generally applicable to deep generative models, as many of them (e.g., GANs and autoregressive models) are not formulated within the Bayesian framework and do not possess a valid posterior distribution or even semantically meaningful latent variables. Here we adopt a slightly adapted formulation that makes minimal assumptions on the specification of the model to regularize. It is worth noting that, although we present it in the generative model context, the formulation, including the algorithm developed later (sec 4), can be straightforwardly extended to other settings such as discriminative models.\n\nConsider a generative model x \u223c p\u03b8(x) with parameters \u03b8. Note that generation of x can be conditioned on arbitrary other elements (e.g., the source image for image transformation), which are omitted for notational simplicity. Denote the original objective of p\u03b8(x) by L(\u03b8). PR augments the objective by adding a constraint term encoding the domain knowledge. Without loss of generality, consider a constraint function f(x) \u2208 \u211d, such that a higher f(x) value indicates a better x in terms of the particular knowledge. Note that f can also involve other factors such as latent variables and extra supervision, and can include a set of multiple constraints.\n\nA straightforward way to impose the constraint on the model is to maximize E_{p\u03b8}[f(x)]. 
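To make the contrast concrete, the sketch below (a toy assumption, not an experiment from the paper) compares the two standard Monte Carlo estimators of ∇θ E_{pθ}[f(x)] for a Gaussian pθ = N(θ, 1) and a hypothetical constraint f(x) = −(x − 2)². The first is the reparameterized estimator, which differentiates f through x = θ + ε. The second is the log-derivative (score-function) estimator f(x) ∇θ log pθ(x).

```python
import numpy as np

# Toy assumption for illustration: p_theta = N(theta, 1) and a hypothetical
# constraint f(x) = -(x - 2)^2, so the exact gradient of E_{p_theta}[f(x)] is -2 (theta - 2).
def f(x):
    return -(x - 2.0) ** 2

def gradient_samples(theta, n, rng):
    eps = rng.standard_normal(n)
    x = theta + eps                    # reparameterized draws x = theta + eps
    reparam = -2.0 * (x - 2.0)         # d f(theta + eps) / d theta, per sample
    score = f(x) * (x - theta)         # f(x) * d log N(x; theta, 1) / d theta, per sample
    return reparam, score

rng = np.random.default_rng(0)
theta = 0.0
reparam, score = gradient_samples(theta, 100_000, rng)
exact = -2.0 * (theta - 2.0)
print('exact gradient:', exact)
print('reparameterized: mean %.3f, variance %.2f' % (reparam.mean(), reparam.var()))
print('log-derivative:  mean %.3f, variance %.2f' % (score.mean(), score.var()))
```

Both estimators are unbiased (the exact gradient at θ = 0 is 4). In this toy case, however, the per-sample variance is 4 for the reparameterized estimator and 87 for the log-derivative one, which follows analytically from the Gaussian moments.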
Such a method is efficient only when p\u03b8 is a GAN-like implicit generative model or an explicit distribution that can be efficiently reparameterized (e.g., Gaussian [27]). For other models, such as the large set of non-reparameterizable explicit distributions, the gradient \u2207\u03b8E_{p\u03b8}[f(x)] is usually computed with the log-derivative trick and can suffer from high variance. For broad applicability and efficient optimization, PR instead imposes the constraint on an auxiliary variational distribution q, which is encouraged to stay close to p\u03b8 through a KL divergence term:\n\n$$\\mathcal{L}(\\theta, q) = \\mathrm{KL}\\left(q(x) \\,\\|\\, p_\\theta(x)\\right) - \\alpha\\, \\mathbb{E}_{q}\\left[f(x)\\right], \\qquad (1)$$\n\nwhere \u03b1 is the weight of the constraint term. The PR objective for learning the model is written as:\n\n$$\\min_{\\theta, q}\\; \\mathcal{L}(\\theta) + \\lambda\\, \\mathcal{L}(\\theta, q), \\qquad (2)$$\n\nwhere \u03bb is the balancing hyperparameter. As optimizing the original model objective L(\u03b8) is straightforward and depends on the specific generative model of choice, in the following we omit the discussion of L(\u03b8) and focus on L(\u03b8, q) introduced by the framework.\n\nThe problem is solved using an EM-style algorithm [13, 21]. Specifically, the E-step optimizes Eq.(1) w.r.t. q, which is convex and has a closed-form solution at each iteration given \u03b8:\n\n$$q^*(x) = p_\\theta(x) \\exp\\{\\alpha f(x)\\} / Z, \\qquad (3)$$\n\nwhere Z is the normalization term. We can see q\u2217 as an energy-based distribution with the negative energy defined by \u03b1f(x) + log p\u03b8(x). With q from the E-step fixed, the M-step optimizes Eq.(1) w.r.t. \u03b8 with:\n\n$$\\min_\\theta \\mathrm{KL}\\left(q(x) \\,\\|\\, p_\\theta(x)\\right) = \\min_\\theta -\\mathbb{E}_{q}\\left[\\log p_\\theta(x)\\right] + \\text{const.} \\qquad (4)$$\n\nConstraint f in PR has to be fully specified a priori and is fixed throughout learning. 
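With such a fixed f, one PR iteration can be illustrated on a hypothetical discrete space with 5 outcomes. Here pθ is assumed to be a softmax over free logits θ, so that both steps are exact. The sketch computes the closed-form E-step of Eq.(3) and checks numerically that it minimizes Eq.(1) against random alternatives. It then runs the M-step of Eq.(4) as gradient descent on the cross-entropy. The constraint values f are made up for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pr_objective(q, p, f, alpha):
    # Eq.(1): KL(q || p) - alpha * E_q[f]
    return np.sum(q * (np.log(q) - np.log(p))) - alpha * np.sum(q * f)

def e_step(p, f, alpha):
    # Eq.(3): q*(x) = p(x) exp(alpha f(x)) / Z
    w = p * np.exp(alpha * f)
    return w / w.sum()

def m_step(theta, q, lr=1.0, steps=2000):
    # Eq.(4): minimize -E_q[log p_theta(x)]; for softmax logits its gradient is p_theta - q
    for _ in range(steps):
        theta = theta - lr * (softmax(theta) - q)
    return theta

rng = np.random.default_rng(0)
theta = rng.normal(size=5)                   # free logits of a toy categorical p_theta
f = np.array([0.0, 1.0, 0.0, -1.0, 2.0])     # made-up constraint scores f(x)
alpha = 1.0

p = softmax(theta)
q_star = e_step(p, f, alpha)
for _ in range(100):                         # q* should beat random distributions on Eq.(1)
    q_rand = softmax(rng.normal(size=5))
    assert pr_objective(q_star, p, f, alpha) <= pr_objective(q_rand, p, f, alpha) + 1e-12
theta = m_step(theta, q_star)
print(np.round(q_star, 3))
print(np.round(softmax(theta), 3))
```

At the optimum, Eq.(1) equals −log Z. Because the toy pθ is fully flexible, the M-step simply reproduces q*. With a restricted model family it would instead return the member closest to q* in KL(q‖pθ).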
It would be desirable or even necessary to enable learnable constraints, so that practitioners need to specify only the known components of f while any unknown or uncertain components are learned automatically. For example, for human image generation in Figure 1 (left panel), users are able to specify structures on the parsed human parts, while it is impractical to also manually engineer the human part parser, which involves recognizing parts from raw image pixels. It is preferable to instead cast the parser as a learnable module in the constraint. Though it is possible to pre-train the module and simply fix it in PR, the lack of adaptivity to the data and model can lead to suboptimal results, as shown in the empirical study (Table 2). This necessitates expanding the PR framework to enable joint learning of the constraints with the model.\n\nDenote the constraint function with learnable components as f\u03c6(x), where \u03c6 can take various optimizable forms, such as the free parameters of a structural model, or a graph structure to optimize.\n\nSimple way of learning the constraint. A straightforward way to learn the constraint is to directly optimize Eq.(1) w.r.t. \u03c6 in the M-step, yielding\n\n$$\\max_\\phi\\; \\mathbb{E}_{x \\sim q(x)}\\left[f_\\phi(x)\\right]. \\qquad (5)$$\n\nThat is, the constraint is trained to fit the samples from the current regularized model q. 
However, such an objective can be problematic, as the generated samples can be of low quality, e.g., due to a poor state of the generative parameter \u03b8 at initial stages, or insufficient capability of the generative model per se.\n\nIn this paper, we propose to learn the constraint as an extrinsic reward, as motivated by the connections between PR and the reinforcement learning domain presented below.\n\n3.2 PR and RL\n\nRL or optimal control has been studied primarily for determining optimal action sequences or strategies, which is significantly different from the context of PR, which aims at regulating generative models. However, formulations very similar to PR (e.g., Eqs.(1) and (3)) have been developed and widely used, both in (forward) RL for policy optimization and in inverse RL for reward learning.\n\nTo make the mathematical correspondence clearer, we intentionally reuse most of the notation from PR. Table 1 lists the correspondence. Specifically, consider a stationary Markov decision process (MDP). An agent in state s draws an action a following the policy p\u03c0(a|s). The state subsequently transitions to s\u2032 (with some transition probability of the MDP), and a reward R(s, a) \u2208 \u211d is obtained. Let x = (s, a) denote the state-action pair, and p\u03c0(x) = \u00b5\u03c0(s)p\u03c0(a|s), where \u00b5\u03c0(s) is the stationary state distribution [47].\n\n3.2.1 Entropy regularized policy optimization\n\nThe goal of policy optimization is to find the optimal policy that maximizes the expected reward. The rich line of research on entropy regularized policy optimization has augmented the objective with information-theoretic regularizers, such as the KL divergence between the new policy and the old policy, for stabilized learning. With a slight abuse of notation, let q\u03c0(x) denote the new policy and p\u03c0(x) the old one. 
A prominent example is relative entropy policy search (REPS) [43], which follows the objective:\n\n$$\\min_{q_\\pi} \\mathcal{L}(q_\\pi) = \\mathrm{KL}\\left(q_\\pi(x) \\,\\|\\, p_\\pi(x)\\right) - \\alpha\\, \\mathbb{E}_{q_\\pi}\\left[R(x)\\right], \\qquad (6)$$\n\nwhere the KL divergence prevents the policy from changing too rapidly. Similar objectives have also been widely used in other workhorse algorithms such as trust-region policy optimization (TRPO) [45], soft Q-learning [17, 46], and others.\n\nWe can see the close resemblance between Eq.(6) and the PR objective in Eq.(1), where the generative model p\u03b8(x) in PR corresponds to the reference policy p\u03c0(x), while the constraint f(x) corresponds to the reward R(x). The new policy q\u03c0 can be either a parametric distribution [45] or a non-parametric distribution [43, 1]. For the latter, the optimization of Eq.(6) precisely corresponds to the E-step of PR, yielding the optimal policy q\u03c0\u2217(x), which takes the same form as q\u2217(x) in Eq.(3), with p\u03b8 and f replaced by their counterparts p\u03c0 and R, respectively. The parametric policy p\u03c0 is subsequently updated with samples from q\u03c0\u2217, which is exactly equivalent to the M-step in PR (Eq.4).\n\nWhile the above policy optimization algorithms assume a reward function given by the external environment, just as PR assumes a pre-defined constraint function, the strong connections above inspire us to treat the PR constraint as an extrinsic reward, and to utilize the rich tools in RL (especially inverse RL) for learning the constraint.\n\n3.2.2 Maximum entropy inverse reinforcement learning\n\nMaximum entropy (MaxEnt) IRL [56] is among the most widely used methods that induce the reward function from expert demonstrations x \u223c pd(x), where pd is the empirical demonstration (data) distribution. 
MaxEnt IRL adopts the same principle as the above entropy regularized RL (Eq.6), maximizing the expected reward regularized by the relative entropy (i.e., the KL), except that, in MaxEnt IRL, p\u03c0 is replaced with a uniform distribution and the regularization reduces to the entropy of q\u03c0. Therefore, as above, the optimal policy takes the form exp{\u03b1R(x)}/Z. MaxEnt IRL assumes the demonstrations are drawn from the optimal policy. Learning the reward function R\u03c6(x) with unknown parameters \u03c6 is then cast as maximizing the likelihood of the distribution q\u03c6(x) := exp{\u03b1R\u03c6(x)}/Z\u03c6:\n\n$$\\phi^* = \\arg\\max_\\phi\\; \\mathbb{E}_{x \\sim p_d}\\left[\\log q_\\phi(x)\\right]. \\qquad (7)$$\n\nGiven the direct correspondence between the policy q\u03c6\u2217 in MaxEnt IRL and the policy optimization solution q\u03c0\u2217 of Eq.(6), plus the connection between the regularized distribution q\u2217 of PR (Eq.3) and q\u03c0\u2217 built in sec 3.2.1, we can readily link q\u2217 and q\u03c6\u2217. This motivates plugging q\u2217 into the above maximum likelihood objective to learn the constraint f\u03c6(x), which parallels the reward function R\u03c6(x). We present the resulting full algorithm in the next section. Table 1 summarizes the correspondence between PR, entropy regularized policy optimization, and maximum entropy IRL.\n\n4 Algorithm\n\nWe have formally related PR to the RL methods. With the unified view of these approaches, we derive a practical algorithm for arbitrary learnable constraints on any deep generative model. The algorithm alternates between optimizing the constraint f\u03c6 and the generative model p\u03b8.\n\n4.1 Learning the Constraint f\u03c6\n\nAs motivated in sec 3.2, instead of directly optimizing f\u03c6 in the original PR objective (Eq.5), which can be problematic, we treat f\u03c6 as the reward function to be induced with the MaxEnt IRL framework. That is, we maximize the data likelihood of q(x) (Eq.3) w.r.t. \u03c6, yielding the gradient:\n\n$$\\nabla_\\phi \\mathbb{E}_{x \\sim p_d}\\left[\\log q(x)\\right] = \\nabla_\\phi \\left[\\mathbb{E}_{x \\sim p_d}\\left[\\alpha f_\\phi(x)\\right] - \\log Z_\\phi\\right] = \\mathbb{E}_{x \\sim p_d}\\left[\\alpha \\nabla_\\phi f_\\phi(x)\\right] - \\mathbb{E}_{q(x)}\\left[\\alpha \\nabla_\\phi f_\\phi(x)\\right]. \\qquad (8)$$\n\nThe second term involves estimating the expectation w.r.t. an energy-based distribution, E_{q(x)}[\u00b7], which is in general very challenging. However, we can exploit the special structure of q \u221d p\u03b8 exp{\u03b1f\u03c6} for efficient approximation. Specifically, we use p\u03b8 as the proposal distribution, and obtain the importance sampling estimate of the second term as follows:\n\n$$\\mathbb{E}_{q(x)}\\left[\\alpha \\nabla_\\phi f_\\phi(x)\\right] = \\mathbb{E}_{x \\sim p_\\theta(x)}\\left[\\frac{q(x)}{p_\\theta(x)} \\cdot \\alpha \\nabla_\\phi f_\\phi(x)\\right] = \\frac{1}{Z_\\phi} \\cdot \\mathbb{E}_{x \\sim p_\\theta(x)}\\left[\\exp\\{\\alpha f_\\phi(x)\\} \\cdot \\alpha \\nabla_\\phi f_\\phi(x)\\right]. \\qquad (9)$$\n\nNote that the normalization Z\u03c6 = \u222b p\u03b8(x) exp{\u03b1f\u03c6(x)} dx can also be estimated efficiently with MC sampling: \u1e90\u03c6 = (1/N) \u2211_i exp{\u03b1f\u03c6(x_i)}, where x_i \u223c p\u03b8. The base generative distribution p\u03b8 is a natural choice for the proposal, as it is in general amenable to efficient sampling, and is kept close to q by the KL divergence in Eq.(1). Our empirical study shows low variance of the learning process (sec 5). Moreover, using p\u03b8 as the proposal distribution allows p\u03b8 to be an implicit generative model (as no likelihood evaluation of p\u03b8 is needed). Note that, because Z\u03c6 is replaced by its MC estimate, the importance sampling estimate is consistent yet biased.\n\n4.2 Learning the Generative Model p\u03b8\n\nGiven the current parameter state (\u03b8 = \u03b8t, \u03c6 = \u03c6t), and q(x) evaluated at these parameters, we continue to update the generative model. 
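Before detailing this update, the constraint step of sec 4.1 can be made concrete. The NumPy sketch below assumes a toy setting: a discrete space with K outcomes, a fixed categorical pθ, and a hypothetical linear constraint fφ(x) = g(x)·φ, so that ∇φ fφ(x) = g(x). It implements the gradient of Eq.(8), with the second term estimated via Eq.(9) using only samples from pθ and the MC estimate Ẑφ. It then checks the estimate against the closed-form gradient and runs a few steps of stochastic gradient ascent on φ.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, alpha = 6, 3, 1.0
G = 0.5 * rng.normal(size=(K, d))     # hypothetical features g(x); f_phi(x) = G[x] @ phi
p_theta = rng.dirichlet(np.ones(K))   # base model over K outcomes (held fixed here)
p_data = rng.dirichlet(np.ones(K))    # stands in for the data distribution p_d
phi = rng.normal(size=d)

def exact_grad(phi):
    # Eq.(8) in closed form: alpha * (E_pd[g] - E_q[g]), q proportional to p_theta * exp(alpha f_phi)
    w = p_theta * np.exp(alpha * (G @ phi))
    q = w / w.sum()
    return alpha * (p_data @ G - q @ G)

def is_grad(phi, n):
    # Data term from samples of p_d; model term via Eq.(9) with p_theta as the proposal
    x_data = rng.choice(K, size=n, p=p_data)
    x_prop = rng.choice(K, size=n, p=p_theta)
    w = np.exp(alpha * (G[x_prop] @ phi))           # exp(alpha f_phi(x_i)), x_i ~ p_theta
    z_hat = w.mean()                                # MC estimate of Z_phi
    model_term = (w[:, None] * alpha * G[x_prop]).mean(axis=0) / z_hat
    return alpha * G[x_data].mean(axis=0) - model_term

def data_loglik(phi):
    # E_pd[log q(x)], the objective whose gradient is Eq.(8)
    w = p_theta * np.exp(alpha * (G @ phi))
    return p_data @ np.log(w / w.sum())

start = data_loglik(phi)
for _ in range(100):                                # stochastic gradient ascent on phi
    phi = phi + 0.5 * is_grad(phi, 5000)
print(start, data_loglik(phi))
```

Dividing by z_hat makes this the self-normalized estimator, which explains why the text calls it consistent but biased. No density of pθ is evaluated anywhere in is_grad.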
Recall that optimization of the generative parameter \u03b8 is performed by minimizing the KL divergence in Eq.(4), which we replicate here:\n\n$$\\min_\\theta \\mathrm{KL}\\left(q(x) \\,\\|\\, p_\\theta(x)\\right) = \\min_\\theta -\\mathbb{E}_{q(x)}\\left[\\log p_\\theta(x)\\right] + \\text{const.} \\qquad (10)$$\n\nThe expectation w.r.t. q(x) can be estimated as above (Eq.9). A drawback of the objective is that it requires evaluating the generative density p\u03b8(x), which is incompatible with the emerging implicit generative models [40] that only permit simulating samples but not evaluating density. To address this restriction, when it comes to regularizing implicit models, we propose to instead minimize the reverse KL divergence:\n\n$$\\min_\\theta \\mathrm{KL}\\left(p_\\theta(x) \\,\\|\\, q(x)\\right) = \\min_\\theta \\mathbb{E}_{p_\\theta}\\left[\\log \\frac{p_\\theta \\cdot Z_{\\phi_t}}{p_{\\theta_t} \\exp\\{\\alpha f_{\\phi_t}\\}}\\right] = \\min_\\theta -\\mathbb{E}_{p_\\theta}\\left[\\alpha f_{\\phi_t}(x)\\right] + \\mathrm{KL}\\left(p_\\theta \\,\\|\\, p_{\\theta_t}\\right) + \\text{const.} \\qquad (11)$$\n\nBy noting that \u2207\u03b8KL(p\u03b8 \u2016 p\u03b8t)|\u03b8=\u03b8t = 0, we obtain the gradient w.r.t. \u03b8:\n\n$$\\nabla_\\theta \\mathrm{KL}\\left(p_\\theta(x) \\,\\|\\, q(x)\\right)\\big|_{\\theta=\\theta_t} = -\\nabla_\\theta \\mathbb{E}_{p_\\theta}\\left[\\alpha f_{\\phi_t}(x)\\right]\\big|_{\\theta=\\theta_t}. \\qquad (12)$$\n\nThat is, the gradient of minimizing the reverse KL divergence equals the gradient of maximizing E_{p\u03b8}[\u03b1f\u03c6t(x)]. Intuitively, the objective encourages the generative model p\u03b8 to generate samples to which the constraint function assigns high scores. Though the objective for implicit models deviates from the original PR framework, reversing the KL for computational tractability has also been used previously, e.g., in the classic wake-sleep method [19]. The resulting algorithm also resembles the adversarial learning in GANs, as we discuss below. 
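The identity in Eq.(12) can be checked numerically. The sketch below assumes a toy explicit model, a softmax over K logits, purely so that KL(pθ‖q) can be computed. In the implicit case, only the right-hand side would be used, estimated from samples. The sketch compares a central finite-difference gradient of the reverse KL at θ = θt with −∇θ E_{pθ}[α fφt(x)] evaluated at θt. The scores f are made up.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
K, alpha = 5, 1.0
theta_t = rng.normal(size=K)           # current generator logits (toy softmax model)
f = rng.normal(size=K)                 # hypothetical constraint scores f_phi_t(x)

p_t = softmax(theta_t)
q = p_t * np.exp(alpha * f)
q = q / q.sum()                        # q from Eq.(3), held fixed at (theta_t, phi_t)

def reverse_kl(theta):
    p = softmax(theta)
    return np.sum(p * (np.log(p) - np.log(q)))

def numerical_grad(fun, x, h=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g

lhs = numerical_grad(reverse_kl, theta_t)
# Right side of Eq.(12) for softmax logits:
# d/d theta_k of sum_x p(x) alpha f(x) equals p_k * (alpha f_k - E_p[alpha f])
rhs = -p_t * (alpha * f - p_t @ (alpha * f))
print(np.round(lhs, 6))
print(np.round(rhs, 6))
```

The extra term KL(pθ‖pθt) in Eq.(11) has zero gradient at θt, so the two vectors agree up to finite-difference error.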
Empirical results on implicit models show the effectiveness of the objective. The resulting algorithm is summarized in Alg.1.\n\nAlgorithm 1 Joint Learning of Deep Generative Model and Constraints\nInput: The base generative model p\u03b8(x)\n       The (set of) constraints f\u03c6(x)\n1: Initialize generative parameter \u03b8 and constraint parameter \u03c6\n2: repeat\n3:   Optimize constraints \u03c6 with Eq.(8)\n4:   if p\u03b8 is an implicit model then\n5:     Optimize model \u03b8 with Eq.(12) along with minimizing original model objective L(\u03b8)\n6:   else\n7:     Optimize model \u03b8 with Eq.(10) along with minimizing L(\u03b8)\n8:   end if\n9: until convergence\nOutput: Jointly learned generative model p\u03b8\u2217(x) and constraints f\u03c6\u2217(x)\n\nConnections to adversarial learning. For implicit generative models, the two objectives w.r.t. \u03c6 and \u03b8 (Eq.8 and Eq.12) are conceptually similar to the adversarial learning in GANs [15] and variants such as energy-based GANs [26, 55, 54, 50]. Specifically, the constraint f\u03c6(x) can be seen as being optimized to assign lower energy (with the energy-based distribution q(x)) to real examples from pd(x), and higher energy to fake samples from q(x), which is the regularized model of the generator p\u03b8(x). In contrast, the generator p\u03b8(x) is optimized to generate samples that confuse f\u03c6 and obtain lower energy. Such an adversarial relation links the PR constraint f\u03c6(x) to the discriminator in GANs (Table 1). Note that here fake samples are generated from q(x) and p\u03b8(x) in the two learning phases, respectively, which differs from previous adversarial methods for energy-based model estimation that simulate only from a generator. Besides, distinct from the discriminator-centric view of previous work [26, 54, 50], we primarily aim at improving the generative model by incorporating learned constraints. 
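The two learning phases of Algorithm 1 can be seen side by side in a toy run of its explicit-model branch (lines 3 and 7). This is a sketch under loudly stated assumptions. The space is discrete with K outcomes. pθ = softmax(θ) is an explicit categorical. The constraint fφ(x) = g(x)·φ is hypothetical and linear. The original objective L(θ) is assumed to be the data negative log-likelihood. In the φ-step, real samples are contrasted with samples from pθ that are reweighted toward q.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
K, d = 8, 4
alpha, lam, lr, n = 1.0, 1.0, 0.5, 2000
G = 0.5 * rng.normal(size=(K, d))     # hypothetical features: f_phi(x) = G[x] @ phi
p_data = rng.dirichlet(np.ones(K))    # stands in for the data distribution p_d

theta = np.zeros(K)                   # line 1: explicit model p_theta = softmax(theta)
phi = np.zeros(d)                     #         and constraint parameters phi

def q_weights(x, phi):
    # self-normalized weights that turn p_theta samples into q samples, as in Eq.(9)
    w = np.exp(alpha * (G[x] @ phi))
    return w / w.sum()

for step in range(300):               # line 2: repeat (a fixed budget stands in for convergence)
    p = softmax(theta)
    x_data = rng.choice(K, size=n, p=p_data)    # real examples
    x_gen = rng.choice(K, size=n, p=p)          # samples from the proposal p_theta

    # line 3: constraint step along Eq.(8); here grad_phi f_phi(x) = G[x]
    w = q_weights(x_gen, phi)
    phi = phi + lr * alpha * (G[x_data].mean(axis=0) - w @ G[x_gen])

    # line 7 (explicit branch): descend L(theta) + lam * KL(q || p_theta), Eq.(10), with q
    # evaluated at the updated phi. Both terms are cross-entropies, whose gradient w.r.t.
    # softmax logits is p_theta - target.
    w = q_weights(x_gen, phi)
    target_data = np.bincount(x_data, minlength=K) / n
    target_q = np.bincount(x_gen, weights=w, minlength=K)
    theta = theta - lr * ((p - target_data) + lam * (p - target_q))

p_final = softmax(theta)
tv = 0.5 * np.abs(p_final - p_data).sum()
print('total variation distance to p_d:', round(float(tv), 3))
print('constraint parameters:', np.round(phi, 3))
```

In this structureless toy, the constraint drifts back toward uninformative once pθ matches pd, much like a GAN discriminator at equilibrium. The cases that matter in the paper are constraints carrying structure, such as a part parser, that the base objective alone does not capture.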
Last but not least, as discussed in sec 3.1, the proposed framework and algorithm are more generally and efficiently applicable, not only to implicit generative models as in GANs, but also to (non-)reparameterizable explicit generative models.\n\n5 Experiments\n\nWe demonstrate the applications and effectiveness of the algorithm in two tasks, related to image and text generation [24], respectively.\n\n# | Method | SSIM | Human\n1 | Ma et al. [38] | 0.614 | -\n2 | Pumarola et al. [44] | 0.747 | -\n3 | Ma et al. [37] | 0.762 | -\n4 | Base model | 0.676 | 0.03\n5 | With fixed constraint | 0.679 | 0.12\n6 | With learned constraint | 0.727 | 0.77\n\nTable 2: Results of image generation on Structural Similarity (SSIM) [52] between generated and true images, and human survey where the full model (Row 6) yields better generations than the base models (Rows 4-5) on 77% of test cases. See the text for more results and discussion.\n\nFigure 2: Training losses of the three models. The model with learned constraint converges as smoothly as the base models.\n\nFigure 3: Samples generated by the models in Table 2. The model with learned human part constraint generates correct poses and preserves human body structure much better.\n\n5.1 Pose Conditional Person Image Generation\n\nGiven a person image and a new body pose, the goal is to generate an image of the same person under the new pose (Figure 1, left). The task is challenging due to body self-occlusions and many cloth and shape ambiguities. Complete end-to-end generative networks have previously failed [37], and existing work designed specialized generative processes or network architectures [37, 44, 38]. We show that with an added body part consistency constraint, a plain end-to-end generative model can also be trained to produce highly competitive results, significantly improving over base models that do not incorporate the problem structure.\n\nSetup. 
We follow the previous work [37] and obtain from DeepFashion [35] a set of triples (source image, pose keypoints, target image) as supervision data. The base generative model p\u03b8 is an implicit model that transforms the input source and pose directly to the pixels of the generated image (and hence defines a Dirac-delta distribution). We use the residual block architecture [51], widely used in image generation, for the generative model. The base model is trained to minimize the L1 distance loss between the real and generated pixel values, as well as to confuse a binary discriminator that distinguishes between the generation and the true target image.\n\nKnowledge constraint. Neither the pixel-wise distance nor the binary discriminator loss encodes any task structure. We introduce a structured consistency constraint f\u03c6 that encourages each of the body parts (e.g., head, legs) of the generated image to match the respective part of the true image. Specifically, the constraint f\u03c6 includes a human parsing module that classifies each pixel of a person image into possible body parts. The constraint then evaluates cross entropies of the per-pixel part distributions between the generated and true images. The average negative cross entropy serves as the constraint score. The parsing module is parameterized as a neural network with parameters \u03c6, pre-trained on an external parsing dataset [14], and subsequently adapted within our algorithm jointly with the generative model.\n\n# | Model | Perplexity | Human\n1 | Base model | 30.30 | 0.19\n2 | With binary D | 30.01 | 0.20\n3 | With constraint updated in M-step (Eq.5) | 31.27 | 0.15\n4 | With learned constraint | 28.69 | 0.24\n\nTable 3: Sentence generation results on test set perplexity and human survey. Samples by the full model are considered to be of higher quality in 24% of cases.\n\nTemplate: acting ___\nBase model: acting is the acting .\nConstrained model: acting is also very good .\n\nTemplate: ___ the ___ out of 10 .\nBase model: the 10 out of 10 .\nConstrained model: I will give the movie 7 out of 10 .\n\nTable 4: Two test examples, including the template, the sample by the base model, and the sample by the constrained model.\n\nResults. Table 2 compares the full model (with the learned constraint, Row 6) with the base model (Row 4) and the one regularized with the constraint that is fixed after pre-training (Row 5). The human survey is performed by asking annotators to rank the quality of images generated by the three models on each of 200 test cases, and the percentage of cases in which each model is ranked best is reported (tied rankings are treated as negative results). We can see great improvement from the proposed algorithm. The model with the fixed constraint fails, partially because pre-training on external data does not necessarily fit the current problem domain. This highlights the necessity of constraint learning. Figure 3 shows examples further validating the effectiveness of the algorithm.\n\nIn sec 4, we discussed the close connection between the proposed algorithm and (energy-based) GANs. The conventional discriminator in GANs can be seen as a special type of constraint. With this connection, and given that the generator in the task is an implicit generative model, here we can also apply and learn the structured consistency constraint using GANs, which is equivalent to replacing q(x) in Eq.(8) with p\u03b8(x). 
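The two variants differ only in where the fake samples in the second term of Eq.(8) come from. The hedged sketch below computes the constraint gradient from per-sample gradients ∇φ fφ. The function name and array layout are hypothetical and not the paper's code. With use_q=True, samples from pθ are reweighted toward q by the self-normalized weights of Eq.(9). With use_q=False, they are used as is, which is the GAN-style variant.

```python
import numpy as np

def constraint_grad(grad_f_data, grad_f_fake, f_fake, alpha, use_q=True):
    # grad_f_data: (n, d) per-sample gradients of f_phi on real examples x ~ p_d
    # grad_f_fake: (m, d) per-sample gradients of f_phi on samples x ~ p_theta
    # f_fake:      (m,)   constraint scores f_phi(x) on those samples
    if use_q:
        # Reweight p_theta samples toward q with self-normalized weights, as in Eq.(9);
        # subtracting the max only rescales exp(alpha f) and cancels after normalization.
        w = np.exp(alpha * (f_fake - f_fake.max()))
        w = w / w.sum()
    else:
        # GAN-style variant: q(x) in Eq.(8) replaced by p_theta(x), i.e. uniform weights.
        w = np.full(len(f_fake), 1.0 / len(f_fake))
    return alpha * (grad_f_data.mean(axis=0) - w @ grad_f_fake)

# Tiny example: two generated samples, the first scored higher by the constraint.
grad_data = np.array([[0.5]])
grad_fake = np.array([[1.0], [0.0]])
f_fake = np.array([2.0, 0.0])
g_full = constraint_grad(grad_data, grad_fake, f_fake, alpha=1.0, use_q=True)
g_gan = constraint_grad(grad_data, grad_fake, f_fake, alpha=1.0, use_q=False)
print(g_full, g_gan)
```

In the full algorithm, the fake term concentrates on samples that the current constraint already scores highly. The GAN-style variant instead treats all generator samples equally.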
Learning the constraint in this GAN fashion yields an SSIM score of 0.716, slightly below the result of the full algorithm (Row 6). We suspect this is because fake samples drawn from q (instead of pθ) help the constraint learn better. It would be interesting to explore this in more applications.
To give a sense of the state of the art on this task, Table 2 also lists the performance of previous work. Note that, as discussed in [44], these results are not directly comparable because the works use different settings (e.g., test splits). We mostly follow [37, 38], although our generative model is much simpler than theirs, which use specialized, multi-stage architectures. Although the proposed algorithm learns constraints with moderate approximations, Figure 2 shows that training is stable and converges as smoothly as that of the base model.

5.2 Template Guided Sentence Generation

The task is to generate a text sentence x that follows a given template t (Figure 1, right). Each missing part in the template can contain an arbitrary number of words. This differs from previous sentence completion tasks [9, 57], which designate each masked position to hold a single word, so directly applying those approaches to this task can be problematic.
Setup. We use an attentional sequence-to-sequence (seq2seq) model [3] pθ(x|t) as the base generative model for the task. Paired (template, sentence) data is obtained by randomly masking out different parts of sentences from the IMDB corpus [8]. The base model is trained end-to-end in a supervised manner, which allows it to memorize the words in the input template and repeat them almost exactly in the generation. The main challenge, however, is to generate meaningful and coherent content to fill in the missing parts.
Knowledge constraint. To tackle this issue, we add a constraint that enforces matching between the generated sentence and the ground-truth text in the missing parts.
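The random masking used to build (template, sentence) pairs, and the masked-out ground-truth text that this constraint compares against, can be illustrated with a small sketch. The text does not give the masking scheme, so this is only an assumption. It cuts a tokenized sentence at randomly chosen positions, replaces each selected span of arbitrary length with a single placeholder token, and keeps the removed spans as the masked-out text. Plugging those spans back into the template recovers the original sentence. The <m> token, the number of spans, and the function names make_template and fill_template are hypothetical.

```python
import random
from typing import List, Tuple

MASK = '<m>'  # hypothetical placeholder token for one missing part


def make_template(tokens: List[str], num_spans: int,
                  rng: random.Random) -> Tuple[List[str], List[List[str]]]:
    # Pick 2 * num_spans distinct cut points in [0, len(tokens)]; consecutive
    # pairs (start, end) give non-empty spans separated by at least one kept token.
    if num_spans < 1 or 2 * num_spans > len(tokens) + 1:
        raise ValueError('sentence too short for the requested number of spans')
    cuts = sorted(rng.sample(range(len(tokens) + 1), 2 * num_spans))
    template: List[str] = []
    t_minus: List[List[str]] = []
    pos = 0
    for start, end in zip(cuts[0::2], cuts[1::2]):
        template.extend(tokens[pos:start])  # keep the visible words
        template.append(MASK)               # one mask per span, whatever its length
        t_minus.append(tokens[start:end])   # masked-out true text for this slot
        pos = end
    template.extend(tokens[pos:])
    return template, t_minus


def fill_template(template: List[str], t_minus: List[List[str]]) -> List[str]:
    # Plug each masked-out span back into its mask, in order.
    spans = iter(t_minus)
    out: List[str] = []
    for tok in template:
        out.extend(next(spans) if tok == MASK else [tok])
    if next(spans, None) is not None:
        raise ValueError('more spans than masks')
    return out


if __name__ == '__main__':
    rng = random.Random(0)
    sentence = 'i will give the movie 7 out of 10 .'.split()
    template, t_minus = make_template(sentence, num_spans=2, rng=rng)
    print(' '.join(template))
    assert fill_template(template, t_minus) == sentence
```

Using distinct, sorted cut points keeps every span non-empty and prevents two masks from being adjacent. Each <m> slot therefore stands for exactly one missing part of arbitrary length, as the task requires.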
Concretely, let t− be the masked-out true text; that is, plugging t− into the template t recovers the true complete sentence. The constraint is defined as fφ(x, t−), which returns a high score if the sentence x matches t− well. The actual matching strategy can vary. Here we simply specify fφ as another seq2seq network that takes a sentence x as input and evaluates the likelihood of recovering t−. This is all we have to specify; the unknown parameters φ are learned jointly with the generative model. Despite its simplicity, the constraint proves useful in the experiments.
Results. Table 3 shows the results. Row 2 is the base model with an additional binary discriminator that adversarially distinguishes between the generated sentence and the ground truth (i.e., a GAN model). Row 3 is the base model with the constraint learned directly through Eq.(5). This improper way of learning the constraint harms model performance, partly because the constraint is trained to fit relatively low-quality model samples. In contrast, the proposed algorithm effectively improves the model results. Its superiority over the binary discriminator (Row 2) shows the usefulness of incorporating problem structure. Table 4 shows samples from the base and constrained models. Without an explicit constraint forcing the in-filled content to match, the base model tends to generate less meaningful content (e.g., duplications, short and generic expressions).

6 Discussions: Combining Structured Knowledge with Black-box NNs

We revealed the connection between posterior regularization and reinforcement learning, which motivates learning the knowledge constraints in PR as reward learning in RL. The resulting algorithm is generally applicable to any deep generative model, and is flexible enough to learn the constraints and the model jointly.
Experiments on image and text generation showed the effectiveness of the algorithm.
The proposed algorithm, along with previous work (e.g., [21, 22, 18, 36, 23]), represents a general means of adding (structured) knowledge to black-box neural networks: devising knowledge-inspired losses or constraints that drive the model to learn the desired structures. This differs from the other popular approach, which embeds domain knowledge into specifically designed neural architectures (e.g., the knowledge of translation invariance in image classification is hard-coded in the convolution-pooling architecture of ConvNets). While specialized neural architectures can be very effective at capturing the designated knowledge, incorporating knowledge via specialized losses enjoys the advantages of generality and flexibility:

• Model-agnostic. The learning framework is applicable to neural models with any architecture, e.g., ConvNets, RNNs, and other specialized ones [21].
• Richer supervision. Compared to conventional end-to-end maximum likelihood learning, which usually requires fully annotated or paired data, the knowledge-aware losses provide additional supervision based on, e.g., structured rules [21], other models [18, 22, 53, 20], and datasets for other related tasks (e.g., the human image generation method in Figure 1, and [23]). In particular, [23] leverages datasets of sentence sentiment and phrase tense to learn to control both attributes (sentiment and tense) when generating sentences.
• Modularized design and learning. Even with rich sources of supervision, the design and learning of the model can remain simple and efficient, because each supervision source can be formulated independently of the others and forms a separate loss term. For example, [23] separately learns two classifiers, one for sentiment and one for tense, on two separate datasets.
The two classifiers carry their respective semantic knowledge, and are then jointly applied to a text generation model for attribute control. In comparison, mixing and hard-coding multiple pieces of knowledge in a single neural architecture can be difficult, and quickly becomes impossible as the amount of knowledge grows.
• Generation with discrimination knowledge. In generation tasks, it can be difficult to incorporate knowledge directly in the generative process (or model architecture), i.e., to define how to generate. It is often easier to instead specify an evaluation metric that measures the quality of a given sample in terms of the knowledge, i.e., to define what a desired generation is. For example, in the human image generation task (Figure 1), evaluating the structured consistency of human parts can be easier than designing a generator architecture that hard-codes the structured generation process for the human parts.

It is worth noting that the two paradigms are not mutually exclusive. A model with a knowledge-inspired specialized architecture can still be learned by optimizing knowledge-inspired losses. Different types of knowledge may be best suited to either architecture hard-coding or loss optimization. It would be interesting to explore combining both in the above tasks and others.

Acknowledgment. This material is based upon work supported by the National Science Foundation grant IIS1563887. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References
[1] A. Abdolmaleki, J. T. Springenberg, Y. Tassa, R. Munos, N. Heess, and M. Riedmiller. Maximum a posteriori policy optimisation. In ICLR, 2018.
[2] J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Learning to compose neural networks for question answering. arXiv preprint arXiv:1601.01705, 2016.
[3] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[4] K. Bellare, G. Druck, and A. McCallum. Alternating projections for learning with expectation constraints. In UAI, pages 43–50. AUAI Press, 2009.
[5] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NeurIPS, 2016.
[6] P. Dayan and G. E. Hinton. Using expectation-maximization for reinforcement learning. Neural Computation, 9(2):271–278, 1997.
[7] M. P. Deisenroth, G. Neumann, J. Peters, et al. A survey on policy search for robotics. Foundations and Trends® in Robotics, 2(1–2):1–142, 2013.
[8] Q. Diao, M. Qiu, C.-Y. Wu, A. J. Smola, J. Jiang, and C. Wang. Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In KDD, pages 193–202. ACM, 2014.
[9] W. Fedus, I. Goodfellow, and A. M. Dai. MaskGAN: Better text generation via filling in the _. arXiv preprint arXiv:1801.07736, 2018.
[10] C. Finn, P. Christiano, P. Abbeel, and S. Levine. A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv preprint arXiv:1611.03852, 2016.
[11] C. Finn, S. Levine, and P. Abbeel. Guided cost learning: Deep inverse optimal control via policy optimization. In ICML, pages 49–58, 2016.
[12] J. Fu, K. Luo, and S. Levine. Learning robust rewards with adversarial inverse reinforcement learning. arXiv preprint arXiv:1710.11248, 2017.
[13] K. Ganchev, J. Gillenwater, B. Taskar, et al. Posterior regularization for structured latent variable models. JMLR, 11(Jul):2001–2049, 2010.
[14] K. Gong, X. Liang, X. Shen, and L. Lin. Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing. In CVPR, pages 6757–6765, 2017.
[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NeurIPS, pages 2672–2680, 2014.
[16] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
[17] T. Haarnoja, H. Tang, P. Abbeel, and S. Levine. Reinforcement learning with deep energy-based policies. arXiv preprint arXiv:1702.08165, 2017.
[18] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[19] G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal. The “wake-sleep” algorithm for unsupervised neural networks. Science, 268(5214):1158, 1995.
[20] A. Holtzman, J. Buys, M. Forbes, A. Bosselut, D. Golub, and Y. Choi. Learning to write with cooperative discriminators. In ACL, 2018.
[21] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing. Harnessing deep neural networks with logic rules. In ACL, 2016.
[22] Z. Hu, Z. Yang, R. Salakhutdinov, and E. P. Xing. Deep neural networks with massive learned knowledge. In EMNLP, 2016.
[23] Z. Hu, Z. Yang, X. Liang, R. Salakhutdinov, and E. P. Xing. Toward controlled generation of text. In ICML, 2017.
[24] Z. Hu, H. Shi, Z. Yang, B. Tan, T. Zhao, J. He, W. Wang, L. Qin, D. Wang, et al. Texar: A modularized, versatile, and extensible toolkit for text generation. arXiv preprint arXiv:1809.00794, 2018.
[25] Z. Hu, Z. Yang, R. Salakhutdinov, and E. P. Xing. On unifying deep generative models. In ICLR, 2018.
[26] T. Kim and Y. Bengio. Deep directed generative models with energy-based probability estimation. arXiv preprint arXiv:1606.03439, 2016.
[27] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[28] M. J. Kusner, B. Paige, and J. M. Hernández-Lobato. Grammar variational autoencoder. arXiv preprint arXiv:1703.01925, 2017.
[29] H. Larochelle and I. Murray. The neural autoregressive distribution estimator. In AISTATS, 2011.
[30] S. Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv preprint arXiv:1805.00909, 2018.
[31] C. Li, J. Zhu, T. Shi, and B. Zhang. Max-margin deep generative models. In NeurIPS, pages 1837–1845, 2015.
[32] P. Liang, M. I. Jordan, and D. Klein. Learning from measurements in exponential families. In ICML, pages 641–648. ACM, 2009.
[33] X. Liang, Z. Hu, H. Zhang, C. Gan, and E. P. Xing. Recurrent topic-transition GAN for visual paragraph generation. In ICCV, 2017.
[34] X. Liang, Z. Hu, and E. Xing. Symbolic graph reasoning meets convolutions. In NeurIPS, 2018.
[35] Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In CVPR, pages 1096–1104, 2016.
[36] D. Lopez-Paz, L. Bottou, B. Schölkopf, and V. Vapnik. Unifying distillation and privileged information. arXiv preprint arXiv:1511.03643, 2015.
[37] L. Ma, X. Jia, Q. Sun, B. Schiele, T. Tuytelaars, and L. Van Gool. Pose guided person image generation. In NeurIPS, pages 405–415, 2017.
[38] L. Ma, Q. Sun, S. Georgoulis, L. Van Gool, B. Schiele, and M. Fritz. Disentangled person image generation. In CVPR, 2018.
[39] S. Mei, J. Zhu, and J. Zhu. Robust regBayes: Selectively incorporating first-order logic domain knowledge into Bayesian models. In ICML, pages 253–261, 2014.
[40] S. Mohamed and B. Lakshminarayanan. Learning in implicit generative models. arXiv preprint arXiv:1610.03483, 2016.
[41] G. Neumann et al. Variational inference for policy search in changing situations. In ICML, pages 817–824, 2011.
[42] A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016.
[43] J. Peters, K. Mülling, and Y. Altun. Relative entropy policy search. In AAAI, pages 1607–1612. Atlanta, 2010.
[44] A. Pumarola, A. Agudo, A. Sanfeliu, and F. Moreno-Noguer. Unsupervised person image synthesis in arbitrary poses. In CVPR, 2018.
[45] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz. Trust region policy optimization. In ICML, pages 1889–1897, 2015.
[46] J. Schulman, X. Chen, and P. Abbeel. Equivalence between policy gradients and soft Q-learning. arXiv preprint arXiv:1704.06440, 2017.
[47] R. S. Sutton and A. G. Barto. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998.
[48] B. Tan, Z. Hu, Z. Yang, R. Salakhutdinov, and E. Xing. Connecting the dots between MLE and RL for text generation. 2018.
[49] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NeurIPS, pages 25–32, 2004.
[50] D. Wang and Q. Liu. Learning to draw samples: With application to amortized MLE for generative adversarial learning. arXiv preprint arXiv:1611.01722, 2016.
[51] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro. High-resolution image synthesis and semantic manipulation with conditional GANs. arXiv preprint arXiv:1711.11585, 2017.
[52] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[53] Z. Yang, Z. Hu, C. Dyer, E. Xing, and T. Berg-Kirkpatrick. Unsupervised text style transfer using language models as discriminators. In NeurIPS, 2018.
[54] S. Zhai, Y. Cheng, R. Feris, and Z. Zhang. Generative adversarial networks as variational training of energy based models. arXiv preprint arXiv:1611.01799, 2016.
[55] J. Zhao, M. Mathieu, and Y. LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016.
[56] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In AAAI, volume 8, pages 1433–1438. Chicago, IL, USA, 2008.
[57] G. Zweig and C. J. Burges. The Microsoft Research sentence completion challenge. Technical report, Citeseer, 2011.", "award": [], "sourceid": 6726, "authors": [{"given_name": "Zhiting", "family_name": "Hu", "institution": "Carnegie Mellon University"}, {"given_name": "Zichao", "family_name": "Yang", "institution": null}, {"given_name": "Russ", "family_name": "Salakhutdinov", "institution": "Carnegie Mellon University"}, {"given_name": "LIANHUI", "family_name": "Qin", "institution": null}, {"given_name": "Xiaodan", "family_name": "Liang", "institution": "Sun Yat-sen University"}, {"given_name": "Haoye", "family_name": "Dong", "institution": "Sun Yat-sen University"}, {"given_name": "Eric", "family_name": "Xing", "institution": "Petuum Inc. /  Carnegie Mellon University"}]}