{"title": "Discrete Object Generation with Reversible Inductive Construction", "book": "Advances in Neural Information Processing Systems", "page_first": 10353, "page_last": 10363, "abstract": "The success of generative modeling in continuous domains has led to a surge of interest in generating discrete data such as molecules, source code, and graphs.\nHowever, construction histories for these discrete objects are typically not unique and so generative models must reason about intractably large spaces in order to learn.\nAdditionally, structured discrete domains are often characterized by strict constraints on what constitutes a valid object and generative models must respect these requirements in order to produce useful novel samples.\nHere, we present a generative model for discrete objects employing a Markov chain where transitions are restricted to a set of local operations that preserve validity. \nBuilding off of generative interpretations of denoising autoencoders, the Markov chain alternates between producing 1) a sequence of corrupted objects that are valid but not from the data distribution, and 2) a learned reconstruction distribution that attempts to fix the corruptions while also preserving validity.\nThis approach constrains the generative model to only produce valid objects, requires the learner to only discover local modifications to the objects, and avoids marginalization over an unknown and potentially large space of construction histories.\nWe evaluate the proposed approach on two highly structured discrete domains, molecules and Laman graphs, and find that it compares favorably to alternative methods at capturing distributional statistics for a host of semantically relevant metrics.", "full_text": "Discrete Object Generation\n\nwith Reversible Inductive Construction\n\nAri Seff\n\naseff@princeton.edu\n\nPrinceton University\n\nPrinceton, NJ\n\nWenda Zhou\n\nColumbia University\n\nNew York, NY\n\nwz2335@columbia.edu\n\nFarhan Damani\n\nPrinceton 
University\n\nPrinceton, NJ\n\nfdamani@princeton.edu\n\nAbigail Doyle\n\nPrinceton University\n\nPrinceton, NJ\n\nagdoyle@princeton.edu\n\nRyan P. Adams\n\nPrinceton University\n\nPrinceton, NJ\n\nrpa@princeton.edu\n\nAbstract\n\nThe success of generative modeling in continuous domains has led to a surge of interest in generating discrete data such as molecules, source code, and graphs. However, construction histories for these discrete objects are typically not unique and so generative models must reason about intractably large spaces in order to learn. Additionally, structured discrete domains are often characterized by strict constraints on what constitutes a valid object and generative models must respect these requirements in order to produce useful novel samples. Here, we present a generative model for discrete objects employing a Markov chain where transitions are restricted to a set of local operations that preserve validity. Building off of generative interpretations of denoising autoencoders, the Markov chain alternates between producing 1) a sequence of corrupted objects that are valid but not from the data distribution, and 2) a learned reconstruction distribution that attempts to fix the corruptions while also preserving validity. This approach constrains the generative model to only produce valid objects, requires the learner to only discover local modifications to the objects, and avoids marginalization over an unknown and potentially large space of construction histories. We evaluate the proposed approach on two highly structured discrete domains, molecules and Laman graphs, and find that it compares favorably to alternative methods at capturing distributional statistics for a host of semantically relevant metrics.\n\n1 Introduction\n\nMany applied domains of optimization and design would benefit from accurate generative modeling of structured discrete objects. 
For example, a generative model of molecular structures may aid drug or material discovery by enabling an inexpensive search for stable molecules with desired properties. Similarly, in computer-aided design (CAD), generative models may allow an engineer to sample new parts or conditionally complete partially-specified geometry. Indeed, recent work has aimed to extend the success of learned generative models in continuous domains, such as images and audio, to discrete data including graphs [38, 25], molecules [14, 21], and program source code [37, 30].\nHowever, discrete domains present particular challenges to generative modeling. Discrete data structures often exhibit non-unique representations, e.g., up to n! equivalent adjacency matrix representations for a graph with n nodes. Models that perform additive construction (incrementally building a graph from scratch [38, 25]) are flexible but face the prospect of reasoning over an intractable number of possible construction paths.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nFigure 1: Reconstruction model processing given an input molecule. Location-specific representations computed via message passing are passed through fully-connected layers outputting probabilities for each legal operation.\n\nFor example, You et al. 
[38] leverage a breadth-first-search (BFS) to reduce the number of possible construction sequences, while Simonovsky and Komodakis [34] avoid additive construction and instead directly decode an adjacency matrix from a latent space, at the cost of requiring approximate graph matching to compute reconstruction error.\nIn addition, discrete domains are often accompanied by prespecified hard constraints denoting what constitutes a valid object. For example, molecular structures represented as SMILES strings [36] must follow strict syntactic and semantic rules in order to be decoded to a real compound. Recent work has aimed to improve the validity of generated samples by leveraging the SMILES grammar [21, 7] or encouraging validity via reinforcement learning [18]. Operating directly on chemical graphs, Jin et al. [19] leverage chemical substructures encountered during training to build valid molecular graphs, and De Cao and Kipf [8] encourage validity for small molecules via adversarial training. In other graph-structured domains, strict topological constraints may be encountered. For example, Laman graphs [23], a class of geometric constraint graphs, require the relative number of nodes and edges in each subgraph to meet certain conditions in order to represent well-constrained geometry.\nIn this work we take the broad view that graphs provide a universal abstraction for reasoning about structure and constraints on discrete spaces. This is not a new take on discrete spaces: graph-based representations such as factor graphs [20], error-correcting codes [12], constraint graphs [28], and conditional random fields [22] are all examples of ways that hard and soft constraints are regularly imposed on structured prediction tasks, satisfiability problems, and sets of random variables.\nWe propose to model discrete objects by constructing a Markov chain where each possible state corresponds to a valid object. Learned transitions are restricted to a set of local inductive moves, defined as minimal insert and delete operations that maintain validity. Leveraging the generative model interpretation of denoising autoencoders [2], the chain employed here alternatingly samples from two conditional distributions: a fixed distribution over corrupting sequences and a learned distribution over reconstruction sequences. The equilibrium distribution of the chain serves as the generative model, approximating the target data-generating distribution.\nThis simple framework allows the learned component, the reconstruction model, to be treated as a standard supervised learning problem for multi-class classification. Each reconstruction step is parameterized as a categorical distribution over adjacent objects, those that are one inductive move away from the input object. Given a local corrupter, the target reconstruction distribution is also local, containing fewer modes and potentially being easier to learn than the full data-generating distribution [2]. 
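The alternating corrupt-and-reconstruct chain just described can be sketched on a toy discrete domain. The sketch below is illustrative only: validity is taken to be 'a strictly increasing tuple of integers', the inductive moves are single-element inserts and deletes, and the learned reconstruction model is replaced by a uniform placeholder; all names are invented for this example.

```python
import random

def legal_moves(x):
    # Inductive moves on a strictly increasing tuple: delete any element,
    # or insert a value (0-9) not already present.
    moves = [('del', i) for i in range(len(x))]
    moves += [('ins', v) for v in range(10) if v not in x]
    return moves

def apply_move(x, move):
    kind, arg = move
    if kind == 'del':
        return tuple(v for i, v in enumerate(x) if i != arg)
    return tuple(sorted(x + (arg,)))

def is_valid(x):
    return all(a < b for a, b in zip(x, x[1:]))

def chain_step(x, rng, mean_corruptions=5):
    # 1) Corruption: a geometric number of random legal moves (mean 5,
    # matching the paper's default corruption length).
    k = 1
    while rng.random() > 1.0 / mean_corruptions:
        k += 1
    for _ in range(k):
        x = apply_move(x, rng.choice(legal_moves(x)))
    # 2) Reconstruction: a uniform stand-in for the learned model
    # p_theta, sampling moves until a stop token is drawn.
    while True:
        op = rng.choice(legal_moves(x) + ['stop'])
        if op == 'stop':
            return x
        x = apply_move(x, op)

rng = random.Random(0)
x = (1, 3, 5)
samples = []
for _ in range(20):
    x = chain_step(x, rng)
    assert is_valid(x)  # every move preserves validity by construction
    samples.append(x)
```

Because every move maps valid objects to valid objects, the chain never needs rejection sampling; the learned reconstructor would replace the uniform stand-in in practice.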
In addition, various hard constraints, such as validity rules or requiring the inclusion of a specific substructure, are incorporated naturally.\n\nFigure 2: Corruption and subsequent reconstruction of a molecular graph. Our method generates discrete objects by running a Markov chain that alternates between sampling from fixed corruption and learned reconstruction distributions that respect validity constraints.\n\nOne limitation of the proposed approach is its expensive sampling procedure, requiring Gibbs sampling at deployment time. Nevertheless, in many areas of engineering and design, it is the downstream tasks following initial proposals that serve as the time bottleneck. For example, in drug design, wet lab experiments and controlled clinical trials are far more time intensive than empirically adequate mixing for the proposed method's Markov chain. In addition, as an implicit generative model, the proposed approach is not equipped to explicitly provide access to predictive probabilities. We compare statistics for a host of semantically meaningful features from sets of generated samples with the corresponding empirical distributions in order to evaluate the model's generative capabilities.\nWe test the proposed approach on two complex discrete domains: molecules and Laman graphs [23], a class of geometric constraint graphs applied in CAD, robotics, and polymer physics. Quantitative evaluation indicates that the proposed method can effectively model highly structured discrete distributions while adhering to strict validity constraints.\n\n2 Reversible Inductive Construction\n\nLet p(x) be an unknown probability mass function over a discrete domain, D, from which we have observed data. We assume there are constraints on what constitutes a valid object, where V ⊆ D is the subset of valid objects in D, and ∀x ∉ V, p(x) = 0. For example, in the case of molecular graphs, an invalid object may violate atom-specific valence rules. Our goal is to learn a generative model p_θ(x), approximating p(x), with support restricted to the valid subset.\nWe formulate our approach, generative reversible inductive construction (GenRIC)¹, as the equilibrium distribution of a Markov chain that only visits valid objects, without a need for inefficient rejection sampling. The chain's transitions are restricted to legal inductive moves. Here, an inductive move is a local insert or delete operation that, when executed on a valid object, results in another valid object. The Markov kernel then needs to be learned such that its equilibrium distribution approximates p(x) over the valid subspace.\n\n2.1 Learning the Markov kernel\n\nThe desired Markov kernel is formulated as successive sampling between two conditional distributions, one fixed and one learned, a setup originally proposed to extract the generative model implicit in denoising autoencoders [2]. A single transition of the Markov chain involves first sampling from a fixed corrupting distribution c(x̃ | x) and then sampling from a learned reconstruction distribution p_θ(x | x̃). While the corrupter is free to damage x, validity constraints are built into both conditional distributions. 
The joint data-generating distribution over original and corrupted samples is defined as p(x, x̃) = c(x̃ | x)p(x), which is also uniquely defined by the corrupting distribution and the target reconstruction distribution, p(x | x̃). We use supervised learning to train a reconstruction distribution model p_θ(x | x̃) to approximate p(x | x̃). Together, the corruption and learned reconstruction distributions define a Gibbs sampling procedure that asymptotically samples from the marginal p_θ(x), approximating the data marginal p(x).\nGiven a reasonable set of conditions on the support of these two conditionals and the consistency of the employed learning algorithm, the learned joint distribution can be shown to be asymptotically consistent over the Markov chain, converging to the true data-generating distribution in the limit of infinite training data and modeling capacity [2]. However, in the more realistic case of estimation with finite training data and capacity, a valid concern arises regarding the effect of an imperfect reconstruction model on the chain's equilibrium distribution. To this end, Alain et al. [1] adapt a result from perturbation theory [32] for finite state Markov chains to show that as the learned transition matrix becomes arbitrarily close to the target transition matrix, the equilibrium distribution also becomes arbitrarily close to the target joint distribution.\n\n¹https://github.com/PrincetonLIPS/reversible-inductive-construction\n\n
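The supervised setup described here can be sketched end to end: draw x from the data, draw a corruption sequence while recording each move, and use the reversed sequence as the reconstruction target. The snippet below is a toy illustration under invented assumptions (the domain is again strictly increasing integer tuples, and the training pair is simply materialized rather than fed to a real model).

```python
import random

def corrupt_with_trace(x, rng, n_moves):
    # Apply n_moves random deletions/insertions, recording the reverse of
    # each move so the exact reconstruction target is known.
    trace = []
    for _ in range(n_moves):
        if x and rng.random() < 0.5:
            i = rng.randrange(len(x))
            trace.append(('ins', x[i]))   # reverse of deleting x[i]
            x = tuple(v for j, v in enumerate(x) if j != i)
        else:
            v = rng.choice([u for u in range(10) if u not in x])
            trace.append(('del', v))      # reverse of inserting v
            x = tuple(sorted(x + (v,)))
    return x, list(reversed(trace))       # (corrupted x, target sequence)

def apply_sequence(x, ops):
    for kind, v in ops:
        if kind == 'ins':
            x = tuple(sorted(x + (v,)))
        else:
            x = tuple(u for u in x if u != v)
    return x

rng = random.Random(0)
x0 = (2, 4, 7)
x_tilde, s_rev = corrupt_with_trace(x0, rng, n_moves=3)
# The supervised target: replaying s_rev on x_tilde recovers x0.
assert apply_sequence(x_tilde, s_rev) == x0
```

A learned reconstructor is then fit by maximum likelihood on many such (corrupted object, reversed sequence) pairs, one categorical prediction per step.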
For the discrete domains of interest here, we can enforce a finite state space by simply setting a maximum object size.\n\n2.2 Sampling training sequences\n\nLet c(s | x) be a fixed conditional distribution over a sequence of corrupting operations s = [s_1, s_2, ..., s_k], where k is a random variable representing the total number of steps and each s_i ∈ Ind(x̃_i), where Ind(x̃_i) is the set of legal inductive moves for a given x̃_i. The probability of arriving at corrupted sample x̃ from x is\n\nc(x̃ | x) = Σ_s c(x̃, s | x) = Σ_{s ∈ S(x, x̃)} c(s | x),   (1)\n\nwhere S(x, x̃) denotes the set of all corrupting sequences from x to x̃. Thus, the joint data-generating distribution is\n\np(x, s, x̃) = c(x̃, s | x) p(x),   (2)\n\nwhere c(x̃, s | x) = 0 if s ∉ S(x, x̃).\nGiven a corrupted sample, we aim to train a reconstruction distribution model p_θ(x | x̃) to maximize the expected conditional probability of recovering the original, uncorrupted sample. Thus, we wish to find the parameters θ* that minimize the expected KL divergence between the true p(x, s | x̃) and learned p_θ(x, s | x̃),\n\nθ* = argmin_θ E_{p(x, s, x̃)} [D_KL(p(s, x | x̃) || p_θ(s, x | x̃))],   (3)\n\nwhich amounts to maximum likelihood estimation of p_θ(s, x | x̃) and likewise p_θ(x | x̃). The above is an expectation over the joint data-generating distribution, p(x, s, x̃), which we can sample from by drawing a data sample and then conditionally drawing a corruption sequence:\n\nx ~ p(x),   x̃, s ~ c(x̃, s | x).   (4)\n\n2.3 Fixed corrupter\n\nIn general, we are afforded flexibility when selecting a corruption distribution, given that certain conditions for ergodicity are met. We implement a simple fixed distribution over corrupting sequences approximately following these steps: 1) Sample a number of moves k from a geometric distribution. 2) For each move, sample a move type from {Insert, Delete}. 3) Sample from among the legal operations available for the given move type. We make minor adjustments to the weighting of available operations for specific domains. See Appendix F for full details.\nThe geometric distribution over corruption sequence length ensures exponentially decreasing support with edit distance, and likewise the support of the target reconstruction distribution is local to the conditioned corrupted object. The globally non-zero (yet exponentially decreasing) support of both the corruption and reconstruction distributions trivially satisfies the conditions required in Corollary A2 from Alain et al. [1] for the chain defined by the corresponding Gibbs sampler to be ergodic. Alternatively, one could employ conditional distributions with truncated support beyond some edit distance and still satisfy ergodicity conditions via the stronger Corollary A3 from Alain et al. [1].\nUnless otherwise stated, the results reported in Sections 3 and 4 use a geometric distribution with five expected steps for the corruption sequence length. In general, we observe that shorter corruption lengths lead to better samples, though we did not seek to specially optimize this hyperparameter for generation quality. See Appendix A for some results with other step lengths.\n\n2.4 Reconstruction distribution\n\nA sequence of corrupting operations s = [s_1, s_2, ..., s_k] corresponds to a sequence of visited corrupted objects [x̃_1, x̃_2, ..., x̃_k] after execution on an initial sample x. We enforce the corrupter to be Markov, such that its distribution over the next corruption operation to perform depends only on the current object. Likewise, the target reconstruction distribution is then also Markov, and we factorize the learned reconstruction sequence model as the product of memoryless transitions culminating with a stop token:\n\np_θ(s_rev | x̃) = p_θ(stop | x) p_θ(x | x̃_1) ∏_{i=1}^{k-1} p_θ(x̃_i | x̃_{i+1}),   (5)\n\nwhere s_rev = [s_k^rev, s_{k-1}^rev, ..., s_1^rev, stop] is the reverse of the corrupting operation sequence. If a stop token is sampled from the model, reconstruction ceases and the next corruption sequence begins. For the molecule model, an additional “revisit” stop criterion is also used: reconstruction ceases when a molecule is revisited (see Appendix D.1 for details).\nFor each individual step, the reconstruction model outputs a large conditional categorical distribution over Ind(x̃), the set of legal modification operations that can be performed on an input x̃. We describe the general architecture employed and include domain-specific details in Sections 3 and 4.\nAny operation in Ind(x̃) may be defined in a general sense by a location on the object x̃ where the operation is performed and a vocabulary element describing which vocabulary item (if any) is involved (Fig. 1). The prespecified vocabulary consists of domain-specific substructures, a subset of which may be legally inserted or deleted from a given object. The model induces a distribution over all legal operations (which may be described as a subset of the Cartesian product of the locations and vocabulary elements) by computing location embeddings for an object and comparing those to learned embeddings for each vocabulary element.\nFor the graph-structured domains explored here, location embeddings are generated using a message passing neural network structure similar to Duvenaud et al. [9], Gilmer et al. [13] (see Appendix C). In parallel, the set of vocabulary elements is also given a learned embedding vector. 
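Under the stop-token factorization above, the log-likelihood of a reconstruction sequence is just a sum of per-step categorical log-probabilities, with the stop token treated as one more class. A minimal sketch with invented stand-in probability tables rather than a trained network:

```python
import math

# Stand-in per-step distributions: for each visited object, probabilities
# over the next reconstruction op (including 'stop'). The state names and
# numbers are illustrative only.
step_probs = {
    'x~2': {'undo_del': 0.7, 'other': 0.2, 'stop': 0.1},
    'x~1': {'undo_ins': 0.6, 'other': 0.3, 'stop': 0.1},
    'x':   {'stop': 0.8, 'other': 0.2},
}

def seq_log_prob(states, ops):
    # states: objects visited during reconstruction, ending at the restored x.
    # ops: the op taken at each state, with 'stop' at the final state.
    total = 0.0
    for state, op in zip(states, ops):
        total += math.log(step_probs[state][op])
    return total

# Reconstructing x~2 -> x~1 -> x and then stopping:
lp = seq_log_prob(['x~2', 'x~1', 'x'], ['undo_del', 'undo_ins', 'stop'])
```

Training maximizes exactly this quantity over observed (corrupted object, reversed corruption sequence) pairs.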
The unnormalized log-probability for a given modification is then obtained by computing the dot product of the embedding of the location where the modification is performed and the embedding of the vocabulary element involved. For most objects from the molecule and Laman graph domains, this defines a distribution over a discrete set of operations with cardinality in the tens of thousands.\nWe note that although our model induces a distribution over a large discrete set, it does not do so through a traditional fully-connected softmax layer. Indeed, the action space of the model is heavily factorized, ensuring that the computation is efficient. The factorization is present at two levels: the actions are separated into broad categories (e.g., insert at atom, insert at bond, delete, for molecules) that do not interact except through the normalization. Additionally, actions are further factorized through a location component and a vocabulary component, which only interact through a dot product, further simplifying the model.\n\n3 Application: Molecules\n\nMolecular structures can be defined by graphs where nodes represent individual atoms and edges represent bonds. In order for such graphs to be considered valid molecular structures by standard chemical informatics toolkits (e.g., RDKit [24]), certain constraints must be satisfied. For example, aromatic bonds can only exist within aromatic rings, and an atom can only engage in as many bonds as permitted by its valence. By restricting the corruption and reconstruction operations to a set of modifications that respect these rules, we ensure that the resulting Markov chain will only visit valid molecular graphs.\n\n3.1 Legal operations\n\nWhen altering one valid molecular graph into another, we restrict the set of possible modifications to the insertion and deletion of valid substructures. 
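The dot-product scoring described here can be sketched with numpy: location embeddings (one per atom or bond) are scored against vocabulary embeddings, illegal (location, vocabulary) pairs are masked out, and a single softmax normalizes over the remaining legal operations. The shapes and the legality mask below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_loc, n_vocab, d = 5, 8, 16

loc_emb = rng.normal(size=(n_loc, d))       # from message passing, per location
vocab_emb = rng.normal(size=(n_vocab, d))   # learned, per vocabulary element
legal = rng.random((n_loc, n_vocab)) < 0.4  # which (loc, vocab) ops are legal

# Unnormalized log-probabilities via dot products; only legal ops compete.
logits = loc_emb @ vocab_emb.T
logits = np.where(legal, logits, -np.inf)
flat = logits.ravel()
probs = np.exp(flat - flat.max()) / np.exp(flat - flat.max()).sum()
```

Note that no (n_loc x n_vocab)-wide dense layer is needed: the two embedding tables interact only through the dot product, which is the factorization the text describes.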
The vocabulary of substructures consists of non-ring bonds, simple rings, and bridged compounds (simple rings with more than two shared atoms) present in the training data. This is the same type of vocabulary proposed in Jin et al. [19]. The legal insertion and deletion operations are set as follows:\nInsertion: For each atom and bond of a molecular graph, we determine the subset of the vocabulary that would be chemically compatible for attachment. Then, for each compatible vocabulary substructure, the possible assemblies of it with the atom or bond of interest are enumerated (keeping its already-connected neighbors fixed). For example, when inserting a ring from the vocabulary via one of its bonds, there is often more than one valid bond to select from. Here, we only specify the 2D configuration of the molecular graph and do not account for stereochemistry.\nDeletion: We define the leaves of a molecule to be those substructures that can be removed while the rest of the molecular graph remains connected. Here, the set of leaves consists of either non-ring bonds, rings, or bridged compounds whose neighbors have a non-zero atom intersection. The set of possible deletions is fully specified by the set of leaf substructures. To perform a deletion, a leaf is selected and the atoms whose bonds are fully contained within the leaf substructure are removed from the graph.\nThese two minimal operations provide enough support for the resulting Markov chain to be ergodic within the set of all valid molecular graphs constructible via the extracted vocabulary. As Jin et al. [19] find, although an arbitrary molecule may not be reachable, empirically the finite vocabulary provides broad coverage over organic molecules. 
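The deletion rule (a leaf is a substructure whose removal leaves the rest of the graph connected) can be sketched on a plain adjacency-list graph with the chemistry abstracted away; here leaves are single nodes, checked by skipping the node and testing connectivity with a BFS. This is an illustrative simplification, not the paper's RDKit-based implementation, and the graph below is invented.

```python
from collections import deque

def is_connected(adj, skip=None):
    # BFS over all nodes except `skip`; True if the remainder is one component.
    nodes = [n for n in adj if n != skip]
    if not nodes:
        return True
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != skip and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)

def leaf_nodes(adj):
    # Nodes whose deletion keeps the remaining graph connected; deleting
    # any of these is a legal delete move in this toy setting.
    return [n for n in adj if is_connected(adj, skip=n)]

# A triangle with a pendant node: edges 0-1, 1-2, 2-0, 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
assert leaf_nodes(adj) == [0, 1, 3]  # removing node 2 would strand node 3
```

In the molecular setting the same connectivity test is applied to whole substructures (non-ring bonds, rings, bridged compounds) rather than single nodes.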
Further details on the location and vocabulary representations for each possible operation are given in the appendix.\n\n3.2 Data\n\nFor molecules, we test the proposed approach on the ZINC dataset, which contains about 250K drug-like molecules from the ZINC database [35]. The model is trained on 220K molecules according to the same train/test split as in Jin et al. [19], Kusner et al. [21].\n\n3.3 Distributional statistics\n\nWhile predictive probabilities are not available from the implicit generative model, we can perform posterior predictive checks on various semantically relevant metrics to compare our model's learned distribution to the data distribution. Here, we leverage three commonly used quantities for assessing drug molecules: the quantitative estimate of drug-likeness (QED) score (between 0 and 1) [4], the synthetic accessibility (SA) score (between 1 and 10) [11], and the log octanol-water partition coefficient (logP) [6]. For QED, a higher value indicates a molecule is more likely to be drug-like, while for SA, a lower value indicates a molecule is more likely to be easily synthesizable. logP measures the hydrophobicity of a molecule, with a higher value indicating a more hydrophobic compound. Together these metrics take into account a wide array of molecular features (ring count, charge, etc.), allowing for an aggregate comparison of distributional statistics.\nOur goal is not to optimize these statistics but to evaluate the quality of our generative model by comparing the distribution that our model implies over these quantities to those in the original data. A good generative model would produce novel molecules, but those molecules would have aggregate statistics similar to real compounds. In Fig. 3, we display Gaussian kernel density estimates (KDE) of the above metrics for generated sets of molecules from seven baseline methods, in addition to our own (see Appendix D for chain sampling details). 
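The two-sample Kolmogorov-Smirnov distance used for these comparisons is the maximum gap between the two empirical CDFs. A small self-contained reference version (quadratic time; scipy.stats.ks_2samp is the practical choice):

```python
def ks_distance(a, b):
    # Two-sample KS statistic: max |ECDF_a(t) - ECDF_b(t)| over all t.
    # For step-function ECDFs the maximum is attained at a sample point.
    a, b = sorted(a), sorted(b)

    def ecdf(xs, t):
        return sum(1 for x in xs if x <= t) / len(xs)

    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in a + b)

assert ks_distance([1, 2, 3], [1, 2, 3]) == 0.0  # identical samples
assert ks_distance([0, 0, 0], [5, 5, 5]) == 1.0  # disjoint supports
```

Lower values indicate a generated property distribution closer to the ZINC reference, which is how the comparisons in the following table should be read.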
A normalized histogram of the ZINC training distribution is shown for visual comparison. For each method, we obtain 20K samples either by running pre-trained models [19, 14, 21], by accessing pre-sampled sets [26, 34, 25], or by training models from scratch [33]². Only novel molecules (those not appearing in the ZINC training set) are included in the metric computation, to avoid rewarding memorization of the training data. In addition, Table 1 displays bootstrapped Kolmogorov–Smirnov (KS) distances between the samples for each method and the ZINC training set.\nOur method is capable of generating novel molecules that have statistics closely matched to the empirical QED and logP distributions. The SA distribution seems to be more challenging, although we still report a lower mean KS distance than some recent methods. Because we allow the corrupter to uniformly select from the vocabulary, even if a particular vocabulary element occurs very rarely in the training data, it can sometimes introduce molecules without an accessible synthetic route that the reconstructor does not immediately recover from. One could alter the corrupter to favor commonly appearing vocabulary items to mitigate this. We also note that our approach lends itself to Markov chain transitions reflecting known (or learned) chemical reactions.\nInterestingly, the SMILES-based LSTM model [33] is effective at matching the ZINC dataset statistics, producing a substantially better-matched SA distribution than the other methods. However, as noted in 
However, as noted in [26], by operating on the linear SMILES representation, the LSTM has limited ability to incorporate structural constraints, e.g., enforcing the presence of a particular substructure.

In addition to the above metrics, we report a validity score (the percentage of samples that are chemically valid) for each method in Table 1. A sample is considered valid if it can be successfully parsed by RDKit [24]. The validity scores displayed are the self-reported values from each method. Our method, like those of Jin et al. [19] and Liu et al. [26], enforces valid molecular samples, and the model does not have to learn these constraints.

² We use the implementation provided by [5] for the SMILES LSTM [33].

Figure 3: Distributions of QED (left), SA (middle), and logP (right) for sampled molecules and ZINC.

Source            QED KS         SA KS          logP KS        % valid
ChemVAE [14]      1.00 (0.00)    1.00 (0.00)    1.00 (0.00)    0.7
GrammarVAE [21]   0.94 (0.00)    0.95 (0.00)    0.95 (0.00)    7.2
GraphVAE [34]     0.52 (0.00)    0.23 (0.00)    0.54 (0.00)    13.5
DeepGAR [25]      0.20 (0.00)    0.15 (0.00)    0.062 (0.002)  89.2
SMILES LSTM [33]  0.022 (0.003)  0.051 (0.004)  0.052 (0.004)  96.1
JT-VAE [19]       0.090 (0.003)  0.21 (0.00)    0.20 (0.00)    100
CG-VAE [26]       0.27 (0.00)    0.56 (0.00)    0.064 (0.002)  100
GenRIC            0.045 (0.003)  0.28 (0.00)    0.057 (0.002)  100

Table 1: Molecular property distributional statistics. For each source, 20K molecules are sampled and compared to the ZINC dataset. For SA, QED, and logP, we compute the two-sample Kolmogorov–Smirnov statistic (and its bootstrapped standard error) relative to the ZINC dataset. (Lower is better for the KS statistic.) Self-reported validity percentages are also shown (the value for [14] is obtained from [21]).
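The bootstrapped KS distances of Table 1 (and, later, Table 2) can be computed with a short routine. The sketch below is an illustrative pure-Python implementation of the two-sample KS statistic with a bootstrap standard error, not the paper's evaluation code; in practice scipy.stats.ks_2samp serves the same purpose.

```python
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance: sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past x, then compare them
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

def bootstrap_ks(samples, reference, n_boot=200, seed=0):
    """Mean and standard error of the KS distance under resampling of `samples`."""
    rng = random.Random(seed)
    stats = [
        ks_statistic(rng.choices(samples, k=len(samples)), reference)
        for _ in range(n_boot)
    ]
    mean = sum(stats) / n_boot
    var = sum((s - mean) ** 2 for s in stats) / (n_boot - 1)
    return mean, var ** 0.5
```

Here `samples` would be the metric values (e.g., QED scores) for a method's generated molecules and `reference` the corresponding values for the ZINC training set.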
See Appendix G for additional evaluation using the GuacaMol distribution-learning benchmarks [5].

We might also inquire how the reconstructed samples of the Markov chain compare to the corrupted samples. See Fig. 6 in the supplementary material for a comparison. On average, we observe corrupted samples that are less drug-like and less synthesizable than their reconstructed counterparts. In particular, the output reconstructed molecule has, on average, a 21% higher QED than the input corrupted molecule. Running the corrupter repeatedly (with no reconstruction) leads to samples that severely diverge from the data distribution.

4 Application: Laman Graphs

Geometric constraint graphs are widely employed in CAD, molecular modeling, and robotics. They consist of nodes that represent geometric primitives (e.g., points, lines) and edges that represent geometric constraints between primitives (e.g., specifying perpendicularity between two lines). To allow for easy editing and change propagation, best practices in parametric CAD encourage keeping a part well-constrained at all stages of design [3]. A useful generative model over CAD models should ideally be restricted to sampling well-constrained geometry.

Laman graphs describe two-dimensional geometry where the primitives have two degrees of freedom and the edges each restrict one degree of freedom (e.g., a system of rods and joints) [23].
Representing minimally rigid systems, Laman graphs have the property that if any single edge is removed, the system becomes under-constrained. For a graph with n nodes to be a valid Laman graph, the following two simple conditions are necessary and sufficient: 1) the graph must have exactly 2n − 3 edges, and 2) each node-induced subgraph of k nodes can have no more than 2k − 3 edges. Together, these conditions ensure that all structural degrees of freedom are removed (given that the constraints are all independent), leaving one rotational and two translational degrees of freedom. In 3D, although the corresponding Laman conditions are no longer sufficient, they remain necessary for well-constrained geometry.

Source         DoD KS       % valid
E-R [10]       0.95 (0.03)  0.08 (0.02)
GraphRNN [38]  0.96 (0.00)  0.15 (0.03)
GenRIC         0.33 (0.01)  100 (0.00)

Table 2: Laman graph distributional statistics. The mean and, in parentheses, the standard deviation of the bootstrapped KS distance between the DoD distribution for each set of sampled graphs and the training data are shown. In addition, we display means and standard deviations for bootstrapped validity scores.

Figure 4: The legal inductive moves for Laman graphs (Henneberg type 1 and Henneberg type 2), derived from Henneberg construction [16].

4.1 Legal operations

Henneberg [16] describes two types of node-insertion operations, known as Henneberg moves, that can be used to inductively construct any Laman graph (Fig. 4). We make these moves and their inverses (the delete versions) available to both the corrupter and the reconstruction model. While moves #1 and #2 can always be reversed for any nodes of degree 2 and 3, respectively, a check has to be performed to determine where the missing edge can be inserted for reverse move #2 [15]. Here, we use the O(n^2) Laman satisfaction check described in [17] to determine the set of legal neighbors.
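The two forward moves can be sketched concretely. The following is a hypothetical illustration assuming a simple edge-list graph representation (not the paper's implementation): starting from the base case of a single edge, each move adds one node and a net two edges, preserving the 2n − 3 edge count.

```python
import random

def henneberg_type1(nodes, edges, rng):
    """Type 1: join a new node to two distinct existing nodes."""
    new = len(nodes)
    a, b = rng.sample(nodes, 2)
    nodes.append(new)
    edges.extend([(a, new), (b, new)])

def henneberg_type2(nodes, edges, rng):
    """Type 2: remove an edge (a, b); join a new node to a, b, and a third node c."""
    new = len(nodes)
    a, b = edges.pop(rng.randrange(len(edges)))
    c = rng.choice([v for v in nodes if v not in (a, b)])
    nodes.append(new)
    edges.extend([(a, new), (b, new), (c, new)])

# Grow a Laman graph from a single edge (n = 2, one edge = 2n - 3).
rng = random.Random(0)
nodes, edges = [0, 1], [(0, 1)]
for _ in range(10):
    if len(nodes) >= 3 and rng.random() < 0.5:
        henneberg_type2(nodes, edges, rng)
    else:
        henneberg_type1(nodes, edges, rng)
assert len(edges) == 2 * len(nodes) - 3  # Laman edge count holds after every move
```

Both moves preserve minimal rigidity, so every graph reachable this way satisfies the Laman conditions; the delete (inverse) versions used by the corrupter undo them.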
At the rigidity transition, this check runs in only O(n^1.15) time.

4.2 Data

For Laman graphs, we generate synthetic graphs randomly via Algorithm 7 from Moussaoui [29], originally proposed for evaluating geometric constraint solvers embedded within CAD programs. This synthetic generator allows us to approximately control a produced graph's degree of decomposability (DoD), a metric which indicates to what extent a Laman graph is composed of well-constrained subgraphs. Such subsystems are encountered in various applications; e.g., they correspond to individual components in a CAD model or rigid substructures in a protein. The degree of decomposability is defined as DoD = g/n, where g is the number of well-constrained, node-induced subgraphs and n is the total number of nodes. We generate 100K graphs each for a low and a high decomposability setting (see Appendix E.1 for full details).

4.3 Distributional statistics

Table 2 displays statistics for Laman graphs generated by our model as well as by two baseline methods, all trained on the low decomposability dataset (we observe similar results in the high decomposability setting). For each method, 20K graphs are sampled. The validity metric is defined in the same way as for molecules (Section 3.3). In addition, the bootstrapped KS distance between the sampled graphs and the training data for the DoD distribution is shown for each method.

While it is unsurprising that the simple Erdős–Rényi model [10] fails to meet validity requirements (< 0.1% valid), we see that the recently proposed GraphRNN [38] fails to do much better. While deep graph generative models have proven to be very effective at reproducing a host of graph statistics, Laman graphs represent a particularly strict topological constraint, imposing necessary conditions on every subgraph.
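To make the strictness of this constraint concrete, the two Laman counting conditions (exactly 2n − 3 edges overall, and at most 2k − 3 edges in every k-node induced subgraph) can be verified directly by brute force. The sketch below is a hypothetical helper, exponential in n and therefore usable only for tiny graphs; the pebble game of [17] is the practical algorithm.

```python
from itertools import combinations

def is_laman(n, edges):
    """Brute-force Laman check on nodes 0..n-1 with undirected `edges`:
    2n - 3 edges in total, and every k-node induced subgraph
    has at most 2k - 3 edges."""
    if len(edges) != 2 * n - 3:
        return False
    for k in range(2, n + 1):
        for subset in combinations(range(n), k):
            s = set(subset)
            m = sum(1 for u, v in edges if u in s and v in s)
            if m > 2 * k - 3:
                return False
    return True
```

For example, a triangle is minimally rigid, while a four-cycle (one edge short of 2n − 3) and K4 (one edge over) both fail.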
Today's flexible graph generative models, while effective at matching local statistics, are ill-equipped to handle this kind of global constraint. By leveraging domain-specific inductive moves, the proposed method does not have to learn what a valid Laman graph is, and instead learns to match the distributional DoD statistics within the set of valid graphs.

5 Conclusion and Future Work

In this work we have proposed a new method for modeling distributions of discrete objects, which consists of training a model to undo a series of local corrupting operations. The key to this method is to build both the corruption and reconstruction steps with support for reversible inductive moves that preserve possibly complicated validity constraints. Experimental evaluation demonstrates that this simple approach can effectively capture relevant distributional statistics over complex and highly structured domains, including molecules and Laman graphs, while always producing valid structures.

One weakness of this approach, however, is that the inductive moves must be identified and specified for each new domain; one direction of future work is to learn these moves from data. In the case of molecules, restricting the Markov chain's transitions to learned chemical reactions could improve the synthesizability of generated samples. Future work can also explore enforcing additional hard constraints besides structural validity. For example, if a particular core structure or scaffold with some desired baseline functionality (e.g., benzodiazepines) should be included in a molecule, chain transitions can be masked to respect this.
Coupled with other techniques such as virtual screening, conditional generation may enable efficient searching of candidate drug compounds.

Acknowledgements

We would like to thank Wengong Jin, Michael Galvin, Dieterich Lawson, and members of the Princeton Laboratory for Intelligent Probabilistic Systems for valuable discussion and feedback. This work was partially funded by the Alfred P. Sloan Foundation, NSF IIS-1421780, and the DataX Program at Princeton University through support from the Schmidt Futures Foundation. AS was supported by the Department of Defense through the National Defense Science and Engineering Graduate Fellowship (NDSEG) Program. We acknowledge computing resources from Columbia University's Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010.

References

[1] Guillaume Alain, Yoshua Bengio, Li Yao, Jason Yosinski, Eric Thibodeau-Laufer, Saizheng Zhang, and Pascal Vincent. GSNs: Generative stochastic networks. Information and Inference, 2016.

[2] Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent. Generalized denoising auto-encoders as generative models. In Advances in Neural Information Processing Systems, 2013.

[3] Bernhard Bettig and Christoph M. Hoffmann. Geometric constraint solving in parametric computer-aided design. Journal of Computing and Information Science in Engineering, 11(2):021001, 2011.

[4] G. Richard Bickerton, Gaia V. Paolini, Jérémy Besnard, Sorel Muresan, and Andrew L. Hopkins. Quantifying the chemical beauty of drugs. Nature Chemistry, 4, 2012.

[5] Nathan Brown, Marco Fiscato, Marwin H. S. Segler, and Alain C. Vaucher. GuacaMol: Benchmarking models for de novo molecular design.
Journal of Chemical Information and Modeling, 59(3):1096–1108, 2019.

[6] John Comer and Kin Tam. Lipophilicity Profiles: Theory and Measurement, pages 275–304. John Wiley & Sons, Ltd, 2007.

[7] Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, and Le Song. Syntax-directed variational autoencoder for structured data. In International Conference on Learning Representations, 2018.

[8] N. De Cao and T. Kipf. MolGAN: An implicit generative model for small molecular graphs. In ICML Workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018.

[9] David K. Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, 2015.

[10] P. Erdős and A. Rényi. On random graphs I. Publicationes Mathematicae Debrecen, 6:290, 1959.

[11] Peter Ertl and Ansgar Schuffenhauer. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics, 1:8, 2009.

[12] Robert Gallager. Low-density parity-check codes. IRE Transactions on Information Theory, 8(1):21–28, 1962.

[13] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In International Conference on Machine Learning, 2017.

[14] Rafael Gómez-Bombarelli, David K. Duvenaud, José Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik. Automatic chemical design using a data-driven continuous representation of molecules. In ACS Central Science, 2018.

[15] Ruth Haas, David Orden, Günter Rote, Francisco Santos, Brigitte Servatius, Herman Servatius, Diane Souvaine, Ileana Streinu, and Walter Whiteley.
Planar minimally rigid graphs and pseudo-triangulations. Computational Geometry, 31(1):31–61, 2005. Special Issue on the 19th Annual Symposium on Computational Geometry (SoCG 2003).

[16] L. Henneberg. Die graphische Statik der starren Systeme. 1911. Johnson Reprint, 1968.

[17] Donald J. Jacobs and Bruce Hendrickson. An algorithm for two-dimensional rigidity percolation: The pebble game. Journal of Computational Physics, 137(2):346–365, 1997.

[18] Dave Janz, Jos van der Westhuizen, Brooks Paige, Matt Kusner, and José Miguel Hernández-Lobato. Learning a generative model for validity in complex discrete structures. In International Conference on Learning Representations, 2018.

[19] Wengong Jin, Regina Barzilay, and Tommi S. Jaakkola. Junction tree variational autoencoder for molecular graph generation. In International Conference on Machine Learning, 2018.

[20] Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519, 2001.

[21] Matt J. Kusner, Brooks Paige, and José Miguel Hernández-Lobato. Grammar variational autoencoder. In International Conference on Machine Learning, 2017.

[22] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning, pages 282–289, 2001.

[23] G. Laman. On graphs and rigidity of plane skeletal structures. Journal of Engineering Mathematics, 4(4):331–340, October 1970.

[24] Greg Landrum et al. RDKit: Open-source cheminformatics, 2006.

[25] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs. In International Conference on Machine Learning, 2018.

[26] Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, and Alexander Gaunt.
Constrained graph variational autoencoders for molecule design. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, pages 7795–7804, 2018.

[27] David Mendez, Anna Gaulton, A. Patrícia Bento, Jon Chambers, Marleen De Veij, Eloy Félix, María Paula Magariños, Juan F. Mosquera, Prudence Mutowo, Michał Nowotka, María Gordillo-Marañón, Fiona Hunter, Laura Junco, Grace Mugumbate, Milagros Rodriguez-Lopez, Francis Atkinson, Nicolas Bosc, Chris J. Radoux, Aldo Segura-Cabrera, Anne Hersey, and Andrew R. Leach. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Research, 47(D1):D930–D940, 2018.

[28] Ugo Montanari. Networks of constraints: Fundamental properties and applications to picture processing. Information Sciences, 7:95–132, 1974.

[29] Adel Moussaoui. Geometric Constraint Solver (Solveur de systèmes de contraintes géométriques). PhD thesis, Ecole nationale Supérieure d'Informatique, Algiers, Algeria, 2016.

[30] Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, and Chris Jermaine. Neural sketch learning for conditional program generation. In International Conference on Learning Representations, 2018.

[31] Kristina Preuer, Philipp Renz, Thomas Unterthiner, Sepp Hochreiter, and Günter Klambauer. Fréchet ChemNet distance: A metric for generative models for molecules in drug discovery. Journal of Chemical Information and Modeling, 58(9):1736–1741, 2018.

[32] Paul J. Schweitzer. Perturbation theory and finite Markov chains. Journal of Applied Probability, 5(2):401–413, 1968.

[33] Marwin H. S. Segler, Thierry Kogej, Christian Tyrchan, and Mark P. Waller. Generating focused molecule libraries for drug discovery with recurrent neural networks.
ACS Central Science, 4(1):120–131, 2018.

[34] Martin Simonovsky and Nikos Komodakis. GraphVAE: Towards generation of small graphs using variational autoencoders. In ICANN, 2018.

[35] Teague Sterling and John J. Irwin. ZINC 15 – ligand discovery for everyone. Journal of Chemical Information and Modeling, 55(11):2324–2337, 2015.

[36] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, 1988.

[37] Pengcheng Yin and Graham Neubig. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.

[38] Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, and Jure Leskovec. GraphRNN: Generating realistic graphs with deep auto-regressive models. In International Conference on Machine Learning, 2018.