{"title": "Neural Guided Constraint Logic Programming for Program Synthesis", "book": "Advances in Neural Information Processing Systems", "page_first": 1737, "page_last": 1746, "abstract": "Synthesizing programs using example input/outputs is a classic problem in artificial intelligence. We present a method for solving Programming By Example (PBE) problems by using a neural model to guide the search of a constraint logic programming system called miniKanren. Crucially, the neural model uses miniKanren's internal representation as input; miniKanren represents a PBE problem as recursive constraints imposed by the provided examples. We explore Recurrent Neural Network and Graph Neural Network models. We contribute a modified miniKanren, drivable by an external agent, available at https://github.com/xuexue/neuralkanren. We show that our neural-guided approach using constraints can synthesize programs faster in many cases, and importantly, can generalize to larger problems.", "full_text": "Neural Guided Constraint Logic Programming for\n\nProgram Synthesis\n\nLisa Zhang1,2, Gregory Rosenblatt4, Ethan Fetaya1,2, Renjie Liao1,2,3, William E. Byrd4,\n\nMatthew Might4, Raquel Urtasun1,2,3, Richard Zemel1,2\n\n1University of Toronto, 2Vector Institute, 3Uber ATG, 4University of Alabama at Birmingham\n\n1{lczhang,ethanf,rjliao,urtasun,zemel}@cs.toronto.edu\n\n4{gregr,webyrd,might}@uab.edu\n\nAbstract\n\nSynthesizing programs using example input/outputs is a classic problem in arti\ufb01cial\nintelligence. We present a method for solving Programming By Example (PBE)\nproblems by using a neural model to guide the search of a constraint logic program-\nming system called miniKanren. Crucially, the neural model uses miniKanren\u2019s\ninternal representation as input; miniKanren represents a PBE problem as recursive\nconstraints imposed by the provided examples. We explore Recurrent Neural Net-\nwork and Graph Neural Network models. 
We contribute a modified miniKanren, drivable by an external agent, available at https://github.com/xuexue/neuralkanren. We show that our neural-guided approach using constraints can synthesize programs faster in many cases, and importantly, can generalize to larger problems.\n\n1 Introduction\n\nProgram synthesis is a classic area of artificial intelligence that has captured the imagination of many computer scientists. Programming by Example (PBE) is one way to formulate program synthesis problems, where example input/output pairs specify a target program. In a sense, supervised learning can be considered program synthesis, but supervised learning via successful models like deep neural networks famously lacks interpretability. The clear interpretability of programs as code means that synthesized results can be compared, optimized, translated, and proved correct. This manipulability of code keeps program synthesis relevant today.\nCurrent state-of-the-art approaches use symbolic techniques developed by the programming languages community. These methods use rule-based, exhaustive search, often manually optimized by human experts. While these techniques excel for small problems, they tend not to scale. Recent works by the machine learning community explore a variety of statistical methods to solve PBE problems more quickly. Works generally fall under three categories: differentiable programming [1, 2, 3], direct synthesis [4, 5], and neural guided search [6, 7].\nThis work falls under neural guided search, where the machine learning model guides a symbolic search. We take integration with a symbolic system further: we use its internal representation as input to the neural model.
The symbolic system we use is a constraint logic programming system called miniKanren1 [8], chosen for its ability to encode synthesis problems that are difficult to express in other systems.\n\nFigure 1: Neural Guided Synthesis Approach (examples are the input and a program is the output; miniKanren expands candidate programs, and the ML agent chooses which candidate to expand)\n\n1The name \u201cKanren\u201d comes from the Japanese word for \u201crelation\u201d.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nSpecifically, miniKanren does not rely on types, is able to complete partially specified programs, and has a straightforward implementation [9]. miniKanren searches for a candidate program that satisfies the recursive constraints imposed by the input/output examples. Our model uses these constraints to score candidate programs and guide miniKanren\u2019s search.\nNeural guided search using constraints is promising for several reasons. First, while symbolic approaches outperform statistical methods, they have not demonstrated an ability to scale to larger problems; neural guidance may help navigate the exponentially growing search space. Second, symbolic systems exploit the compositionality of synthesis problems: miniKanren\u2019s constraints select portions of the input/output examples relevant to a subproblem, akin to a symbolic attention mechanism. Third, constraint lengths are relatively stable even as we synthesize more complex programs; our approach should be able to generalize to programs larger than those seen in training.\nTo summarize, we contribute a novel form of neural guided synthesis, where we use a symbolic system\u2019s internal representations to solve an auxiliary problem of constraint scoring using neural embeddings. We explore two models for scoring constraints: Recurrent Neural Network (RNN) and Graph Neural Network (GNN) [10].
We also present a \u201ctransparent\u201d version of miniKanren with visibility into its internal constraints, available at https://github.com/xuexue/neuralkanren.\nOur experiments focus on synthesizing programs in a subset of Lisp, and show that scoring constraints helps. More importantly, we test the generalizability of our approach on three families of synthesis problems. We compare against the state-of-the-art systems \u03bb2 [11], Escher [12], Myth [13], and RobustFill [4]. We show that our approach has the potential to generalize to larger problems.\n\n2 Related Work\n\nProgramming by example (PBE) problems have a long history dating to the 1970s [14, 15]. Along the lines of early works in program synthesis, the programming languages community developed search techniques that enumerate possible programs, with pruning strategies based on types, consistency, and logical reasoning to improve the search. Several state-of-the-art methods are described in Table 1.\n\nTable 1: Symbolic Methods\nMethod | Direction | Search Strategy | Type Discipline\nminiKanren [8, 16] | Top-down | Biased-Interleaving | Dynamic\n\u03bb2 [11] | Top-down | Template Complexity | Static\nEscher [12] | Bottom-up | Forward Search / Conditional Inference | Static\nMyth [13] | Top-down | Iterative Deepening | Static\n\nThe method \u03bb2 [11] is most similar to miniKanren, but specializes in numeric, statically-typed inputs and outputs. Escher [12] is built as an active learner, and relies on the presence of an oracle to supply outputs for new inputs that it chooses. Myth [13] searches for the smallest program satisfying a set of examples, and guarantees parsimony. These methods all use functional languages based on the \u03bb-calculus as their target language, and aim to synthesize general, recursive functions.\nContributions by the machine learning community have grown in the last few years.
Interestingly, while PBE can be thought of as a meta-learning problem, few works explore this relationship. Each synthesis problem can be thought of as a learning problem [17], so learning the synthesizer can be thought of as meta-learning. Instead, works generally fall under direct synthesis, differentiable programming, and neural guided synthesis.\n\nDirect Synthesis\nIn direct synthesis, the program is produced directly as a sequence or tree. One domain where this has been successful is string manipulation as applied to spreadsheet completion, as in FlashFill [18] and its descendants [5, 4, 19]. FlashFill [18] uses a combination of search and carefully crafted heuristics. Later works like [5] introduce a \u201cRecursive-Reverse-Recursive Neural Network\u201d to generate a program tree conditioned on input/output embeddings. More recently, RobustFill [4] uses bi-directional Long Short-Term Memory (LSTM) with attention to generate programs as sequences. Despite flattening the tree structure, RobustFill achieved much better results (92% vs 38%) on the FlashFill benchmark. While these approaches succeed in the practical domain of string manipulation, we are interested in exploring manipulations of richer data structures.\n\nDifferentiable Programming Differentiable programming involves building a differentiable interpreter, then backpropagating through the interpreter to learn a latent program. The goal is to infer correct outputs for new inputs. Work in differentiable programming began with the Neural Turing Machine [3], a neural architecture that augments neural networks with external memory and attention. Neural Programmer [1] and Neural Programmer-Interpreter [2] extend the work with reusable operations, and build programs compositionally.
While differentiable approaches are appealing, [20] showed that this approach still underperforms discrete search-based techniques.\n\nNeural Guided Search A recent line of work uses statistical techniques to guide a discrete search. For example, DeepCoder [6] uses an encoding of the input/output examples to predict functions that are likely to appear in the program, to prioritize programs containing those functions. More recently, [7] uses an LSTM to guide the symbolic search system PROSE (Microsoft Program Synthesis using Examples). The search uses a \u201cbranch and bound\u201d technique. The neural model learns the choices that maximize the bounding function h introduced in [18] and used for FlashFill problems. These approaches attempt to be search system agnostic, whereas we integrate deeply with one symbolic approach, taking advantage of its internal representation and compositional reasoning.\nOther work in related domains shares similarities with our contribution. For example, [21] uses a constraint-based solver to sample terms in order to complete a program sketch, but is not concerned with synthesizing entire programs. Further, [22] implements differentiable logic programming to do fuzzy reasoning and induce soft inference rules. They use Prolog\u2019s depth-first search as-is and learn constraint validation (approximate unification), whereas we learn the search strategy and use miniKanren\u2019s constraint validation as-is.\n\n3 Constraint Logic Programming with miniKanren\n\nThis section describes the constraint logic programming language miniKanren and its use for program synthesis. Figure 1 summarizes the relationship between miniKanren and the neural agent.\n\n3.1 Background\n\nThe constraint logic programming language miniKanren uses the relational programming paradigm, where programmers write relations instead of functions.
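As a rough illustration of the relational paradigm (a minimal Python sketch with hypothetical names, not miniKanren's actual Scheme implementation), a function can be recast as a relation by making its output an explicit argument, which a naive search can then solve for in either direction:

```python
# A function computes its output from its inputs in one direction.
def f(x):
    return x + 1

# Its relational counterpart takes one extra parameter: R(x, y) holds
# exactly when (f x) = y.
def R(x, y):
    return f(x) == y

# With y known and x unknown, a naive search over a finite domain
# effectively runs f in reverse.
def solve_unknown(relation, y, domain):
    return [x for x in domain if relation(x, y)]

print(solve_unknown(R, 5, range(100)))  # [4], since f(4) = 5
```

miniKanren replaces this naive finite enumeration with unification and constraint solving over possibly infinite terms, but the freedom to leave any argument unknown is the same.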
Relations are a generalization of functions: a function f with n parameters can be expressed as a relation R with n + 1 parameters, e.g., (f x) = y implies (R x y). The notation (R x y) means that x and y are related by R.\nIn miniKanren queries, data flow is not directionally biased: any input to a relation can be unknown. For example, a query (R X y) where y is known and X is an unknown, called a logic variable, finds values X where X and y are related by R. In other words, given R and f defined as before, the query finds inputs X to f such that (f X) = y. This property allows the relational translation of a function to run computations in reverse [16]. We refer to such uses of relations containing logic variables as constraints.\nIn this work, we are interested in using a relational form evalo of an interpreter EVAL to perform program synthesis2. In the functional computation (EVAL P I) = O, program P and input I are known, and the output O is the result to be computed. The same computation can be expressed relationally with (evalo P I O ) where P and I are known and O is an unknown. We can also synthesize programs from inputs and outputs, expressed relationally with (evalo P I O) where P is unknown while I and O are known. While ordinary evaluation is deterministic, there may be many valid programs P for any pair of I and O. Multiple uses of evalo, involving the same P but different pairs I and O, can be combined in a conjunction, further constraining P. This is how PBE tasks are encoded using an implementation of evalo for the target synthesis language.\n\nFigure 2: Expansion of an evalo constraint\n(evalo P I O) \u21d2 DISJ \u2192 (evalo (quote A ) I O)\n\u2192 (evalo (car B ) I O)\n\u2192 (evalo (cdr C ) I O)\n\u2192 (evalo (cons D E ) I O)\n\u2192 (evalo (var F ) I O)\n\u2192 . . .\n\n2In miniKanren convention, a relation is named after the corresponding function, with an \u2018o\u2019 at the end. Supplementary Material A provides a definition of evalo used in our experiments.\n\nA miniKanren program internally represents a query as a constraint tree built out of conjunctions, disjunctions, and calls to relations (constraints). A relation like evalo is recursive, that is, defined in terms of invocations of other constraints including itself. Search involves unfolding a recursive constraint by replacing the constraint with its definition in terms of other constraints. For example, in a Lisp interpreter, a program P can be a constant, a function call, or another expression. Unfolding reveals these possibilities as clauses of a disjunction that replaces evalo. Figure 2 shows a partial unfolding of (evalo P I O).\nAs we unfold more nodes, branches of the constraint tree constrain P to be more specific. We refer to a partial specification of P as a \u201ccandidate\u201d partial program. If at some point we find a fully specified P that satisfies all relevant constraints, then P is a solution to the PBE problem.\nIn Figure 3, we show portions of the constraint tree representing a PBE problem with two input/output pairs. Each of the gray boxes corresponds to a separate disjunct in the constraint tree, representing a candidate. Each disjunct is a conjunction of constraints, shown one on each line. A candidate is viable only if the entire conjunction can be satisfied. In the left column (a), certain \u201cobviously\u201d failing candidates like (quote M ) are omitted from consideration.
The right column (c) also shows the unfolding of the selected disjunct for (cons D E ), where D is replaced by its possible values.\nBy default, miniKanren uses a biased interleaving search [16], alternating between disjuncts to unfold. The alternation is \u201cbiased\u201d towards disjuncts that have more of their constraints already satisfied. This search is complete: if a solution exists, it will eventually be found, time and memory permitting.\n\n3.2 Transparent constraint representation\n\nTypical implementations of miniKanren represent constraint trees as \u201cgoals\u201d [16] built from opaque, suspended computations. These suspensions entangle both constraint simplification and the implicit search policy, making it difficult to inspect a constraint tree and experiment with alternative search policies.\nOne of our contributions is a miniKanren implementation that represents the constraint tree as a transparent data structure. It provides an interface for choosing the next disjunct to unfold, making it possible to define custom search policies driven by external agents. Our implementation is available at https://github.com/xuexue/neuralkanren.\nLike the standard miniKanren, this transparent version is implemented in Scheme. To interface with an external agent, we have implemented a Python interface that can drive the miniKanren process via stdin/stdout. Users start by submitting a query, then alternate between receiving constraint tree updates and choosing the next disjunct to unfold.\n\n4 Neural Guided Constraint Logic Programming\n\nWe present our neural guided synthesis approach, summarized in Figure 3. To begin, miniKanren represents the PBE problem in terms of a disjunction of candidate partial programs, and the constraints that must be satisfied for the partial program to be consistent with the examples. A machine learning agent makes discrete choices amongst the possible candidates.
The symbolic system then expands the chosen candidate, adding expansions of the candidate to the list of partial programs.\nThe machine learning model follows these steps:\n\n1. Embed the constraints. Sections 4.1 and 4.2 discuss two methods for embedding constraints that trade off ease of training and accounting for logic variable identity.\n\n2. Score each constraint. Each constraint embedding is scored independently, using a multi-layer perceptron (MLP).\n\n3. Pool scores together. We pool constraint scores for each candidate. We pool hierarchically using the structure of the constraint tree, max-pooling along a disjunction and average-pooling along a conjunction. We find that using average-pooling instead of min-pooling helps gradient flow. In Figure 3 there are no internal disjunctions.\n\n4. Choose a candidate. We use a softmax distribution over candidates during training and choose greedily at test time.\n\nFigure 3: Steps for synthesizing a program that repeats a symbol three times using a subset of Lisp: (a) miniKanren builds constraints representing the PBE problem; candidate programs contain unknowns, whose values are restricted by constraints; (b) a neural agent operating on the constraints scores candidates; each constraint is embedded and scored separately, then pooled per candidate; scores determine which candidate to expand; (c) miniKanren expands the chosen candidate (cons D E ); possible completions of unknown D are added to the set of candidates; (d) this process continues until a fully-specified program (with no logic variables) is found.\n\nIntuitively, the pooled score for each candidate represents the plausibility of the constraints associated with a candidate partial program being satisfied.
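The scoring and pooling steps above can be sketched as follows (a minimal illustration with made-up leaf scores and a hand-built tree; in the actual model, leaf scores come from the constraint embeddings and scoring MLP):

```python
import math

# A node is ("disj", children), ("conj", children), or ("leaf", score),
# where a leaf score comes from the per-constraint scoring MLP.
def pool(node):
    kind, payload = node
    if kind == "leaf":
        return payload
    scores = [pool(child) for child in payload]
    if kind == "disj":
        return max(scores)                # max-pool along a disjunction
    return sum(scores) / len(scores)      # average-pool along a conjunction

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Two candidates, each a conjunction of constraint scores; the second
# contains an internal disjunction.
candidates = [
    ("conj", [("leaf", 0.5), ("leaf", -1.5)]),
    ("conj", [("disj", [("leaf", 1.0), ("leaf", 2.0)]), ("leaf", 0.0)]),
]
scores = [pool(c) for c in candidates]                   # [-0.5, 1.0]
probs = softmax(scores)                                  # training: sample from this
best = max(range(len(scores)), key=scores.__getitem__)   # test: choose greedily
```

Average-pooling the conjunction (rather than min-pooling, which would more literally mirror "all conjuncts must hold") is the gradient-flow choice described in step 3.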
So in some sense we are learning a neural constraint satisfaction system in order to solve synthesis problems.\n\n4.1 Recurrent Neural Network Model (RNN)\n\nOne way to embed the constraints is to use an RNN operating on each constraint as a sequence. We use an RNN with bi-directional LSTM units [23] to score constraints, with each constraint separately tokenized and embedded. The tokenization process removes identifying information from logic variables, treating all logic variables as the same token. Logic variable identity is important, but because each constraint is embedded and scored separately, that identity is lost.\nWe learn separate RNN weights for each relation (evalo, lookupo, etc.). The particular set of constraint types differs depending on the target synthesis language.\n\n4.2 Graph Neural Network Model (GNN)\n\nIn the RNN model, we lose considerable information by removing the identity of logic variables. Two constraints associated with a logic variable may independently be satisfiable, but may be obviously unsatisfiable together.\nTo address this, we use a GNN model that embeds all constraints simultaneously. The use of graph or tree structure to represent programs [24, 25] and constraints [26] is not unprecedented. An example graph structure is shown in Figure 4. Each constraint is represented as a tree, but since logic variable leaf nodes may be shared by multiple constraints, the constraint graph is in general a Directed Acyclic Graph (DAG). We do not include the constraint tree structure (disjunctions and conjunctions) in the graph, since they are handled during pooling.\nThe specific type of GNN model we use is a Gated Graph Neural Network (GGNN) [27]. Each node has an initial embedding, which is refined through message passing along the edges.
The final root node embedding of each constraint is taken to be the embedding representation of the constraint. Since the graph structure is a DAG, we use a synchronous message schedule for message passing.\n\nFigure 4: Graph representation of constraints (evalo A (1) (cons (1 1 1) B )) and (evalo A (a) (cons (a a a) C ))\n\nOne difference between our
algorithm and a typical GGNN is the use of different node types. Each token in the constraint tree (e.g. evalo, cons, logic variable) has its own aggregation function and Gated Recurrent Unit weights. Further, the edge types follow the node type of the parent node. Most node types have asymmetric children, so the edge type also depends on the position of the child.\nTo summarize, the GNN model has the following steps:\n\n1. Initialization of each node, depending on the node type and label. The initial embeddings e_label are learned parameters of the model.\n\n2. Upward Pass, ordered leaf-to-root, so that a node receives all messages from its children and updates its embedding before sending a message to its parents. Since a non-leaf node always has a fixed number of children, the merge function is parameterized as a multi-layer perceptron (MLP) with a fixed-size input.\n\n3. Downward Pass, ordered root-to-leaf, so that a node receives all messages from its parents and updates its embedding before sending a message to its children. Nodes that are not logic variables have only one parent, so no merge function is required. Constant embeddings are never updated. Logic variables can have multiple parents, so average pooling is used as a merge function.\n\n4. Repeat. The number of upward/downward passes is a hyperparameter. We end on an upward pass so that logic variable updates are reflected in the root node embeddings.\n\nWe extract the final embedding of the constraint root nodes for scoring, pooling, and choosing.\n\n4.3 Training\n\nWe note the similarity of this setup to a Reinforcement Learning problem. The candidates can be thought of as possible actions, the ML model as the policy, and miniKanren as the non-differentiable environment which produces the states or constraints.
However, during training we have access to the ground-truth optimal action at each step, and therefore use a supervised cross-entropy loss.\nWe do use other techniques from the Reinforcement Learning literature. We use curriculum learning, beginning with simpler training problems. We generate training states by using the current model parameters to make action choices at least some of the time. We use scheduled sampling [28] with a linear schedule, to increase exploration and reduce teacher-forcing as training progresses. We use prioritized experience replay [29] to reduce correlation within a minibatch and to re-sample more difficult states. To prevent an exploring agent from becoming \u201cstuck\u201d, we abort episodes after 20 consecutive incorrect choices. For optimization we use RMSProp [30], with weight decay for regularization.\nImportantly, we expand two candidates per step during training, instead of the single candidate described earlier. We find that expanding two candidates allows a better balance of exploration and exploitation during training, leading to a more robust model. At test time, we resume expanding one candidate per step and use a greedy policy.\n\n5 Experiments\n\nFollowing the programming languages community, we focus on tree manipulation as a natural starting point towards expressive computation. We use a small subset of Lisp as our target language. This subset consists of cons, car, cdr, along with several constants and function application. The full grammar is shown in Figure 5.\n\ndatum (D) ::= () | #t | #f | 0 | 1 | x | y | a | b | s | (D . D)\nvariable-name (V) ::= () | (s . V)\nexpression (E) ::= (var V) | (app E E) | (lambda E) | (quote D) | (cons E E) | (car E) | (cdr E) | (list E ...)\n\nFigure 5: Subset of Lisp used in this work\n\nWe present two experiments. First, we test on programmatically generated synthesis problems held out from training.
We compare two miniKanren search strategies that do not use a neural guide, three of our neural-guided models, and RobustFill with a generous beam size. Then, we test the generalizability of these approaches on three families of synthesis problems. In this second set of experiments we additionally compare against the state-of-the-art systems \u03bb2, Escher, and Myth. All test experiments are run on an Intel i7-6700 3.40GHz CPU with 16GB RAM.\n\n5.1 Tree Manipulation Problems\n\nWe programmatically generate training data by querying (evalo P I O ) in miniKanren, where the program, inputs, and outputs are all unknown. We put several other restrictions on the inputs and outputs so that the examples are sufficiently expressive. When input/output expressions contain constants, we choose random constants to ensure variety. We use 500 generated problems for training, each with 5 input/output examples. In this section, we report results on 100 generated test problems. We report results for several symbolic and neural guided models. Sample generated problems are included in Supplementary Material B.\nWe compare two variants of symbolic methods that use miniKanren. The \u201cNaive\u201d model uses biased-interleaving search, as described in [31]. The \u201c+Heuristics\u201d model uses additional hand-tuned heuristics described in [16]. The neural guided models include the RNN+Constraints guided search described in Section 4.1 and the GNN+Constraints guided search in Section 4.2. The RNN model uses 2-layer bi-directional LSTMs with an embedding size of 128. The GNN model uses a single up/down/up pass with embedding size 64 and message size 128. Increasing the number of passes did not yield improvements. Further, we compare against a baseline RNN model that does not take constraints as input: instead, it computes embeddings of the input, output, and the candidate partial program using an LSTM, then scores the concatenated embeddings using an MLP.
This baseline model also uses 2-layer bi-directional LSTMs with embedding size of 128. All models use a 2-layer neural network with ReLU activation as the scoring function.\nTable 2 reports the percentage of problems solved within 200 steps. The maximum time the RNN-Guided search used was 11 minutes, so we allow the symbolic models up to 30 minutes. The GNN-Guided search is significantly more computationally expensive, and the RNN baseline model (without constraints) is comparable to the RNN-Guided models (with constraints as inputs).\n\nTable 2: Synthesis Results on Tree Manipulation Problems\nMethod | Percent Solved | Average Steps\nNaive [31] | 27% | N/A\n+Heuristics (Barliman) [16] | 82% | N/A\nRNN-Guided (No Constraints) | 93% | 46.7\nGNN-Guided + Constraints | 88% | 44.5\nRNN-Guided + Constraints | 99% | 37.0\nRobustFill [4] beam 1000+ | 100% | N/A\n\nAll three neural guided models performed better than symbolic methods in our tests, with the RNN+Constraints model solving all but one problem. The RNN model without constraints also performed reasonably, but took more steps on average than other models. RobustFill [4] Attention-C with large beam size solves one more problem than RNN+Constraints on a flattened representation of these problems. Exploration of beam size is in Supplementary Material D. We defer comparison with other symbolic systems because problems in this section involve dynamically-typed, improper list construction.\n\n5.2 Generalizability\n\nIn this experiment, we explore generalizability. We use the same model weights as above to synthesize three families of programs of varying complexity: Repeat(N), which repeats a token N times; DropLast(N), which drops the last element in an N-element list; and BringToFront(N), which brings the last element to the front in an N-element list.
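For reference, the intended behaviour of these three families can be written out directly (a plain Python sketch of the specifications; in the paper, each N is a separate synthesis problem in the Lisp subset, not a single parameterized program):

```python
def repeat(token, n):
    """Repeat(N): produce a list of n copies of a token."""
    return [token] * n

def drop_last(lst):
    """DropLast(N): drop the last element of an N-element list."""
    return lst[:-1]

def bring_to_front(lst):
    """BringToFront(N): move the last element of an N-element list to the front."""
    return [lst[-1]] + lst[:-1]

print(repeat("a", 3))             # ['a', 'a', 'a']
print(drop_last([1, 2, 3]))       # [1, 2]
print(bring_to_front([1, 2, 3]))  # [3, 1, 2]
```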
As a measure of how synthesis difficulty increases with N, Repeat(N) takes 4 + 3N steps, DropLast(N) takes (1/2)N^2 + (5/2)N + 1 steps, and BringToFront(N) takes (1/2)N^2 + (7/2)N + 4 steps. The largest training program takes optimally 22 steps to synthesize. The number of optimal steps in synthesis correlates linearly with program size.\nWe compare against the state-of-the-art systems \u03bb2, Escher, and Myth. It is difficult to compare our models against other systems fairly, since these symbolic systems use type information, which provides an advantage. Further, \u03bb2 assumes advanced language constructs like fold that other methods do not. Escher is built as an active learner, and requires an \u201coracle\u201d to provide outputs for additional inputs. We do not enable this functionality of Escher, and limit the number of input/output examples to 5 for all methods. We allow every method up to 30 minutes. We also compare against RobustFill Attention-C with a beam size of 5000, the largest beam size supported by our test hardware. Our model is further restricted to 200 steps for consistency with Section 5.1.\nNote that if given the full 30 minutes, the RNN+Constraints model is able to synthesize DropLast(7) and BringToFront(6), and the GNN+Constraints model is also able to synthesize DropLast(7). Myth solves Repeat(N) much faster than our model, taking less than 15ms per problem, but fails on DropLast and BringToFront. Results are shown in Table 3.\nIn summary, the RNN+Constraints and GNN+Constraints models both solve problems much larger than those seen in training. The results suggest that using constraints helps generalization: though RobustFill performs best in Section 5.1, it does not generalize to larger problems out of distribution; though RNN+Constraints and RNN-without-constraints perform comparably in Section 5.1, the former shows better generalizability.
This is consistent with the observation that as program sizes grow, the corresponding constraints grow more slowly.

Table 3: Generalization Results: largest N for which synthesis succeeded, and failure modes (out of time, out of memory, requires oracle, other error)

Method                        Repeat(N)    DropLast(N)   BringToFront(N)
Naive [31]                    6 (time)     2 (time)      - (time)
+Heuristics [16]              11 (time)    3 (time)      - (time)
RNN-Guided + Constraints      20+          6 (time)      5 (time)
GNN-Guided + Constraints      20+          6 (time)      6 (time)
RNN-Guided (no constraints)   9 (time)     3 (time)      2 (time)
λ2 [11]                       4 (memory)   3 (error)     3 (error)
Escher [12]                   10 (error)   1 (oracle)    - (oracle)
Myth [13]                     20+          - (error)     - (error)
RobustFill [4] beam 1000      1            1             - (error)
RobustFill [4] beam 5000      3            1             - (error)

6 Conclusion

We have built a neural guided synthesis model that works directly with miniKanren's constraint representations, and a transparent implementation of miniKanren available at https://github.com/xuexue/neuralkanren. We have demonstrated the success of our approach on challenging tree manipulation and, more importantly, generalization tasks. These results indicate that our approach is a promising stepping stone towards more general computation.

Acknowledgments

Research reported in this publication was supported in part by the Natural Sciences and Engineering Research Council of Canada, and the National Center For Advancing Translational Sciences of the National Institutes of Health under Award Number OT2TR002517. R.L. was supported by a Connaught International Scholarship. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

References

[1] Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever. Neural programmer: Inducing latent programs with gradient descent.
International Conference on Learning Representations, 2016.

[2] Scott Reed and Nando de Freitas. Neural programmer-interpreters. International Conference on Learning Representations, 2016.

[3] Alex Graves, Greg Wayne, and Ivo Danihelka. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.

[4] Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. RobustFill: Neural program learning under noisy I/O. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 990–998, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.

[5] Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. Neuro-symbolic program synthesis. International Conference on Learning Representations, 2017.

[6] Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. DeepCoder: Learning to write programs. International Conference on Learning Representations, 2017.

[7] Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, and Sumit Gulwani. Neural-guided deductive search for real-time program synthesis from examples. International Conference on Learning Representations, 2018.

[8] William E. Byrd and Daniel P. Friedman. From variadic functions to variadic relations. In Proceedings of the 2006 Scheme and Functional Programming Workshop, University of Chicago Technical Report TR-2006-06, pages 105–117, 2006.

[9] Jason Hemann and Daniel P. Friedman. µKanren: A minimal functional core for relational programming. In Scheme and Functional Programming Workshop 2013, 2013.

[10] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model.
IEEE Transactions on Neural Networks, 20(1):61–80, 2009.

[11] John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing data structure transformations from input-output examples. In ACM SIGPLAN Notices, volume 50, pages 229–239. ACM, 2015.

[12] Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. Recursive program synthesis. In International Conference on Computer Aided Verification, pages 934–950. Springer, 2013.

[13] Peter-Michael Osera and Steve Zdancewic. Type-and-example-directed program synthesis. In ACM SIGPLAN Notices, volume 50, pages 619–630. ACM, 2015.

[14] Phillip D. Summers. A methodology for LISP program construction from examples. Journal of the ACM (JACM), 24(1):161–175, 1977.

[15] Alan W. Biermann. The inference of regular LISP programs from examples. IEEE Transactions on Systems, Man, and Cybernetics, 8(8):585–600, 1978.

[16] William E. Byrd, Michael Ballantyne, Gregory Rosenblatt, and Matthew Might. A unified approach to solving seven programming problems (functional pearl). Proceedings of the ACM on Programming Languages, 1(ICFP):8, 2017.

[17] Xinyun Chen, Chang Liu, and Dawn Song. Towards synthesizing complex programs from input-output examples. International Conference on Learning Representations, 2018.

[18] Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. In ACM SIGPLAN Notices, volume 46, pages 317–330. ACM, 2011.

[19] Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. Deep API programmer: Learning to program with APIs. arXiv preprint arXiv:1704.04327, 2017.

[20] Alexander L. Gaunt, Marc Brockschmidt, Rishabh Singh, Nate Kushman, Pushmeet Kohli, Jonathan Taylor, and Daniel Tarlow. TerpreT: A probabilistic programming language for program induction. arXiv preprint arXiv:1608.04428, 2016.

[21] Kevin Ellis, Armando Solar-Lezama, and Josh Tenenbaum.
Sampling for Bayesian program learning. In Advances in Neural Information Processing Systems, pages 1297–1305, 2016.

[22] Tim Rocktäschel and Sebastian Riedel. End-to-end differentiable proving. In Advances in Neural Information Processing Systems, pages 3788–3800, 2017.

[23] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[24] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. In International Conference on Learning Representations, 2018.

[25] Xinyun Chen, Chang Liu, and Dawn Song. Tree-to-tree neural networks for program translation. International Conference on Learning Representations, 2018.

[26] Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill. Learning a SAT solver from single-bit supervision. International Conference on Learning Representations, 2018.

[27] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. International Conference on Learning Representations, 2016.

[28] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1171–1179, 2015.

[29] Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. International Conference on Learning Representations, 2016.

[30] T. Tieleman and G. Hinton. Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.

[31] William E. Byrd, Eric Holk, and Daniel P. Friedman. miniKanren, live and untagged: Quine generation via relational interpreters (programming pearl). In Proceedings of the 2012 Annual Workshop on Scheme and Functional Programming, pages 8–29.
ACM, 2012.