{"title": "Learning Loop Invariants for Program Verification", "book": "Advances in Neural Information Processing Systems", "page_first": 7751, "page_last": 7762, "abstract": "A fundamental problem in program verification concerns inferring loop invariants. The problem is undecidable and even practical instances are challenging. Inspired by how human experts construct loop invariants, we propose a reasoning framework Code2Inv that constructs the solution by multi-step decision making and querying an external program graph memory block. By training with reinforcement learning, Code2Inv captures rich program features and avoids the need for ground truth solutions as supervision. Compared to previous learning tasks in domains with graph-structured data, it addresses unique challenges, such as a binary objective function and an extremely sparse reward that is given by an automated theorem prover only after the complete loop invariant is proposed. We evaluate Code2Inv on a suite of 133 benchmark problems and compare it to three state-of-the-art systems. It solves 106 problems compared to 73 by a stochastic search-based system, 77 by a heuristic search-based system, and 100 by a decision tree learning-based system. Moreover, the strategy learned can be generalized to new programs: compared to solving new instances from scratch, the pre-trained agent is more sample efficient in finding solutions.", "full_text": "Learning Loop Invariants for Program Veri\ufb01cation\n\nXujie Si\u2217\n\nUniversity of Pennsylvania\n\nxsi@cis.upenn.edu\n\nHanjun Dai \u2217\nGeorgia Tech\n\nhanjundai@gatech.edu\n\nMukund Raghothaman\nUniversity of Pennsylvania\nrmukund@cis.upenn.edu\n\nMayur Naik\n\nUniversity of Pennsylvania\n\nmhnaik@cis.upenn.edu\n\nLe Song\n\nGeorgia Tech and Ant Financial\n\nlsong@cc.gatech.edu\n\nAbstract\n\nA fundamental problem in program veri\ufb01cation concerns inferring loop invariants.\nThe problem is undecidable and even practical instances are challenging. 
Inspired by how human experts construct loop invariants, we propose a reasoning framework CODE2INV that constructs the solution by multi-step decision making and querying an external program graph memory block. By training with reinforcement learning, CODE2INV captures rich program features and avoids the need for ground truth solutions as supervision. Compared to previous learning tasks in domains with graph-structured data, it addresses unique challenges, such as a binary objective function and an extremely sparse reward that is given by an automated theorem prover only after the complete loop invariant is proposed. We evaluate CODE2INV on a suite of 133 benchmark problems and compare it to three state-of-the-art systems. It solves 106 problems compared to 73 by a stochastic search-based system, 77 by a heuristic search-based system, and 100 by a decision tree learning-based system. Moreover, the learned strategy can be generalized to new programs: compared to solving new instances from scratch, the pre-trained agent is more sample-efficient in finding solutions.

1 Introduction
The growing ubiquity and complexity of software have led to a dramatic increase in software bugs and security vulnerabilities that pose enormous costs and risks. Program verification technology enables programmers to prove the absence of such problems at compile-time, before deploying their program. One of the main activities underlying this technology involves inferring a loop invariant, a logical formula that constitutes an abstract specification of a loop, for each loop in the program. Obtaining loop invariants enables a broad and deep range of correctness and security properties to be proven automatically by a variety of program verification tools spanning type checkers, static analyzers, and theorem provers.
Notable examples include Microsoft Code Contracts for .NET programs [1] and the Verified Software Toolchain spanning C source code to machine language [2].
Many different approaches have been proposed in the literature to infer loop invariants. The problem is undecidable, however, and even practical instances are challenging, which greatly limits the benefits of program verification technology. Existing approaches suffer from key drawbacks: they are purely search-based, or they use hand-crafted features, or they are based on supervised learning. The performance of search-based approaches is greatly hindered by their inability to learn from past mistakes. Hand-crafted features limit the space of possible invariants, e.g., the approach of Garg et al. [3] is limited to features of the form x ± y ≤ c where c is a constant, and thus cannot handle invariants that involve x + y ≤ z for program variables x, y, z. Finally, obtaining the ground truth solutions needed by supervised learning is hindered by the undecidability of the loop invariant generation problem.
In this paper, we propose CODE2INV, an end-to-end learning-based approach to infer loop invariants. CODE2INV has the ability to automatically learn rich latent representations of desirable invariants, and can avoid repeating similar mistakes. Furthermore, it leverages reinforcement learning to discover invariants by partial feedback from trial-and-error, without needing ground truth solutions for training.

*Both authors contributed equally to the paper.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

The design of CODE2INV is inspired by the reasoning exercised by human experts. Given a program, a human expert first maps the program to a well-organized structural representation, and then composes the loop invariant step by step. Based on such reasoning, different parts of the representation get highlighted at each step.
To mimic this procedure, we utilize a graph neural network model (GNN) to construct the structural external memory representation of the program. The multi-step decision making is implemented by an autoregressive model, which queries the external memory using an attention mechanism. The decision at each step is made by a syntax- and semantics-guided decoder, which generates subparts of the loop invariant.
CODE2INV employs a reinforcement learning approach since it is computationally intensive to obtain ground truth solutions. Although reinforcement learning algorithms have shown remarkable success in domains like combinatorial optimization [4, 5] (see Section 6 for more discussion of related work), our setting differs in two crucial ways: first, it has a non-continuous objective function (i.e., a proposed loop invariant is either correct or not); and second, the positive reward is extremely sparse and given, by an automated theorem prover [6], only after the correct loop invariant is proposed. We therefore model the policy learning as a multi-step decision making process: it provides a fine-grained reward at each step of building the loop invariant, followed by continuous feedback in the last step based on counterexamples collected by the agent itself during trial-and-error learning.
We evaluate CODE2INV on a suite of 133 benchmark problems from recent works [3, 7, 8] and the 2017 SyGuS program synthesis competition [9]. We also compare it to three state-of-the-art systems: a stochastic search-based system C2I [10], a heuristic search-based system LOOPINVGEN [8], and a decision tree learning-based system ICE-DT [3]. CODE2INV solves 106 problems, versus 73 by C2I, 77 by LOOPINVGEN, and 100 by ICE-DT.
Moreover, CODE2INV exhibits better learning, making orders-of-magnitude fewer calls to the theorem prover than these systems.
2 Background
We formally define the loop invariant inference and learning problems by introducing Hoare logic [11], which comprises a set of axioms and inference rules for proving program correctness assertions. Let P and Q denote predicates over program variables and let S denote a program. We say that the Hoare triple {P} S {Q} is valid if whenever S begins executing in a state that satisfies P and finishes executing, then the resulting state satisfies Q. We call P and Q the pre-condition and post-condition, respectively, of S. Hoare rules allow us to derive such triples inductively over the structure of S. The rule most relevant for our purpose is that for loops:

P ⇒ I (pre)    {I ∧ B} S {I} (inv)    (I ∧ ¬B) ⇒ Q (post)
---------------------------------------------------------
{P} while B do S {Q}

Predicate I is called a loop invariant, an assertion that holds before and after each iteration, as shown in the premise of the rule. We can now formally state the loop invariant inference problem:
Problem 1 (Loop Invariant Inference): Given a pre-condition P, a post-condition Q and a program S containing a single loop, can we find a predicate I such that {P} S {Q} is valid?
Given a candidate loop invariant, it is straightforward for an automated theorem prover such as Z3 [6] to check whether the three conditions denoted pre, inv, and post in the premise of the above rule hold, and thereby prove the property asserted in the conclusion of the rule. If any of the three conditions fails to hold, the theorem prover returns a concrete counterexample witnessing the failure.
The loop invariant inference problem is undecidable. Moreover, even seemingly simple instances are challenging, as we illustrate next using the program in Figure 1(a).
The goal is to prove that assertion (y > 0) holds at the end of the program, for every input value of integer variable y. In this case, the pre-condition P is true since the input value of y is unconstrained, and the post-condition Q is (y > 0), the assertion to be proven. Using predicate (x < 0 ∨ y > 0) as the loop invariant I suffices to prove the assertion, as shown in Figure 1(b). Notation φ[e/x] denotes the predicate φ with each occurrence of variable x replaced by expression e. This loop invariant is non-trivial to infer. The reasoning is simple in the case when the input value of y is non-negative, but far more subtle in the case when it is negative: regardless of how negative it is at the beginning, the loop will iterate at least as many times as needed to make it positive, thereby ensuring the desired assertion upon finishing. Indeed, a state-of-the-art loop invariant generator LOOPINVGEN [8] crashes on this problem instance after making 1,119 calls to Z3, whereas CODE2INV successfully generates it after only 26 such calls.

(a) An example program:

x := −50;
while (x < 0) {
  x := x + y;
  y := y + 1
}
assert(y > 0)

(b) A desirable loop invariant I is a predicate over x, y such that, for all x, y:

true ⇒ I[−50/x]                     (pre)
I ∧ x < 0 ⇒ I[(y+1)/y, (x+y)/x]     (inv)
I ∧ x ≥ 0 ⇒ y > 0                   (post)

(c) The desired loop invariant is (x < 0 ∨ y > 0).

Figure 1: A program with a correctness assertion and a loop invariant that suffices to prove it.

The central role played by loop invariants in program verification has led to a large body of work to automatically infer them. Many previous approaches are based on exhaustive bounded search using domain-specific heuristics and are thereby limited in applicability and scalability [7, 12–18]. A different strategy is followed by data-driven approaches proposed in recent years [3, 8, 10].
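To make the three conditions in Figure 1(b) concrete, the sketch below checks them for the candidate invariant I(x, y) = (x < 0 ∨ y > 0) over a bounded integer range. The function names and the range are illustrative only; the paper discharges these conditions for all integers with the Z3 theorem prover, which also returns a concrete counterexample when a condition fails.

```python
# Bounded sanity check of the conditions (pre), (inv), (post) from Figure 1(b)
# for the candidate invariant I(x, y) = (x < 0 or y > 0). Illustrative only:
# this samples a finite range, whereas Z3 proves the conditions for all integers.

def inv(x, y):
    return x < 0 or y > 0

def check_conditions(lo=-200, hi=200):
    for y in range(lo, hi + 1):
        # (pre): true => I[-50/x], i.e. I holds in the initial state x := -50
        if not inv(-50, y):
            return False
        for x in range(lo, hi + 1):
            # (inv): I and x < 0 => I[(y+1)/y, (x+y)/x], i.e. I is preserved
            # by one loop iteration (x := x + y; y := y + 1)
            if inv(x, y) and x < 0 and not inv(x + y, y + 1):
                return False
            # (post): I and x >= 0 => y > 0, i.e. I implies the assertion on exit
            if inv(x, y) and x >= 0 and not y > 0:
                return False
    return True

assert check_conditions()  # the candidate invariant passes the bounded check
```

On this range the check passes; replacing `inv` with, say, y > 0 alone would make condition (pre) fail at y = 0, mirroring the kind of counterexample a prover would report.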
These methods speculatively guess likely invariants from program executions and check their validity. In [3], decision trees are used to learn loop invariants with simple linear features, e.g., a*x + b*y < c, where a, b ∈ {−1, 0, 1} and c ∈ Z. In [8], these features are generalized by systematic enumeration. In [10], stochastic search is performed over a set of constraint templates. While such features or templates perform well in specific domains, they may fail to adapt to new domains. Moreover, even in the same domain, they do not benefit from past experience: successfully inferring the loop invariant for one program does not speed up the process for other similar ones. We hereby formulate the second problem we aim to address:
Problem 2 (Loop Invariant Learning): Given a set of programs {Si} ∼ P that are sampled from some unknown distribution P, can we learn from them and generalize the strategy we learned to other programs {S̃i} that are from the same distribution?
3 End-to-End Reasoning Framework
3.1 The reasoning process of a human expert
We start out by illustrating how a human expert might typically accomplish the task of inferring a loop invariant. Consider the example in Figure 2, chosen from our benchmarks.
An expert usually starts by reading the assertion (line 15), which contains variables x and y, then determines the locations where these two variables are initialized, and then focuses on the locations where they are updated in the loop. Instead of reasoning about the entire assertion at once, an expert is likely to focus on updates to one variable at a time. This reasoning yields the observation that x is initialized to zero (line 2) and may get incremented in each iteration (lines 5 and 9). Thus, the subgoal "x < 4" may not always hold, given that the loop iterates non-deterministically. This in turn forces the other part "y > 2" to be true when "x >= 4".
The only way x can equal or exceed 4 is to execute the first if branch 4 times (lines 4-6), during which y is set to 100. Now, a natural guess for the loop invariant is "x < 4 || y >= 100". The reason for guessing "y >= 100" instead of "y <= 100" is that part of the proof goal is "y > 2". However, this guess will be rejected by the theorem prover. This is because y might be decremented an arbitrary number of times by the third if-branch (line 12), which happens when x is less than zero; to rule out that situation, "x >= 0" should also be part of the loop invariant. Finally, we have the correct loop invariant: "(x >= 0) && (x < 4 || y >= 100)", which suffices to prove the assertion.
We observe that the entire reasoning process consists of three key components: 1) organize the program in a hierarchically structured way rather than as a sequence of tokens; 2) compose the loop invariant step by step; and 3) focus on a different part of the program at each step, depending on the inference logic, e.g., abduction and induction.

Figure 2: An example from our benchmarks.
“*” denotes non-deterministic choice.

1  int main() {
2    int x = 0, y = 0;
3    while (*) {
4      if (*) {
5        x++;
6        y = 100;
7      } else if (*) {
8        if (x >= 4) {
9          x++;
10         y++;
11       }
12       if (x < 0) y--;
13     }
14   }
15   assert(x < 4 || y > 2);
16 }

3.2 Programming the reasoning procedure with neural networks
We propose to use a neural network to mimic the reasoning used by human experts as described above. The key idea is to replace the above three components with corresponding differentiable modules:
• a structured external memory representation which encodes the program;
• a multi-step autoregressive model for incremental loop invariant construction; and
• an attention component that mimics the varying focus in each step.

Figure 3: Overall framework of neuralizing loop invariant inference.

As shown in Figure 3, these modules together build up the network that constructs loop invariants from programs, while being jointly trained with reinforcement learning as described in Section 4. At each step, the neural network generates a predicate. Then, given the currently generated partial tree, a TreeLSTM module summarizes what has been generated so far, and the summarization is used to read the memory using attention. Lastly, the summarization together with the read memory is fed into the next time step. We next elaborate upon each of these three components.

3.2.1 Structured external memory
The loop invariant is built within the given context of the program. Thus it is natural to encode the program as an external memory module. However, in contrast to traditional memory networks [19, 20], where the memory slots are organized as a linear array, the information contained in a program has rich structure. A chain LSTM over program tokens can in principle capture such information, but it is challenging for neural networks to understand with limited data. Inspired by Allamanis et al.
[21], we instead use a graph-structured memory representation. Such a representation allows us to capture rich semantic knowledge about the program, such as its control-flow and data-flow.
More concretely, we first convert a given program into static single assignment (SSA) form [22], and construct a control flow graph, each of whose nodes represents a single program statement. We then transform each node into an abstract syntax tree (AST) representing the corresponding statement. Thus a program can be represented by a graph G = (V, E), where V contains the terminals and nonterminals of the ASTs, and E = {(e_x^(i), e_y^(i), e_t^(i))}_{i=1}^{|E|} is the set of edges. The directed edge (e_x^(i), e_y^(i), e_t^(i)) goes from node e_x^(i) to node e_y^(i), with e_t^(i) ∈ {1, 2, ..., K} representing its edge type. In our construction, the program graph contains 3 different edge types (and 6 after adding reversed edges).

Figure 4: Diagram of the source code graph as external structured memory. We convert a given program into a graph G, where nodes correspond to syntax elements, and edges indicate the control flow, syntax tree structure, or variable linking. We use an embedding neural network to get the structured memory f(G).

To convert the graph into a vector representation, we follow the general message passing operator introduced in the graph neural network (GNN) [23] and its variants [21, 24, 25].
Specifically, the graph network will associate each node v ∈ V with an embedding vector µ_v ∈ R^d. The embedding is updated iteratively using the general neighborhood embedding as follows:

µ_v^(l+1) = h({µ_u^(l)}_{u ∈ N^k(v), k ∈ {1,2,...,K}})    (1)

Here h(·) is a nonlinear function that aggregates the neighborhood information to update the embedding. N^k(v) is the set of neighbor nodes connected to v with edge type k, i.e., N^k(v) = {u | (u, v, k) ∈ E}. This process is repeated for L steps, and the node embedding µ_v is set to µ_v^(L), for all v ∈ V. Our parameterization takes the edge types into account. The specific parameterization used is shown below:

µ_v^(l+1),k = σ(Σ_{u ∈ N^k(v)} W_2 µ_u^(l)), for all k ∈ {1,2,...,K}    (2)

µ_v^(l+1) = σ(W_3 [µ_v^(l+1),1, µ_v^(l+1),2, ..., µ_v^(l+1),K])    (3)

with the boundary case µ_v^(0) = W_1 x_v. Here x_v represents the syntax information of node v, such as a token or constant value in the program. Matrices W_1, W_2, W_3 are learnable model parameters, and σ is some nonlinear activation function. Figure 4 shows the construction of the graph-structured memory using the iterative message passing operator in Eq (1).
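To make Eqs. (1)-(3) concrete, here is a toy, pure-Python sketch of the typed-edge message passing. The dimensions, random weights, tanh nonlinearity, and 0-based edge-type indices are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import math
import random

# Toy sketch of the typed-edge message passing in Eqs. (1)-(3): neighbors are
# aggregated per edge type with W2, the per-type results are concatenated and
# mixed by W3, starting from mu^(0) = W1 * x_v. All sizes are illustrative.

random.seed(0)
D = 4  # embedding dimension (illustrative)
K = 2  # number of edge types (illustrative; the paper's graph has more)

def mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def tanh_vec(v):
    return [math.tanh(x) for x in v]

W1 = mat(D, D)
W2 = [mat(D, D) for _ in range(K)]  # one message matrix per edge type
W3 = mat(D, K * D)                  # mixes the K concatenated type-wise updates

def message_passing(x, edges, L=2):
    """x: per-node feature vectors; edges: list of directed (u, v, k) typed edges."""
    mu = [matvec(W1, xv) for xv in x]  # boundary case: mu^(0) = W1 x_v
    for _ in range(L):
        new_mu = []
        for v in range(len(x)):
            per_type = []
            for k in range(K):
                # sum messages W2^k mu_u over neighbors u with an edge (u, v) of type k
                msg = [0.0] * D
                for (u, w, t) in edges:
                    if w == v and t == k:
                        msg = [m + s for m, s in zip(msg, matvec(W2[k], mu[u]))]
                per_type.extend(tanh_vec(msg))            # Eq. (2)
            new_mu.append(tanh_vec(matvec(W3, per_type)))  # Eq. (3)
        mu = new_mu
    return mu  # f(G) = {mu_v}

# three nodes, two edge types
emb = message_passing(x=[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
                      edges=[(0, 1, 0), (1, 2, 1), (2, 0, 0)])
```

Each node ends up with a d-dimensional vector that summarizes its L-hop typed neighborhood; these vectors form the structured memory the decoder attends over.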
f(G) = {µ_v}_{v ∈ V} denotes the structured memory.
3.2.2 Multi-step decision making process
A loop invariant itself is a mini-program that contains expressions and logical operations. Without loss of generality, we define the loop invariant to be a tree T, in the form of a conjunction of disjunctions:

T = (T_1 ∨ T_2 ∨ ···) ∧ ··· ∧ (··· ∨ T_T)    (4)

Each subtree T_t is a simple logic expression (e.g., x < y * 2 + 10 - z). Given this representation form, it is natural to use a Markov decision process (MDP) to model this problem, where the corresponding T-step finite-horizon MDP is defined as M_G = (s_1, a_1, r_1, s_2, a_2, ..., s_T). Here s_t, a_t, r_t represent the state, action and reward at time step t = 1, ..., T − 1, respectively. Below we describe the state and action used in the inference model, and describe the design of reward and termination in Section 4.
action: As defined in Eq (4), a loop invariant tree T consists of multiple subtrees {T_t}. Thus we model the action at time step t as a_t = (op_t, T_t), where op_t can either be || or &&. That is to say, at each time step, the agent first decides whether to attach the subexpression T_t to an existing disjunction, or to create a new disjunction and add it to the list of conjunctions. We use T_{<t} to denote the partial tree generated before time step t.
state: For t > 1, the action a_t should be conditioned on the graph memory, as well as the partial tree generated so far. Thus s_t = (G, T_{<t}).

Figure 6: (a) and (b) are verification costs of the pre-trained model and untrained model; (c) and (d) are attention highlights for two example programs.

helps to reduce the verification cost modestly. CODE2INV achieves the best performance with both components enabled—the configuration used in other parts of our evaluation.
Additionally, to test the effectiveness of neural graph embedding, we study a simpler encoding, that is, viewing a program as a sequence of tokens and encoding the sequence using an LSTM. The performance of this setup is shown in the last row of Table 1.
With a simple LSTM embedding, CODE2INV solves 13 fewer instances and, moreover, requires significantly more parameter updates.

Table 1: Ablation study for different configurations of CODE2INV.

configuration                        #solved instances   max #Z3 queries   max #parameter updates
without CE, without attention        91                  415K              441K
without CE, with attention           94                  147K              162K
with CE, without attention           95                  392               337K
with CE, with attention              106                 276               290K
LSTM embedding + CE + attention      93                  32                661K

5.3 Boosting the search with pre-training
We next address the question: given an agent that is pre-trained on programs Ptrain = {p_i} ∼ P, can the agent solve new programs Ptest = {p̃_i} ∼ P faster than solving them from scratch? We prepare the training and testing data as follows. We take the programs solved by CODE2INV as the initial set and augment it by creating 100 variations of each of them by introducing confounding variables and statements in such a way that any valid loop invariant for the original program is still valid. Further details are provided in the appendix. Finally, 90% of them serve as Ptrain, and the rest are used for Ptest.
After pre-training the agent on Ptrain for 50 epochs, we save the model and then reuse it for "fine tuning" (or active search [4]), i.e., the agent continues the trial-and-error reinforcement learning on Ptest. Figure 6a and Figure 6b compare the verification costs of the pre-trained model and the untrained model on datasets augmented with 1 and 5 confounding variables, respectively. We observe that, on one hand, the pre-trained model has a clear advantage over the untrained model on either dataset; but on the other hand, this gap narrows when more confounding variables are introduced.
This result suggests an interesting future research direction: how to design a learning agent that can effectively identify loop-invariant-related variables among a potentially large number of confounding variables.
5.4 Attention visualization
Figures 6c and 6d show the attention highlights for two example programs. The original highlights are provided on the program graph representation described in Section 3.2.1. We manually converted the graphs back to source code for clarity. Figure 6c shows an interesting example for which CODE2INV learns a strategy of showing that the assertion is actually not reachable, and thus holds trivially. Figure 6d shows another interesting example for which CODE2INV performs a form of abductive reasoning.
5.5 Discussion of limitations
We conclude our study with a discussion of limitations. For most of the instances that CODE2INV fails to solve, we observe that the loop invariant can be expressed in a compact disjunctive normal form (DNF) representation, which is more suited to the decision tree learning approach with hand-crafted features. However, CODE2INV is designed to produce loop invariants in conjunctive normal form (CNF). The reduction of loop invariants from DNF to CNF could incur an exponential blowup in size. An interesting future research direction concerns designing a learning agent that can flexibly switch between these two forms.
6 Related Work
We survey work in program synthesis, program learning, learning loop invariants, and learning combinatorial optimizations.
Program synthesis. Automatically synthesizing a program from its specification has been a key challenge problem since Manna and Waldinger's work [32]. In this context, syntax-guided synthesis (SyGuS) [9] was proposed as a common format to express these problems.
Besides several implementations of SyGuS solvers [9, 33–35], a number of probabilistic techniques have been proposed to model syntactic aspects of programs and to accelerate synthesis [36–38]. While logical program synthesis approaches guarantee semantic correctness, they are chiefly limited by their scalability and their requirement of rigorous specifications.
Program learning. There have been several attempts to learn general programs using neural networks. One large class of projects includes those attempting to use neural networks to accelerate the discovery of conventional programs [39–42]. Most existing works only consider specifications in the form of input-output examples, where weak supervision [43–45] or finer-grained trace information is provided to help training. In our setting, there is no supervision for the ground truth loop invariant, and the agent needs to be able to compose a loop invariant purely from trial-and-error. Drawing inspiration from both programming languages and embedding methods, we build an efficient learning agent that can perform end-to-end reasoning in a way that mimics human experts.
Learning program loop invariants. Our work is closely related to recent work on learning loop invariants from either labeled ground truth [46] or active interactions with human experts [47]. Brockschmidt et al. [46] learn shape invariants for data structures (e.g., linked lists or trees). Their approach first extracts features using n-gram and reachability statistics over the program's heap graph and then applies supervised learning to train a neural network to map features to shape invariants. In contrast, we are concerned with general loop invariant generation, and our approach employs graph embedding directly on the program's AST and learns a generation policy without using ground truth as supervision. Bounov et al.
[47] propose inferring loop invariants through gamification and crowdsourcing, which relieves the need for expertise in software verification, but still requires significant human effort. In contrast, an automated theorem prover suffices for our approach.
Learning combinatorial optimizations. Our work is also related to recent advances in combinatorial optimization using machine learning [4, 5, 48, 49]. However, as elaborated in Section 4, the problem we study is significantly more difficult, in the sense that the objective function is non-smooth (binary objective), and the positive reward is extremely sparse due to the exponentially growing size of the search space with respect to program size.
7 Conclusion
We studied the problem of learning loop invariants for program verification. Our proposed end-to-end reasoning framework learns to compose the solution automatically without any supervision. It solves a comparable number of benchmarks to the state-of-the-art solvers while requiring far fewer queries to a theorem prover. Moreover, after being pre-trained, it can generalize the learned strategy to new instances much faster than starting from scratch. In the future, we plan to extend the framework to discover loop invariants for larger programs that present more confounding variables, as well as to discover other kinds of program correctness properties, such as ranking functions for proving program termination [50] and separation predicates for proving correctness of pointer-manipulating programs [51].
Acknowledgments. We thank the anonymous reviewers for insightful comments. We thank Ningning Xie for useful feedback. This research was supported in part by DARPA FA8750-15-2-0009, NSF (CCF-1526270, IIS-1350983, IIS-1639792, CNS-1704701) and ONR N00014-15-1-2340.

References
[1] Manuel Fahndrich and Francesco Logozzo.
Static contract checking with abstract interpretation.\n\nIn\nProceedings of the 2010 International Conference on Formal Veri\ufb01cation of Object-Oriented Software, 2010.\n\n[2] Andrew W. Appel. Veri\ufb01ed Software Toolchain. In Proceedings of the 20th European Symposium on\n\nProgramming (ESOP), 2011.\n\n[3] Pranav Garg, Daniel Neider, P. Madhusudan, and Dan Roth. Learning invariants using decision trees\nand implication counterexamples. In Proceedings of the ACM Symposium on Principles of Programming\nLanguages (POPL), 2016.\n\n[4] Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial\n\noptimization with reinforcement learning. CoRR, abs/1611.09940, 2016.\n\n[5] Elias B. Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization\nalgorithms over graphs. In Proceedings of the Conference on Neural Information Processing Systems (NIPS),\n2017.\n\n[6] Leonardo de Moura and Nikolaj Bj\u00f8rner. Z3: An ef\ufb01cient SMT solver.\n\nIn Proceedings of the 14th\nInternational Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS),\n2008.\n\n[7] Isil Dillig, Thomas Dillig, Boyang Li, and Ken McMillan. Inductive invariant generation via abductive\ninference. In Proceedings of the ACM Conference on Object-Oriented Programming Systems, Languages\nand Applications (OOPSLA), 2013.\n\n[8] Saswat Padhi, Rahul Sharma, and Todd Millstein. Data-driven precondition inference with learned features.\nIn Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI),\n2016.\n\n[9] Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia,\nRishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. Syntax-guided synthesis. In\nProceedings of Formal Methods in Computer-Aided Design (FMCAD), 2013.\n\n[10] Rahul Sharma and Alex Aiken. 
From invariant checking to invariant inference using randomized search. In\n\nProceedings of the International Conference on Computer Aided Veri\ufb01cation (CAV), 2014.\n\n[11] C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10),\n\nOctober 1969.\n\n[12] Michael A. Col\u00f3n, Sriram Sankaranarayanan, and Henny B. Sipma. Linear invariant generation using non-\nlinear constraint solving. In Proceedings of the International Conference on Computer Aided Veri\ufb01cation\n(CAV), 2003.\n\n[13] Sriram Sankaranarayanan, Henny B. Sipma, and Zohar Manna. Non-linear loop invariant generation using\nGr\u00f6bner bases. In Proceedings of the ACM Symposium on Principles of Programming Languages (POPL),\n2004.\n\n[14] Sumit Gulwani and Nebojsa Jojic. Program veri\ufb01cation as probabilistic inference. In Proceedings of the\n\nACM Symposium on Principles of Programming Languages (POPL), 2007.\n\n[15] Rahul Sharma, Saurabh Gupta, Bharath Hariharan, Alex Aiken, Percy Liang, and Aditya V. Nori. A data\ndriven approach for algebraic loop invariants. In Proceedings of the European Symposium on Programming\n(ESOP), 2013.\n\n[16] Rahul Sharma, Isil Dillig, Thomas Dillig, and Alex Aiken. Simplifying loop invariant generation using\nsplitter predicates. In Proceedings of the International Conference on Computer Aided Veri\ufb01cation (CAV),\n2011.\n\n[17] Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. Recursive program synthesis. In Proceedings of\n\nthe International Conference on Computer Aided Veri\ufb01cation (CAV), 2013.\n\n[18] Pranav Garg, Christof L\u00f6ding, P. Madhusudan, and Daniel Neider. Ice: A robust framework for learning\n\ninvariants. In Proceedings of the International Conference on Computer Aided Veri\ufb01cation (CAV), 2014.\n\n[19] Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. End-to-end memory networks. 
In Proceedings of the Conference on Neural Information Processing Systems (NIPS), 2015.\n\n[20] Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. Key-value memory networks for directly reading documents. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.\n\n[21] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.\n\n[22] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst., 13(4), 1991.\n\n[23] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.\n\n[24] David K. Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Proceedings of the Conference on Neural Information Processing Systems (NIPS), 2015.\n\n[25] Hanjun Dai, Bo Dai, and Le Song. Discriminative embeddings of latent variable models for structured data. In Proceedings of the International Conference on Machine Learning (ICML), 2016.\n\n[26] Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. Neuro-symbolic program synthesis. arXiv preprint arXiv:1611.01855, 2016.\n\n[27] Matt J. Kusner, Brooks Paige, and José Miguel Hernández-Lobato. Grammar variational autoencoder. In Proceedings of the International Conference on Machine Learning (ICML), 2017.\n\n[28] Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, and Le Song.
Syntax-directed variational autoencoder for structured data. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.\n\n[29] Kai Sheng Tai, Richard Socher, and Christopher D. Manning. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the Association for Computational Linguistics (ACL), 2015.\n\n[30] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017.\n\n[31] SyGuS Competition, 2017. http://sygus.seas.upenn.edu/SyGuS-COMP2017.html.\n\n[32] Zohar Manna and Richard J. Waldinger. Toward automatic program synthesis. Communications of the ACM, 1971.\n\n[33] Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkatesan. Synthesis of loop-free programs. In Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI), 2011.\n\n[34] Rajeev Alur, Arjun Radhakrishna, and Abhishek Udupa. Scaling enumerative program synthesis via divide and conquer. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2017.\n\n[35] Eric Schkufza, Rahul Sharma, and Alex Aiken. Stochastic superoptimization. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2013.\n\n[36] Pavol Bielik, Veselin Raychev, and Martin Vechev. PHOG: Probabilistic model for code. In Proceedings of the International Conference on Machine Learning (ICML), 2016.\n\n[37] C. J. Maddison and D. Tarlow. Structured generative models of natural source code. In Proceedings of the International Conference on Machine Learning (ICML), 2014.\n\n[38] Anh Tuan Nguyen and Tien N. Nguyen. Graph-based statistical language model for code. In Proceedings of the International Conference on Software Engineering (ICSE), 2015.\n\n[39] M. Balog, A. L. Gaunt, M. Brockschmidt, S. Nowozin, and D. Tarlow. DeepCoder: Learning to write programs.
In Proceedings of the International Conference on Learning Representations (ICLR), 2017.\n\n[40] Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, and Chris Jermaine. Neural sketch learning for conditional program generation. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.\n\n[41] Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. RobustFill: Neural program learning under noisy I/O. In Proceedings of the International Conference on Machine Learning (ICML), 2017.\n\n[42] Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. Neuro-symbolic program synthesis. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.\n\n[43] Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, and Ni Lao. Neural symbolic machines: Learning semantic parsers on Freebase with weak supervision. arXiv preprint arXiv:1611.00020, 2016.\n\n[44] Xinyun Chen, Chang Liu, and Dawn Song. Towards synthesizing complex programs from input-output examples. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.\n\n[45] Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, and Pushmeet Kohli. Leveraging grammar and reinforcement learning for neural program synthesis. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.\n\n[46] Marc Brockschmidt, Yuxin Chen, Pushmeet Kohli, Siddharth Krishna, and Daniel Tarlow. Learning shape analysis. In Proceedings of the Static Analysis Symposium (SAS), 2017.\n\n[47] Dimitar Bounov, Anthony DeRossi, Massimiliano Menarini, William G. Griswold, and Sorin Lerner. Inferring loop invariants through gamification. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI), 2018.\n\n[48] Elias Boutros Khalil, Pierre Le Bodic, Le Song, George L. Nemhauser, and Bistra N. Dilkina.
Learning to branch in mixed integer programming. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2016.\n\n[49] Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill. Learning a SAT solver from single-bit supervision. arXiv preprint arXiv:1802.03685, 2018.\n\n[50] Byron Cook, Andreas Podelski, and Andrey Rybalchenko. Proving program termination. Communications of the ACM, 54(5), 2011.\n\n[51] John C. Reynolds. Separation logic: A logic for shared mutable data structures. In Proceedings of the IEEE Symposium on Logic in Computer Science (LICS), 2002.