{"title": "Perceptron Learning of SAT", "book": "Advances in Neural Information Processing Systems", "page_first": 2771, "page_last": 2779, "abstract": "Boolean satisfiability (SAT) as a canonical NP-complete decision problem is one of the most important problems in computer science. In practice, real-world SAT sentences are drawn from a distribution that may result in efficient algorithms for their solution. Such SAT instances are likely to have shared characteristics and substructures. This work approaches the exploration of a family of SAT solvers as a learning problem. In particular, we relate polynomial time solvability of a SAT subset to a notion of margin between sentences mapped by a feature function into a Hilbert space. Provided this mapping is based on polynomial time computable statistics of a sentence, we show that the existence of a margin between these data points implies the existence of a polynomial time solver for that SAT subset based on the Davis-Putnam-Logemann-Loveland algorithm. Furthermore, we show that a simple perceptron-style learning rule will find an optimal SAT solver with a bounded number of training updates. We derive a linear time computable set of features and show analytically that margins exist for important polynomial special cases of SAT. Empirical results show an order of magnitude improvement over a state-of-the-art SAT solver on a hardware verification task.", "full_text": "Perceptron Learning of SAT\n\nAlex Flint\n\nDepartment of Engineering Science\n\nUniversity of Oxford\nalexf@robots.ox.ac.uk\n\nMatthew B. Blaschko\n\nCenter for Visual Computing\n\nEcole Centrale Paris\n\nmatthew.blaschko@inria.fr\n\nAbstract\n\nBoolean satisfiability (SAT) as a canonical NP-complete decision problem is one\nof the most important problems in computer science. In practice, real-world SAT\nsentences are drawn from a distribution that may result in efficient algorithms for\ntheir solution. 
Such SAT instances are likely to have shared characteristics and\nsubstructures. This work approaches the exploration of a family of SAT solvers as\na learning problem. In particular, we relate polynomial time solvability of a SAT\nsubset to a notion of margin between sentences mapped by a feature function into\na Hilbert space. Provided this mapping is based on polynomial time computable\nstatistics of a sentence, we show that the existence of a margin between these data\npoints implies the existence of a polynomial time solver for that SAT subset based\non the Davis-Putnam-Logemann-Loveland algorithm. Furthermore, we show that\na simple perceptron-style learning rule will find an optimal SAT solver with a\nbounded number of training updates. We derive a linear time computable set of\nfeatures and show analytically that margins exist for important polynomial special\ncases of SAT. Empirical results show an order of magnitude improvement over a\nstate-of-the-art SAT solver on a hardware verification task.\n\n1 Introduction\n\nSAT was originally shown to be a canonical NP-complete problem in Cook\u2019s seminal work [5]. SAT\nis of practical interest for solving a number of critical problems in applications such as theorem proving\n[8], model checking [2], planning [19], and bioinformatics [22]. That it is NP-complete indicates\nthat an efficient learning procedure is unlikely to exist to solve arbitrary instances of SAT. Nevertheless,\nSAT instances resulting from real world applications are likely to have shared characteristics\nand substructures. We may view them as being drawn from a distribution over SAT instances, and\nfor key problems this distribution may be benign in that a learning algorithm can enable quick determination\nof SAT. 
In this work, we explore a perceptron-inspired learning algorithm\napplied to branching heuristics in the Davis-Putnam-Logemann-Loveland algorithm [8, 7].\nThe Davis-Putnam-Logemann-Loveland (DPLL) algorithm formulates SAT as a search problem,\nresulting in a valuation of variables that satisfies the sentence, or a tree resolution refutation proof\nindicating that the sentence is not satisfiable. The branching rule in this depth-first search procedure\nis a key determinant of the efficiency of the algorithm, and numerous heuristics have been proposed\nin the SAT literature [15, 16, 26, 18, 13]. Inspired by the recent framing of learning as search optimization\n[6], we explore here the application of a perceptron-inspired learning rule to application-specific\nsamples of the SAT problem. Efficient learning of SAT has profound implications for algorithm\ndevelopment across computer science as a vast number of important problems are polynomial\ntime reducible to SAT.\nA number of authors have considered learning branching rules for SAT solvers. Ruml applied reinforcement\nlearning to find valuations of satisfiable sentences [25]. An approach that has performed\nwell in SAT competitions in recent years is based on selecting a heuristic from a fixed set and applying\nit on a per-sentence basis [27, 17]. The relationship between learnability and NP-completeness\nhas long been considered in the literature, e.g. [20]. Closely related to our approach is the learning\nas search optimization framework [6]. That approach makes perceptron-style updates to a heuristic\nfunction in A* search, but to our knowledge has not been applied to SAT, and requires a level of\nsupervision that is not available in a typical SAT setting. 
A similar approach to learning heuristics\nfor search was explored in [12].\n\n2 Theorem Proving as a Search Problem\n\nThe SAT problem [5] is to determine whether a sentence \u2126 in propositional logic is satisfiable. First\nwe introduce some notation. A binary variable q takes on one of two possible values, {0, 1}. A\nliteral p is a proposition of the form q (a \u201cpositive literal\u201d) or \u00acq (a \u201cnegative literal\u201d). A clause \u03c9k\nis a disjunction of nk literals, p1 \u2228 p2 \u2228 \u00b7\u00b7\u00b7 \u2228 p_{n_k}. A unit clause contains exactly one literal. A\nsentence \u2126 in conjunctive normal form (CNF) [15] is a conjunction of m clauses, \u03c91 \u2227 \u03c92 \u2227 \u00b7\u00b7\u00b7 \u2227 \u03c9m.\nA valuation B for \u2126 assigns to each variable in \u2126 a value bi \u2208 {0, 1}. A variable is free under B if\nB does not assign it a value. A sentence \u2126 is satisfiable iff there exists a valuation under which \u2126 is\ntrue. CNF is considered a canonical representation for automated reasoning systems. All sentences\nin propositional logic can be transformed to CNF [15].\n\n2.1 The Davis\u2013Putnam\u2013Logemann\u2013Loveland algorithm\n\nDavis et al. [7] proposed a simple procedure for recognising satisfiable CNF sentences on N\nvariables. Their algorithm is essentially a depth first search over all 2^N possible valuations of the\ninput sentence, with specialized criteria to prune the search and transformation rules to simplify the\nsentence. 
We summarise the DPLL procedure below.\n\nif \u2126 contains only unit clauses and no contradictions then\n    return YES\nend if\nif \u2126 contains an empty clause then\n    return NO\nend if\nfor all unit clauses \u03c9 \u2208 \u2126 do\n    \u2126 := UnitPropagate(\u2126, \u03c9)\nend for\nfor all literals p such that \u00acp \u2209 \u2126 do\n    remove all clauses containing p from \u2126\nend for\np := PickBranch(\u2126)\nreturn DPLL(\u2126 \u2227 p) \u2228 DPLL(\u2126 \u2227 \u00acp)\n\nUnitPropagate(\u2126, \u03c9) simplifies \u2126 under the assumption that the single literal of the unit clause \u03c9 is true. PickBranch applies a heuristic to choose a\nliteral in \u2126. Many modern SAT algorithms contain the DPLL procedure at their core [15, 16, 26, 18,\n13], including top performers at recent SAT competitions [21]. Much recent work has focussed on\nchoosing heuristics for the selection of branching literals since good heuristics have been empirically\nshown to reduce processing time by several orders of magnitude [28, 16, 13].\nIn this paper we learn heuristics by optimizing over a family of the form argmax_p f(x, p), where\nx is a node in the search tree, p is a candidate literal, and f is a priority function mapping possible\nbranches to real numbers. The state x will contain at least a CNF sentence and possibly pointers to\nancestor nodes or statistics of the local search region. Given this relaxed notion of the search state,\nwe are unaware of any branching heuristics in the literature that cannot be expressed in this form.\nWe explicitly describe several in section 4.\n\n3 Perceptron Learning of SAT\n\nWe propose to learn f from a sequence of sentences drawn from some distribution determined by\na given application. We identify f with an element of a Hilbert space, H, the properties of which\nare determined by a set of statistics polynomial time computable from a SAT instance, \u2126. We apply\nstochastic updates to our estimate of f in order to reduce our expected search time. 
We use xj\nto denote a node that is visited in the application of the DPLL algorithm, and \u03c6i(xj) to denote the\nfeature map associated with instantiating literal pi. Using reproducing kernel Hilbert space notation,\nour decision function at xj takes the form\n\nargmax_i \u27e8f, \u03c6i(xj)\u27e9_H.   (1)\n\nWe would like to learn f such that the expected search time is reduced. We define yij to be +1\nif the instantiation of pi at xj leads to the shortest possible proof, and \u22121 otherwise. Our learning\nprocedure therefore will ideally learn a setting of f that only instantiates literals for which yij is +1.\nWe define a margin in a standard way:\n\nmax \u03b3 s.t. \u27e8f, \u03c6i(xj)\u27e9_H \u2212 \u27e8f, \u03c6k(xl)\u27e9_H \u2265 \u03b3 \u2200{(i, j)|yij = +1}, {(k, l)|ykl = \u22121}   (2)\n\n3.1 Restriction to Satisfiable Sentences\n\nIf we had access to all yij, the application of any binary learning algorithm to the problem of learning\nSAT would be straightforward. Unfortunately, the identity of yij is only known in the worst case\nafter an exhaustive enumeration of all 2^N variable assignments. We do note, however, that the DPLL\nalgorithm is a depth\u2013first search over literal valuations. Furthermore, for satisfiable sentences the\nlength of the shortest proof is bounded by the number of variables. Consequently, in this case, all\nnodes visited on a branch of the search tree that resolved to unsatisfiable have yij = \u22121 and the\nnodes on the branch leading to satisfiable have yij = +1. 
We may run the DPLL algorithm with a\ncurrent setting of f and if the sentence is satisfiable, update f using the inferred yij.\nThis learning framework is capable of computing in polynomial time valuations of satisfiable sentences\nin the following sense.\nTheorem 1 \u2203 a polynomial time computable \u03c6 with \u03b3 > 0 \u21d0\u21d2 \u2126 belongs to a subset of satisfiable\nsentences for which there exists a polynomial time algorithm to find a valid valuation.\n\nProof Necessity is shown by noting that the argmax in each step of the DPLL algorithm is computable\nin time polynomial in the sentence length by computing \u03c6 for all literals, and that there exists\na setting of f such that there will be at most a number of steps equal to the number of variables.\nSufficiency is shown by noting that we may run the polynomial algorithm to find a valid valuation\nand use that valuation to construct a feature space with \u03b3 \u2265 0 in polynomial time. Concretely,\nchoose a canonical ordering of literals indexed by i and let \u03c6i(xj) be a scalar. Set \u03c6i(xj) = +1\nif literal pi is instantiated in the solution found by the polynomial algorithm, \u22121 otherwise. When\nf = 1, \u03b3 = 2. \u25a1\nCorollary 1 \u2203 polynomial time computable feature space with \u03b3 > 0 for SAT \u21d0\u21d2 P = NP\n\nProof If P = NP there is a polynomial time solution to SAT, meaning that there is a polynomial\ntime solution to finding valuations of satisfiable sentences. For satisfiable sentences, this indicates that\nthere is a non-negative margin. 
For unsatisfiable sentences, either a proof exists with length less\nthan the number of variables, or we may terminate the DPLL procedure after N + 1 steps and return\nunsatisfiable. \u25a1\n\nWhile Theorem 1 is positive for finding variable settings that satisfy sentences, unsatisfiable sentences\nremain problematic when we are unsure that there exists \u03b3 > 0 or if we have an incorrect\nsetting of f. We are unaware of an efficient method to determine all yij for visited nodes in proofs\nof unsatisfiable sentences. However, we expect that similar substructures will exist in satisfiable\nand unsatisfiable sentences resulting from the same application. Early iterations of our learning\nalgorithm will mistakenly explore branches of the search tree for satisfiable sentences and these\nbranches will share important characteristics with inefficient branches of proofs of unsatisfiability.\nConsequently, proofs of unsatisfiability may additionally benefit from a learning procedure applied\nonly to satisfiable sentences. In the case that we analytically know that \u03b3 > 0 and we have a correct\nsetting of f, we may use the termination procedure in Corollary 1.\n\nFigure 1: Generation of training samples from\nthe search tree. Nodes labeled in red result in\nbacktracking and therefore have negative label,\nwhile those coloured blue lie on the path to a\nproof of satisfiability.\n\nFigure 2: Geometry of the feature space. Positive\nand negative nodes are separated by a margin of\n\u03b3. Given the current estimate of f, a threshold,\nT, is selected as described in section 3.2. The\npositive nodes with a score less than T are averaged,\nas are negative nodes with a score greater\nthan T. 
The resulting means lie within the respective\nconvex hulls of the positive and negative\nsets, ensuring that the geometric conditions\nof the proof of Theorem 2 are fulfilled.\n\n3.2 Davis-Putnam-Logemann-Loveland Stochastic Gradient\n\nWe use a modified perceptron style update based on the learning as search optimization framework\nproposed in [6]. In contrast to that work, we do not have a notion of \u201cgood\u201d and \u201cbad\u201d nodes at each\nsearch step. Instead, we must run the DPLL algorithm to completion with a fixed model, ft. We\nknow that nodes on a path to a valuation that satisfies the sentence have positive labels, and those\nnodes that require backtracking have negative labels (Figure 1). If the sentence is satisfiable, we may\ncompute a DPLL stochastic gradient, \u2207DPLL, and update f. We define two sets of nodes, S+ and\nS\u2212, such that all nodes in S+ have positive label and lower score than all nodes in S\u2212 (Figure 2). In\nthis work, we have used the sufficient condition of defining these sets by setting a score threshold,\nT, such that fk(\u03c6i(xj)) < T \u2200(i, j) \u2208 S+, fk(\u03c6i(xj)) > T \u2200(i, j) \u2208 S\u2212, and |S+| \u00d7 |S\u2212| is\nmaximized. The DPLL stochastic gradient update is defined as follows:\n\n\u2207DPLL = \u2211_{(i,j)\u2208S\u2212} \u03c6i(xj)/|S\u2212| \u2212 \u2211_{(k,l)\u2208S+} \u03c6k(xl)/|S+|,   f_{t+1} = ft \u2212 \u03b7\u2207DPLL   (3)\n\nwhere \u03b7 is a learning rate. While poor settings of f0 may result in a very long proof before learning\ncan occur, we show in Section 4 that we can initialize f0 to emulate the behavior of current state-of-the-art\nSAT solvers. 
Subsequent updates improve performance over the baseline.\nWe define R to be a positive real value such that \u2200i, j, k, l \u2016\u03c6i(xj) \u2212 \u03c6k(xl)\u2016 \u2264 R.\nTheorem 2 For any training sequence that is separable by a margin of size \u03b3 with \u2016f\u2016 = 1, using\nthe update rule in Equation (3) with \u03b7 = 1, the number of errors (updates) made during training on\nsatisfiable sentences is bounded above by R^2/\u03b3^2.\nProof Let f1(\u03c6(x)) = 0 \u2200\u03c6(x). Considering the kth update,\n\n\u2016f_{k+1}\u2016^2 = \u2016fk \u2212 \u2207DPLL\u2016^2 = \u2016fk\u2016^2 \u2212 2\u27e8fk, \u2207DPLL\u27e9 + \u2016\u2207DPLL\u2016^2 \u2264 \u2016fk\u2016^2 + 0 + R^2.   (4)\n\nWe note that \u27e8fk, \u2207DPLL\u27e9 \u2265 0 for any selection of training examples such that the\naverage of the negative examples scores higher than the average of the positive examples generated\nby running a DPLL search. It is possible that some negative examples with lower scores than\nsome positive nodes will be visited during the depth first search of the DPLL algorithm, but we are\nguaranteed that at least one of them will have higher score. Similarly, some positive examples may\nhave higher scores than the highest scoring negative example. 
In both cases, we may simply discard\nsuch instances from the training algorithm (as described in Section 3.2) guaranteeing the desired\ninequality. By induction, \u2016f_{k+1}\u2016^2 \u2264 kR^2.\n\nFeature | Dimensions | Description\nis-positive | 1 | 1 if p is positive, 0 otherwise\nlit-unit-clauses | 1 | C1(p), occurrences of literal in unit clauses\nvar-unit-clauses | 1 | C1(q), occurrences of variable in unit clauses\nlit-counts | 3 | Ci(p) for i = 2, 3, 4, occurrences in small clauses\nvar-counts | 3 | Ci(q) for i = 2, 3, 4, as above, by variable\nbohm-max | 3 | max(Ci(p), Ci(\u00acp)), i = 2, 3, 4\nbohm-min | 3 | min(Ci(p), Ci(\u00acp)), i = 2, 3, 4\nlit-total | 1 | C(p), total occurrences by literal\nneg-lit-total | 1 | C(\u00acp), total occurrences of negated literal\nvar-total | 1 | C(q), total occurrences by variable\nlit-smallest | 1 | Cm(p), where m is the size of the smallest unsatisfied clause\nneg-lit-smallest | 1 | Cm(\u00acp), as above, for negated literal\njw | 1 | J(p), Jeroslow\u2013Wang cue, see main text\njw-neg | 1 | J(\u00acp), Jeroslow\u2013Wang cue, see main text\nactivity | 1 | minisat activity measure\ntime-since-active | 1 | t \u2212 T(p), time since last activity (see main text)\nhas-been-active | 1 | 1 if this p has ever appeared in a conflict clause; 0 otherwise\n\nFigure 3: Summary of our feature space. Features are computed as a function of a sentence \u2126 and a\nliteral p. q implicitly refers to the variable within p.\n\nLet u be an element of H that obtains a margin of \u03b3 on the training set. We next obtain a lower bound\non \u27e8u, f_{k+1}\u27e9 = \u27e8u, fk\u27e9 \u2212 \u27e8u, \u2207DPLL\u27e9 \u2265 \u27e8u, fk\u27e9 + \u03b3. That \u2212\u27e8u, \u2207DPLL\u27e9 \u2265 \u03b3 follows from the fact\nthat the means of the positive and negative training examples lie in the convex hull of the positive\nand negative sets, respectively, and that u achieves a margin of \u03b3. 
By induction, \u27e8u, f_{k+1}\u27e9 \u2265 k\u03b3.\nPutting the two results together gives \u221ak R \u2265 \u2016f_{k+1}\u2016 \u2265 \u27e8u, f_{k+1}\u27e9 \u2265 k\u03b3 which, after some algebra,\nyields k \u2264 (R/\u03b3)^2. \u25a1\nThe proof of this theorem closely mirrors those of the mistake bounds in [24, 6]. We note also that\nan extension to approximate large-margin updates is straightforward to implement, resulting in an\nalternate mistake bound (cf. [6, Theorem 4]). For simplicity we consider only the perceptron style\nupdates of Equation (3) in the sequel.\n\n4 Feature Space\n\nIn this section we describe our feature space. Recall that each node xj consists of a CNF sentence\n\u2126 together with a valuation for zero or more variables. Our feature function \u03c6(x, p) maps a node x\nand a candidate branching literal p to a real vector \u03c6. Many heuristics involve counting occurrences\nof literals and variables. For notational convenience let C(p) be the number of occurrences of p in \u2126\nand let Ck(p) be the number of occurrences of p among clauses of size k. Figure 3 summarizes our\nfeature space.\n\n4.1 Relationship to previous branching heuristics\n\nMany branching heuristics have been proposed in the SAT literature [28, 13, 18, 26]. Our features\nwere selected from the most successful of these and our system is hence able to emulate many other\nsystems for particular priority functions f.\nLiteral counting. Silva [26] suggested two simple heuristics based directly on literal counts. The\nfirst was to always branch on the literal that maximizes C(p) and the second was to maximize\nC(p) + C(\u00acp). Our features \u201clit-total\u201d and \u201cneg-lit-total\u201d capture these cues.\nMOM. Freeman [13] proposed a heuristic that identified the size of the smallest unsatisfied clause,\nm = min|\u03c9|, \u03c9 \u2208 \u2126, and then identified the literal appearing most frequently amongst clauses of\nsize m. 
This is the motivation for our features \u201clit-smallest\u201d and \u201cneg-lit-smallest\u201d.\nBOHM. Bohm [3] proposed a heuristic that selects the literal maximizing\n\n\u03b1 max(Ck(p, xj), Ck(\u00acp, xj)) + \u03b2 min(Ck(p, xj), Ck(\u00acp, xj)),   (5)\n\nwith k = 2, or in the case of a tie, with k = 3 (and so on until all ties are broken). In practice we\nfound that ties are almost always broken by considering just k \u2264 4; hence we include \u201cbohm-max\u201d\nand \u201cbohm-min\u201d in our feature space.\nJeroslow\u2013Wang. Jeroslow and Wang [18] proposed a voting scheme in which clauses vote for\ntheir components with weight 2^{\u2212k}, where k is the length of the clause. The total vote for a literal p\nis\n\nJ(p) = \u2211_\u03c9 2^{\u2212|\u03c9|}   (6)\n\nwhere the sum is over clauses \u03c9 that contain p. The Jeroslow\u2013Wang rule chooses branches that\nmaximize J(p). Three variants were studied by Hooker [16]. Our features \u201cjw\u201d and \u201cjw-neg\u201d are\nsufficient to span the original rule as well as the variants.\nDynamic activity measures. Many modern SAT solvers use boolean constraint propagation (BCP)\nto speed up the search process [23]. One component of BCP generates new clauses as a result of\nconflicts encountered during the search. Several modern SAT solvers use the time since a variable\nwas last added to a conflict clause to measure the \u201cactivity\u201d of that variable. Empirically, resolving\nvariables that have most recently appeared in conflict clauses results in an efficient search [14]. We\ninclude several activity\u2013related cues in our feature vector, which we compute as follows. Each\ndecision is given a sequential time index t. After each decision we update the most\u2013recent\u2013activity\ntable T(p) := t for each p added to a conflict clause during that iteration. 
We include\nthe difference between the current iteration and the last iteration at which a variable was active in\nthe feature \u201ctime-since-active\u201d. We also include the boolean feature \u201chas-been-active\u201d to indicate\nwhether a variable has ever been active. The feature \u201cactivity\u201d is a related cue used by minisat [10].\n\n5 Polynomial special cases\n\nIn this section we discuss special cases of SAT for which polynomial\u2013time algorithms are known.\nFor each we show that a margin exists in our feature space.\n\n5.1 Horn\n\nA Horn clause [4] is a disjunction containing at most one positive literal, \u00acq1 \u2228 \u00b7\u00b7\u00b7 \u2228 \u00acq_{k\u22121} \u2228 qk.\nA sentence \u2126 is a Horn formula iff it is a conjunction of Horn clauses. There are polynomial time\nalgorithms for deciding satisfiability of Horn formulae [4, 9]. One simple algorithm based on unit\npropagation [4] operates as follows. If there are no unit clauses in \u2126 then \u2126 is trivially satisfiable\nby setting all variables to false. Otherwise, let {p} be a unit clause in \u2126. Delete any clause from \u2126\nthat contains p and remove \u00acp wherever it appears. Repeat until either a trivial contradiction q \u2227 \u00acq\nis produced (in which case \u2126 is unsatisfiable) or until no further simplification is possible (in which\ncase \u2126 is satisfiable) [4].\n\nTheorem 3 There is a margin for Horn clauses in our feature space.\n\nProof We will show that there is a margin for Horn clauses in our feature space by showing that for\na particular priority function f0, our algorithm will emulate the unit propagation algorithm above.\nLet f0 be zero everywhere except for the following elements:\u00b9 \u201cis-positive\u201d = \u2212\u03b5, \u201clit-unit-clauses\u201d\n= 1. Let H be the decision heuristic corresponding to f0. 
Consider a node x and let \u2126 be the\ninput sentence \u2126_0 simplified according to the (perhaps partial) valuation at x. If \u2126 contains no unit\nclauses then clearly \u27e8\u03c6(x, p), f0\u27e9 will be maximized for a negative literal p = \u00acq. If \u2126 does contain\nunit clauses then for literals p which appear in unit clauses we have \u27e8\u03c6(x, p), f0\u27e9 \u2265 1, while for all\nother literals we have \u27e8\u03c6(x, p), f0\u27e9 < 1. Therefore H will select a unit literal if \u2126 contains one.\nFor satisfiable \u2126, this exactly emulates the unit propagation algorithm, and since that algorithm never\nback\u2013tracks [4], our algorithm makes no mistakes. For unsatisfiable \u2126 our algorithm will behave as\nfollows. First note that every sentence encountered contains at least one unit clause, since, if not,\nthat sentence would be trivially satisfiable by setting all variables to false and this would contradict\nthe assumption that \u2126 is unsatisfiable. So at each node x, the algorithm will first branch on some\nunit literal p, then later will back\u2013track to x and branch on \u00acp. But since p appears in a unit clause\nat x this will immediately generate a contradiction and no further nodes will be expanded along that\npath. Therefore the algorithm expands no more than 2N nodes, where N is the length of \u2126. \u25a1\n\n\u00b9 For concreteness let \u03b5 = 1/(K+1), where K is the length of the input sentence \u2126.\n\n(a) Performance for planar graph colouring\n\n(b) Performance for hardware verification\n\nFigure 4: Results for our algorithm applied to (a) planar graph colouring; (b) hardware verification.\nBoth figures show the mistake rate as a function of the training iteration. 
In figure (a) we report\nthe mistake rate on the current training example since no training example is ever repeated, while\nin figure (b) it is computed on a separate validation set (see figure 5). The red line shows the\nperformance of minisat on the validation set (which does not change over time).\n\n5.2 2\u2013CNF\n\nA 2\u2013CNF sentence is a CNF sentence in which every clause contains exactly two literals. In this section\nwe show that a function exists in our feature space for recognising satisfiable 2\u2013CNF sentences\nin polynomial time.\nA simple polynomial\u2013time solution to 2\u2013CNF proposed by Even et al. [11] operates as follows. If\nthere are no unit clauses in \u2126 then pick any literal and add it to \u2126. Otherwise, let {p} be a unit\nclause in \u2126 and apply unit propagation to p as described in the previous section. If a contradiction\nis generated then back\u2013track to the last branch and negate the literal added there. If there is no such\nbranch, then \u2126 is unsatisfiable. Even et al. showed that this algorithm never back\u2013tracks over more\nthan one branch, and therefore completes in polynomial time.\nTheorem 4 Under our feature space, H contains a priority function that recognizes 2\u2013SAT sentences\nin polynomial time.\n\nProof By construction. Let f0 be a weight vector with all elements set to zero except for the element\ncorresponding to the \u201clit-unit-clauses\u201d feature of Figure 3, which is set to 1. When using this weight\nvector, our algorithm will branch on a unit literal whenever one is present. This exactly emulates the\nbehaviour of the algorithm due to Even et al. described above, and hence completes in polynomial\ntime for all 2\u2013SAT sentences. \u25a1\n\n6 Empirical Results\n\nPlanar Graph Colouring: We applied our algorithm to the problem of planar graph colouring, for\nwhich polynomial time algorithms are known [1]. 
Working in this domain allowed us to generate an\nunlimited number of problems with a consistent but non\u2013trivial structure on which to validate our\nalgorithm. By allowing up to four colours we also ensured that all instances were satisfiable [1].\nWe generated instances as follows. Starting with an empty L \u00d7 L grid we sampled K cells at\nrandom and labelled them 1 . . . K. We then repeatedly picked a labelled cell with at least one\nunlabelled neighbour and copied its label to its neighbour until all cells were labelled. Next we\nformed a K \u00d7 K adjacency matrix A with Aij = 1 iff there is a pair of adjacent cells with labels\ni and j. Finally we generated a SAT sentence over 4K variables (each variable corresponds to a\nparticular colouring of a particular vertex), with clauses expressing the constraints that each vertex\nmust be assigned one and only one colour and that no pair of adjacent vertices may be assigned the\nsame colour.\nIn our experiments we used K = 8, L = 5 and a learning rate of 0.1. We ran 40 training iterations of\nour algorithm. No training instance was repeated. The number of mistakes (branching decisions that\nwere later reversed by back\u2013tracking) made at each iteration is shown in figure 4(a). Our algorithm\nconverged after 18 iterations and never made a mistake after that point.\n\nTraining problem | Clauses | Validation problem | Clauses\nferry11 | 26106 | ferry10 | 20792\nferry11u | 25500 | ferry10u | 20260\nferry9 | 16210 | ferry8 | 12312\nferry9u | 15748 | ferry8u | 11916\nferry12u | 31516 | ferry12 | 32200\n\nFigure 5: Instances in training and validation sets.\n\nHardware Verification: We applied our algorithm to a selection of problems from a well\u2013known\nSAT competition [21]. We selected training and validation examples from the same suite of problems;\nthis is in line with our goal of learning the statistical structure of particular subsets of SAT\nproblems. 
The problems selected for training and validation are from the 2003 SAT competition and\nare listed in figure 5.\nDue to the large size of these problems we extended an existing high\u2013performance SAT solver,\nminisat [10], replacing its decision heuristic with our perceptron strategy. We executed our algorithm\non each training problem sequentially for a total of 8 passes through the training set (40 iterations\nin total). We performed a perceptron update (Equation (3)) after solving each problem. After each update we\nevaluated the current priority function on the entire validation set. The average mistake rate on the\nvalidation set is shown for each training iteration in figure 4(b).\n\n7 Discussion\n\nSection 6 empirically shows that several important theoretical results of our learning algorithm hold\nin practice. The experiments reported in Figure 4(a) show in practice that for a polynomial time solvable\nsubset of SAT, the algorithm indeed has a bounded number of mistakes during training. Planar\ngraph colouring is a known polynomial time computable problem, but it is difficult to characterize\ntheoretically and an automated theorem prover was employed in the proof of polynomial solvability.\nThe hardware verification problem explored in Figure 4(b) shows that the algorithm learns a setting\nof f that gives performance an order of magnitude faster than the state-of-the-art Minisat solver. It\ndoes so after relatively few training iterations and then maintains good performance.\nSeveral approaches present themselves as good opportunities for extending the learning of SAT. In this\nwork, we argued that learning on positive examples is sufficient if the subset of SAT sentences\ngenerated by our application has a positive margin. However, it is of interest to consider learning\nin the absence of a positive margin, and learning may be accelerated by making updates based on\nunsatisfiable sentences. 
One potential approach would be to consider a stochastic finite difference\napproximation to the risk gradient by running the DPLL algorithm a second time with a perturbed f.\nAdditionally, we may consider updates to f during a run of the DPLL algorithm when the algorithm\nbacktracks from a branch of the search tree for which we can prove that all y_ij = \u22121. This, however,\nrequires care in ensuring that the implicit empirical risk minimization is not biased.\nIn this work, we have shown that a perceptron-style algorithm is capable of learning all polynomial time\nsolvable SAT subsets in bounded time. This has important implications for learning real-world\nSAT applications such as theorem proving, model checking, planning, hardware verification, and\nbioinformatics. We have shown empirically that our theoretical results hold, and that state-of-the-art\ncomputation time can be achieved with our learning rule on a real-world hardware verification\nproblem. As SAT is a canonical NP-complete problem, we expect that the efficient solution of\nimportant subsets of SAT may have much broader implications for the solution of many real-world\nproblems.\nAcknowledgements: This work is partially funded by the European Research Council under the\nEuropean Community\u2019s Seventh Framework Programme (FP7/2007-2013)/ERC Grant agreement\nnumber 259112, and by the Royal Academy of Engineering under the Newton Fellowship Alumni\nScheme.\n\nReferences\n[1] K. Appel, W. Haken, and J. Koch. Every planar map is four colorable. Illinois J. Math, 21(3):491\u2013567, 1977.\n[2] A. Biere, A. Cimatti, E. M. Clarke, and Y. Zhu. Symbolic model checking without BDDs. In International Conference on Tools and Algorithms for Construction and Analysis of Systems, pages 193\u2013207, 1999.\n[3] M. Buro and H. K. Buning. Report on a sat competition. 1992.\n[4] C.-L. Chang and R. C.-T. Lee. Symbolic Logic and Mechanical Theorem Proving. 
Academic Press, Inc., Orlando, FL, USA, 1st edition, 1997.\n[5] S. A. Cook. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing, STOC \u201971, pages 151\u2013158, New York, NY, USA, 1971. ACM.\n[6] H. Daum\u00e9, III and D. Marcu. Learning as search optimization: approximate large margin methods for structured prediction. In International Conference on Machine learning, pages 169\u2013176, 2005.\n[7] M. Davis, G. Logemann, and D. Loveland. A machine program for theorem-proving. Commun. ACM, 5:394\u2013397, July 1962.\n[8] M. Davis and H. Putnam. A computing procedure for quantification theory. J. ACM, 7:201\u2013215, 1960.\n[9] W. F. Dowling and J. H. Gallier. Linear-time algorithms for testing the satisfiability of propositional horn formulae. The Journal of Logic Programming, 1(3):267\u2013284, 1984.\n[10] N. E\u00e9n and N. S\u00f6rensson. An extensible sat-solver. In Theory and Applications of Satisfiability Testing, pages 333\u2013336. 2004.\n[11] S. Even, A. Itai, and A. Shamir. On the complexity of time table and multi-commodity flow problems. In Symposium on Foundations of Computer Science, pages 184\u2013193, 1975.\n[12] M. Fink. Online learning of search heuristics. Journal of Machine Learning Research - Proceedings Track, 2:114\u2013122, 2007.\n[13] J. W. Freeman. Improvements to propositional satisfiability search algorithms. PhD thesis, University of Pennsylvania, 1995.\n[14] E. Goldberg and Y. Novikov. Berkmin: A fast and robust sat-solver. In Design, Automation and Test in Europe Conference and Exhibition, 2002. Proceedings, pages 142\u2013149, 2002.\n[15] J. Harrison. Handbook of Practical Logic and Automated Reasoning. Cambridge University Press, 2009.\n[16] J. N. Hooker and V. Vinay. Branching rules for satisfiability. Journal of Automated Reasoning, 15:359\u2013383, 1995.\n[17] F. Hutter, D. 
Babic, H. H. Hoos, and A. J. Hu. Boosting verification by automatic tuning of decision procedures. In Proceedings of the Formal Methods in Computer Aided Design, pages 27\u201334, 2007.\n[18] R. G. Jeroslow and J. Wang. Solving propositional satisfiability problems. Annals of Mathematics and Artificial Intelligence, 1:167\u2013187, 1990.\n[19] H. A. Kautz. Deconstructing planning as satisfiability. In Proceedings of the Twenty-first National Conference on Artificial Intelligence (AAAI-06), 2006.\n[20] M. Kearns, M. Li, L. Pitt, and L. Valiant. On the learnability of boolean formulae. In Proceedings of the nineteenth annual ACM symposium on Theory of computing, pages 285\u2013295, 1987.\n[21] D. Le Berre and O. Roussel. Sat competition 2009. http://www.satcompetition.org/2009/.\n[22] I. Lynce and J. Marques-Silva. Efficient haplotype inference with boolean satisfiability. In Proceedings of the 21st national conference on Artificial intelligence - Volume 1, pages 104\u2013109. AAAI Press, 2006.\n[23] M. Moskewicz, C. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: engineering an efficient sat solver. In Design Automation Conference, 2001. Proceedings, pages 530\u2013535, 2001.\n[24] F. Rosenblatt. The Perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386\u2013408, 1958.\n[25] W. Ruml. Adaptive Tree Search. PhD thesis, Harvard University, 2002.\n[26] J. P. M. Silva. The impact of branching heuristics in propositional satisfiability algorithms. In Proceedings of the 9th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence, EPIA \u201999, pages 62\u201374, London, UK, 1999. Springer-Verlag.\n[27] L. Xu, F. Hutter, H. H. Hoos, and K. Leyton-Brown. Satzilla: portfolio-based algorithm selection for sat. J. Artif. Int. Res., 32:565\u2013606, June 2008.\n[28] R. Zabih. 
A rearrangement search strategy for determining propositional satisfiability. In Proceedings of the National Conference on Artificial Intelligence, pages 155\u2013160, 1988.\n", "award": [], "sourceid": 1276, "authors": [{"given_name": "Alex", "family_name": "Flint", "institution": null}, {"given_name": "Matthew", "family_name": "Blaschko", "institution": null}]}