{"title": "Learning Local Search Heuristics for Boolean Satisfiability", "book": "Advances in Neural Information Processing Systems", "page_first": 7992, "page_last": 8003, "abstract": "We present an approach to learn SAT solver heuristics from scratch through deep reinforcement learning with a curriculum. In particular, we incorporate a graph neural network in a stochastic local search algorithm to act as the variable selection heuristic. We consider Boolean satisfiability problems from different classes and learn specialized heuristics for each class. Although we do not aim to compete with the state-of-the-art SAT solvers in run time, we demonstrate that the learned heuristics allow us to find satisfying assignments in fewer steps compared to a generic heuristic, and we provide analysis of our results through experiments.", "full_text": "Learning Local Search Heuristics for\n\nBoolean Satis\ufb01ability\n\nEmre Yolcu\n\nCarnegie Mellon University\n\neyolcu@cs.cmu.edu\n\nBarnab\u00e1s P\u00f3czos\n\nCarnegie Mellon University\nbapoczos@cs.cmu.edu\n\nAbstract\n\nWe present an approach to learn SAT solver heuristics from scratch through deep\nreinforcement learning with a curriculum. In particular, we incorporate a graph\nneural network in a stochastic local search algorithm to act as the variable selection\nheuristic. We consider Boolean satis\ufb01ability problems from different classes and\nlearn specialized heuristics for each class. Although we do not aim to compete\nwith the state-of-the-art SAT solvers in run time, we demonstrate that the learned\nheuristics allow us to \ufb01nd satisfying assignments in fewer steps compared to a\ngeneric heuristic, and we provide analysis of our results through experiments.\n\n1\n\nIntroduction\n\nRecently there has been a surge of interest in applying machine learning to combinatorial optimiza-\ntion [7, 24, 32, 27, 9]. 
The problems of interest are often NP-complete and traditional methods\nef\ufb01cient in practice usually rely on heuristics or produce approximate solutions. These heuristics\nare commonly manually-designed, requiring signi\ufb01cant insight into the problem. Similar to the way\nthat the recent developments in deep learning have transformed research in computer vision [28] and\narti\ufb01cial intelligence [43] by moving from engineered methods to ones that are learned from data and\nexperience, the expectation is that it will lead to advancements in search and optimization algorithms\nas well. Interest in this line of work has been fueled by the developments in neural networks that\noperate on graphs [5] since many combinatorial problems can be naturally represented using graphs.\nA problem that is becoming a popular target for machine learning is satis\ufb01ability [11]. Boolean\nsatis\ufb01ability (abbreviated SAT) is the decision problem of determining whether there exists a satisfying\nassignment for a given Boolean formula. The task of \ufb01nding such assignments if they exist or proving\nunsatis\ufb01ability otherwise is referred to as SAT solving. SAT is the canonical NP-complete problem\n[13]. It is heavily researched, and there exist ef\ufb01cient heuristic algorithms that can solve problems\ninvolving millions of variables and clauses. In addition to its fundamental place in complexity theory,\nSAT is practically relevant, and there is a plethora of problems arising from arti\ufb01cial intelligence,\ncircuit design, planning, and automated theorem proving that can be reduced to SAT [37].\nAssuming we are given instances from a known class of SAT problems, it is an interesting question\nwhether we can discover a heuristic from scratch specialized to that class which improves upon a\ngeneric one. 
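As a concrete reference point for the decision problem just described, a brute-force satisfiability check can be sketched as follows (a toy illustration only; clauses are written as DIMACS-style signed integers, a representation we adopt here purely for the sketch):

```python
from itertools import product

def satisfiable(clauses, n):
    """Brute-force SAT check over variables x_1, ..., x_n.

    Each clause is a list of DIMACS-style signed integers:
    literal i stands for x_i and -i for its negation.
    """
    for bits in product([False, True], repeat=n):
        # bits[i - 1] is the truth value assigned to x_i
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

# A small satisfiable formula: (x1 ∨ ¬x2) ∧ (x2) ∧ (¬x1 ∨ x2)
print(satisfiable([[1, -2], [2], [-1, 2]], 2))  # True (e.g. x1 = x2 = True)
```

Exhaustive enumeration is exponential in n; the heuristics studied in this paper exist precisely to avoid it.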
While handcrafting such heuristics for every single class is not feasible, an alternative is\nto sample problems from a class and learn a heuristic by training to solve the problems. In practice,\nwe are often interested in solving similar problems coming from the same distribution, which makes\nthis setting worth studying. In this paper we focus on stochastic local search (SLS) and propose a\nlearnable algorithm with a variable selection heuristic computed by a graph neural network. Through\nreinforcement learning, we train from scratch a suite of solvers with heuristics specialized to different\nclasses, and demonstrate that this approach can lead to algorithms that require fewer, although costlier,\nsteps to arrive at a solution compared to a generic heuristic.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\f2 Related work\n\nLately, neural networks have seen an increased interest in applications to SAT solving. Selsam et al.\n[42] propose an approach where a graph neural network (called NeuroSAT) is trained to classify SAT\nproblems as satis\ufb01able or unsatis\ufb01able. They provide evidence that as a result of training to predict\nsatis\ufb01ability on their speci\ufb01cally crafted dataset, a local search algorithm is synthesized, through\nwhich they can often decode the satisfying assignments. In their work, the graph neural network\nitself acts as a learnable solver while we use it as a relatively cheap heuristic in an explicitly speci\ufb01ed\nsearch algorithm. It is unclear whether their approach can be applied to learning distribution-speci\ufb01c\nalgorithms as it is hypothesized to be crucial that NeuroSAT be trained on \u201cpatternless\u201d problems,\nhence the reason for our approach. In a more recent work, Selsam and Bj\u00f8rner [41] also use a\nsimpli\ufb01ed NeuroSAT to guide the search process of an existing solver. Another similar approach\nis that of Lederman et al. 
[31], where they attempt to learn improved heuristics to solve quanti\ufb01ed\nBoolean formulas through reinforcement learning while using a heuristic computed by a graph neural\nnetwork. A critical difference of our work from theirs is that we have a minimal set of features\n(one-hot encodings of the assignments) just enough to obtain a lossless encoding of the solver state\nwhile they make use of a large set of handcrafted features. On another front, Amizadeh et al. [1] use a\ngraph neural network and a training procedure mimicking reinforcement learning to directly produce\nsolutions to the Circuit-SAT problem.\nSAT solving community has also experimented with more practical applications of machine learning\nto SAT, possibly the most successful example being SATzilla [48]. Closest to our approach are the\nworks of Fukunaga [18, 19], KhudaBukhsh et al. [25], Illetskova et al. [22]. Brie\ufb02y summarized, they\nevolve heuristics through genetic algorithms by combining existing primitives, with the latter two\naiming to specialize the created heuristics to particular problem classes. We adopt a similar approach,\nwith the crucial difference that our approach involves learning heuristics from experience as opposed\nto combining designed ones. There have also been other approaches utilizing reinforcement learning\nto discover variable selection heuristics [33, 34, 35, 29, 17], although they focus mostly on crafting\nreward functions. We turn our attention to learning with little prior knowledge, and use a terminal\nreward which is nonzero only when a satisfying assignment is found.\n\n3 Background\n\n3.1 Boolean formulas\n\nWe limit our attention to Boolean formulas in conjunctive normal form (CNF), which consist of the\nfollowing components: a list of n Boolean variables (x1, . . . , xn), and a list of m clauses (c1, . . . 
, cm) where each clause cj is a disjunction (∨) of literals (a literal refers to the different polarities of a Boolean variable: xi and ¬xi). The CNF formula is the conjunction (∧) of all clauses. Throughout the paper, φ : {0, 1}^n → {0, 1} refers to a Boolean formula, X ∈ {0, 1}^n to a particular truth assignment to the list of variables (x1, . . . , xn), and φ(X) is simply the truth value of the Boolean formula evaluated at assignment X. Also, n and m always refer to the number of variables and clauses, respectively.

3.2 Local search

SAT solvers based on stochastic local search start with a random initial candidate solution and iteratively refine it by moving to neighboring solutions until arriving at a satisfying assignment. Neighboring solutions differ by the assignment of a single variable, that is, the assigned value of a variable is flipped between solutions. Algorithm 1 shows the pseudocode for a generic stochastic local search algorithm for SAT.

Algorithm 1 Local search for SAT
Input: Boolean formula φ, maximum number of trials K, maximum number of flips L
1: for i ← 1 to K do
2:   X ← Initialize(φ)
3:   for j ← 1 to L do
4:     if φ(X) = 1 then
5:       return X
6:     else
7:       index ← SelectVariable(φ, X)
8:       X ← Flip(X, index)
9:     end if
10:   end for
11: end for
12: return unsolved

As an example heuristic, the SelectVariable function of WalkSAT [39] first randomly selects a clause unsatisfied by the current assignment and flips the variable within that clause which, when flipped, would result in the fewest number of previously satisfied clauses becoming unsatisfied, or with some probability (referred to as the walk probability) it selects a variable randomly within the clause.
Our algorithm fits into the same template, with the difference being that the function SelectVariable employs a 
graph neural network to select a variable. Unlike WalkSAT we do not limit the selection to the variables in unsatisfied clauses, so it is possible to pick a variable that occurs only in currently satisfied clauses. However, similar to WalkSAT, with some probability we also randomly select a variable from a randomly selected unsatisfied clause.

3.3 Graph neural networks

Graph neural networks (GNNs) are a family of neural network architectures that operate on graphs with their computation structured accordingly [21, 38, 20, 5]. In order to explain our specific architecture in Section 4 more easily, here we describe a formalism for GNNs similar to the message-passing framework of Gilmer et al. [20].
Assume we have an undirected graph G = (V, E) where V is the set of vertices (nodes) and E ⊆ V × V is the set of edges. Further suppose that for each node v ∈ V we have a row vector h_v^0 ∈ R^{d_V^0} of node features (possibly derived from node labels) with some dimension d_V^0. Similarly, for an edge vw ∈ E between two nodes v, w ∈ V we have a row vector h_vw ∈ R^{d_E} of edge features with some dimension d_E. A GNN maps each node to a vector space embedding by iteratively updating the representation of the node based on its neighbors. In this formalism we do not extract edge features. At each iteration t ∈ {1, . . . , T}, for each v ∈ V we update its previous representation h_v^{t-1} to h_v^t ∈ R^{d_V^t} as

    h_v^t = U^t( h_v^{t-1}, \sum_{w ∈ N(v)} M^t( h_v^{t-1}, h_w^{t-1}, h_vw ) ),

where N(v) denotes the set of neighbors of node v. Message functions M^t and update functions U^t are differentiable functions whose parameters are learned. 
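As an illustration, a single update of this kind can be sketched in plain numpy (a toy sketch: edge features are dropped, aggregation is a sum, and the message and update functions, which are learned in practice, are replaced by simple stand-ins):

```python
import numpy as np

def gnn_iteration(h, neighbors, M, U):
    """One message-passing iteration.

    h:         |V| x d array of node features h^{t-1}
    neighbors: dict mapping each node index to its neighbor indices
    M:         message function (h_v, h_w) -> message vector
    U:         update function (h_v, summed messages) -> h_v^t
    """
    out = np.empty_like(h)
    for v in range(h.shape[0]):
        msgs = sum(M(h[v], h[w]) for w in neighbors[v])
        out[v] = U(h[v], msgs)
    return out

# Toy stand-ins for the learned functions on a 3-node path graph.
M = lambda hv, hw: hw            # message: the neighbor's features
U = lambda hv, m: 0.5 * hv + m   # update: blend old features and messages
h0 = np.ones((3, 2))
h1 = gnn_iteration(h0, {0: [1], 1: [0, 2], 2: [1]}, M, U)  # h1[1] = [2.5, 2.5]
```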
After T iterations, we obtain the extracted features h_v^T for each node v in the graph.
For a compact notation, define the stacked node features H_V^t ∈ R^{|V| × d_V^t} such that H_V^t(i, ·) = h_{V(i)}^t, where V(i) is the i-th node in the graph under some indexing. Similarly define H_E ∈ R^{|E| × d_E} to be the stacked edge features. Then a GNN with T iterations computes H_V^T = f_θ(H_V^0, H_E, G) where f_θ is a function parameterized by θ that encodes each node of the graph G as a real-valued vector.

4 Model

In this section we first describe the graphical representation for CNF formulas, then explain the exact input of the model, and finally define the architecture of the graph neural network that we use.

4.1 Factor graph of CNF formulas

We opt for a factor graph representation of CNF formulas. With this representation we obtain an undirected bipartite graph with two node types (variable and clause) and two edge types (positive and negative polarities). For each variable in the formula we have a node and for each clause we have a node of a different type. Between a clause and each variable that occurs in it there is an edge whose type depends on the polarity of the variable in the clause.
Note that unlike the graphical representation employed by Selsam et al. [42] and Lederman et al. [31], each variable (as opposed to a literal) has a corresponding node, which results in a slightly more compact mapping of a CNF formula to a graph. Figure 1 displays the factor graph of a simple CNF formula.

Figure 1: Factor graph of the formula (x1 ∨ x3) ∧ (x2 ∨ ¬x3) ∧ (¬x1 ∨ x3) ∧ (¬x1 ∨ ¬x2) ∧ (x1 ∨ ¬x3). 
Solid and dashed edges correspond respectively to positive and negative polarities of variables in clauses.

4.2 Input representation

For variable selection, we use a GNN that takes as input a formula φ with an assignment X to its variables and outputs a vector of probabilities corresponding to variables (described further in the next section). The actual input to the model consists of the adjacency information of the factor graph and node features. Edge features are apparent from the adjacency matrices and do not act as explicit inputs to the model.
Since the factor graph of a CNF formula is bipartite and there are edges of two different types, we store a pair of n × m biadjacency matrices A = (A+, A−) such that A+(i, j) = 1{xi ∈ cj} and A−(i, j) = 1{¬xi ∈ cj}.
As node features we use one-hot vectors. When variable assignments are taken into account, there are three node types: variable assigned True, variable assigned False, and clause. More concretely, node features are stored as a pair H^0 = (H_v^0, H_c^0) of stacked variable features H_v^0 ∈ R^{n×3} and clause features H_c^0 ∈ R^{m×3}. As a result, the pair (A, H^0) is all that is needed to perform local search on a formula using the GNN variable selection heuristic.
For a single run of the local search algorithm, H_v^0 is set at first to reflect a random initial assignment to variables and at each search iteration the row corresponding to the variable selected by the heuristic is modified to flip its assignment.

4.3 Policy network architecture

In Section 3.3 we explained the abstract GNN architecture. The specific GNN that we implement, which we call the policy network along with the output layer, can be described as an instance of this architecture. 
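The input encoding of Section 4.2 can be sketched as follows (a minimal sketch: clauses are given as DIMACS-style signed variable indices, and which one-hot slot encodes which node type is our arbitrary choice):

```python
import numpy as np

def encode(clauses, n, assignment):
    """Build the biadjacency pair (A+, A-) and one-hot node features H0
    for a CNF formula with n variables and an assignment in {0, 1}^n."""
    m = len(clauses)
    A_pos = np.zeros((n, m))
    A_neg = np.zeros((n, m))
    for j, clause in enumerate(clauses):
        for lit in clause:
            (A_pos if lit > 0 else A_neg)[abs(lit) - 1, j] = 1.0
    # Node types: variable assigned True, variable assigned False, clause.
    H_v = np.zeros((n, 3))
    H_c = np.zeros((m, 3))
    H_v[np.arange(n), [0 if a else 1 for a in assignment]] = 1.0
    H_c[:, 2] = 1.0
    return (A_pos, A_neg), (H_v, H_c)
```

Flipping a variable during search then amounts to swapping the first two entries of its row in H_v.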
In particular, it has four different message functions, one for each combination of nodes (variable, clause) and edges (positive, negative), and two different update functions for variables and clauses. The policy network runs for T iterations, and as its components we have the following learnable functions, with dependence on parameters omitted for brevity, where t ∈ {1, . . . , T}:

• M_{v+}^t : R^{d_{t-1}} → R^{d_t} and M_{v−}^t : R^{d_{t-1}} → R^{d_t} compute the incoming messages to each variable from clauses they positively and negatively occur in.
• M_{c+}^t : R^{d_{t-1}} → R^{d_t} and M_{c−}^t : R^{d_{t-1}} → R^{d_t} compute the incoming messages to each clause from variables that occur positively and negatively in them.
• U_v^t : R^{d_{t-1}} × R^{d_t} → R^{d_t} and U_c^t : R^{d_{t-1}} × R^{d_t} → R^{d_t} update the representations of variables and clauses based on their previous representation and the sum of the incoming messages.
• Z : R^{d_T} → R produces a score given the extracted node features of a variable.

In the actual implementation we have d_t = d for t > 0, that is, the same at each iteration, and d_0 = 3 for input features. With a slight abuse of notation we will assume that we can apply the functions above to matrices, in which context they mean row-wise application. Having described the individual components, let f^t be the function, with dependence on parameters omitted again, that computes the node representations of a graph with adjacency A at iteration t. 
We can write f^t compactly as

    f^t((H_v^{t-1}, H_c^{t-1})) = (H_v^t, H_c^t), where
    H_v^t = U_v^t( H_v^{t-1}, [A+  A−] [ M_{v+}^t(H_c^{t-1}) ; M_{v−}^t(H_c^{t-1}) ] ),
    H_c^t = U_c^t( H_c^{t-1}, [A+ ; A−]^⊤ [ M_{c+}^t(H_v^{t-1}) ; M_{c−}^t(H_v^{t-1}) ] ),

where [· ·] and [· ; ·] denote horizontal and vertical stacking, respectively. For the same graph, the policy network computes a probability vector p̂ ∈ R^n over variables as

    H^T = (f^T ∘ f^{T−1} ∘ · · · ∘ f^1)(H^0),
    p̂ = softmax(Z(H_v^T)),

where softmax(y) = exp(y)/Σ_i exp(y_i). In order to refer to the above computation concisely we define the function π_θ computed by the policy network as p̂ = π_θ(φ, X) where we assume H^0 will be obtained from the initial assignment X, and A will be obtained from the formula φ. As before, θ is the list of all neural network parameters. In our implementation, all of M_{v+}^t, M_{v−}^t, M_{c+}^t, M_{c−}^t, U_v^t, U_c^t, Z are multilayer perceptrons (MLP). Also, we use T = 2 which is the minimum number of iterations to allow messages to travel between variables that occur together in a clause. Remaining hyperparameters are described in Appendix A.

(a) Random 3-SAT  (b) Clique detection  (c) Vertex cover  (d) Graph coloring  (e) Dominating set

Figure 2: Examples of factor graphs exhibiting distinct structures, each corresponding to problems sampled from different distributions. Structures of the graphs corresponding to formulas have been studied in the SAT solving community as a way to explain the effectiveness of CDCL [6] on industrial problems [2]. 
With a similar intuition, our aim is to learn heuristics that can exploit the structure.

5 Data

As our goal is to learn specialized heuristics for different classes of satisfiability problems, we generate problems from various distributions to train on. There are five problem classes we perform experiments with: random 3-SAT, clique detection, vertex cover, graph coloring, dominating set. Table 1 presents the notation for the problem classes that we consider.

Table 1: As before, n and m refer to the number of variables and clauses. For graph problems N refers to the number of vertices in the graph and p to the probability that an edge exists. k refers to the problem specific size parameter. From each distribution D on the table we can sample a CNF formula φ ∼ D.

Class           Distribution
Random 3-SAT    rand3(n, m)
k-clique        cliquek(N, p)
k-cover         coverk(N, p)
k-coloring      colork(N, p)
k-domset        domsetk(N, p)

Random 3-SAT¹ is of theoretical interest in computational complexity and also serves as a common benchmark for SLS-based SAT algorithms. The latter four are NP-complete graph problems. For each of these problems we sample Erdős–Rényi graphs [16] from the distribution denoted as G(N, p) to mean that it has N > 0 vertices and between any two vertices an edge exists independently with probability p ∈ [0, 1]. Then we encode them as SAT problems. As a result we obtain five parameterized random distributions that we can sample formulas from. For training and evaluation we generate problems of various sizes (made explicit later) from these five families of distributions. 
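For instance, a colork(N, p) instance can be generated along the following lines (an illustrative sketch of one standard encoding; the actual instances in the paper are produced with CNFgen, whose encodings may differ in details):

```python
import random

def color_cnf(N, p, k, seed=0):
    """Sample G(N, p) and encode k-colorability as a CNF formula.

    Clauses use DIMACS-style signed integers, with variable
    v * k + c + 1 meaning "vertex v receives color c". The formula
    is satisfiable iff the sampled graph is k-colorable.
    """
    rng = random.Random(seed)
    edges = [(u, v) for u in range(N) for v in range(u + 1, N)
             if rng.random() < p]
    var = lambda v, c: v * k + c + 1
    # Every vertex receives at least one color.
    clauses = [[var(v, c) for c in range(k)] for v in range(N)]
    # Adjacent vertices never share a color.
    for u, v in edges:
        clauses += [[-var(u, c), -var(v, c)] for c in range(k)]
    return clauses
```

Instances for the other graph problems follow the same pattern with their respective standard encodings.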
It is worth noting that the generated problem instances are not particularly difficult for state-of-the-art SAT solvers, and for our purpose they serve as simple benchmarks to help demonstrate the feasibility of a purely learning-based approach.

In creating problem instances we use CNFgen [30] to generate formulas and Minisat [15] to filter out the unsatisfiable ones. Since our local search algorithm is an incomplete solver it can only find an assignment if one exists and otherwise returns unsolved.

6 Training

6.1 Markov decision process formulation

To learn heuristics through reinforcement learning [44], we formalize local search for SAT as a Markov decision process (MDP). For each problem distribution D shown on Table 1 we have a separate MDP represented as a tuple (S_D, A, T, R, γ) with the following components:

• S_D is the set of possible states. Each state is characterized by a pair s = (φ, X), the CNF formula and a truth assignment to its variables. At the start of an episode we sample a formula φ ∼ D with n variables and m clauses, and a uniformly random initial assignment X ∈ {0, 1}^n where each element is either 0 or 1 with probability 1/2. The episode terminates either when we arrive at a satisfying assignment or after L (a predetermined constant) steps are taken.
• A is formally a function that maps states to available actions. For a state s = (φ, X) we have A(s) = {1, . . . , n} where n is the number of variables in φ.
• T : S_D × {1, . . . , n} → S_D is the transition function, mapping from a state-action pair to the next state. It is defined as T(s, a) = T((φ, X), a) = (φ, Flip(X, a)) where Flip simply negates the assignment of the variable indexed by a ∈ {1, . . . , n}.
• R : S_D → {0, 1} is the reward function, defined as R(s) = R((φ, X)) = φ(X), that is, 1 for a satisfying assignment and 0 otherwise.
• γ ∈ (0, 1] is the discount factor, which we set to a constant strictly less than 1 in order to encourage finding solutions in fewer steps.

¹For random 3-SAT, experimental research [40, 14] indicates that formulas with a ratio of the number of clauses to the number of variables approximately 4.26 and a large enough number of variables are near the satisfiability threshold, that is, the probability of a sampled formula being satisfiable is near 1/2. We focus on random 3-SAT problems near the threshold.

With the MDP defined as above, the problem of learning a good heuristic is equivalent to finding an optimal policy π which maximizes the expected accumulated reward obtained when starting at a random initial state and sampling a trajectory by taking actions according to π. In trying to find an optimal policy, we use the REINFORCE algorithm [47]. As our policy we have a function ρ_θ(φ, X) which returns an action (variable index) a ∼ π_θ(φ, X) where π_θ is the policy network we described in Section 4.3. With probability 1/2, ρ_θ returns a randomly selected variable from a randomly selected unsatisfied clause.
When training to learn a heuristic for a problem distribution D, at each training iteration we sample a formula φ ∼ D and generate multiple trajectories for the same formula with several different initial assignments X. Then we accumulate the policy gradient estimates from all trajectories and perform a single update to the parameters. Algorithm 2 in Appendix B shows the pseudocode for training.

6.2 Curriculum learning

In our MDP formulation, a positive reward is achieved only when an episode ends at a state corresponding to a satisfying assignment. This means that to have non-zero policy gradient estimates, a solution to the SAT problem has to be found. 
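Concretely, under the terminal reward of the MDP above, the return of an episode from its initial state reduces to a single discounted term (a small sketch; the exact step-counting convention here is our assumption):

```python
def episode_return(num_flips, solved, gamma=0.99):
    """Discounted return from the initial state: the only nonzero reward
    is 1, collected if and when a satisfying assignment is reached."""
    return gamma ** (num_flips - 1) if solved else 0.0

# REINFORCE scales the log-probabilities of the taken actions by this
# return, so trajectories that solve the formula in fewer flips are
# reinforced more strongly, and failed trajectories contribute nothing,
# which is why some episodes must succeed for learning to proceed.
```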
For dif\ufb01cult problems, we are unlikely to arrive at a\nsolution by taking uniformly random actions half the time, which is what happens at the beginning of\ntraining with a policy network that has randomly initialized parameters. This may not prevent learning,\nbut makes it prohibitively slow. In order to solve this problem, we opt for curriculum learning [8].\nWith curriculum learning, training is performed on a sequence of problems of increasing dif\ufb01culty.\nThe intuition is that the policy learned for easier problems should at least partially generalize to more\ndif\ufb01cult problems, and that this should lead to faster improvement compared to training directly on\nthe dif\ufb01cult problem of interest. Speci\ufb01cally, we follow an approach where we run the training loop\nin sequence for distributions of increasing size within the same problem class.\nAs an example, assuming our goal is to learn heuristics for rand3(25, 106), we begin by training to\nsolve rand3(5, 21) for which a positive reward is obtained often enough to yield a quick improvement\nin the policy. This improvement translates to larger problems, and we continue training the same\npolicy network to solve rand3(10, 43) for which it becomes easier to quickly \ufb01nd a solution compared\nto starting from scratch. More speci\ufb01cally, at the ith curriculum step we train on a small distribution\nDi for a \ufb01xed number of steps while evaluating on a slightly larger distribution Di+1. For the next\nstep, we train on Di+1 beginning with the model parameters from before that took the lowest median\nnumber of steps during evaluation on Di+1. In this manner we keep stepping up to larger problems\nand \ufb01nally we train on the distribution of the desired size.\n\n7 Experiments\n\nEvaluation As a baseline we have the SLS algorithm WalkSAT as described by Selman et al.\n[39]. 
[39]. Note that although there have been improvements over WalkSAT in the last three decades, we selected it due to its simplicity and its foundational role in the development of the state-of-the-art SAT solvers that use SLS [3, 10]. That being said, competing with the state-of-the-art SAT solvers in run time is not our primary goal as the scalability of the heuristic computed by the GNN is currently bound to be lacking, which we expect will be alleviated by the ongoing rapid advancements in domain-specific processors. Hence, WalkSAT serves mostly as a "sanity-check" for the learned heuristics.

Table 2: Performance of the learned heuristics and WalkSAT. In the first column, n and m on the second line of each cell refers to the maximum number of variables and clauses in the sampled formulas. For graph problems, the size of the graph G(N, p) and the size of the factor graph corresponding to the SAT encoding are different. At each cell, there are three metrics (top to bottom): average number of steps, median number of steps, percentage solved.

                    rand3(n, m)  cliquek(N, p)  coverk(N, p)  colork(N, p)  domsetk(N, p)  WalkSAT
rand3(50, 213)      367          743            749           736           642            385
                    273          750            750           750           750            297
                    84%          0%             0%            0%            20%            80%
clique3(20, 0.05)   529          116            623           743           725            237
n = 60, m = 1725    750          57             750           750           750            182
                    48%          100%           16%           0%            0%             100%
cover5(9, 0.5)      749          750            181           750           224            319
n = 55, m = 790     750          750            115           750           162            280
                    0%           0%             100%          0%            100%           96%
color5(20, 0.5)     675          748            750           342           645            416
n = 100, m = 480    750          750            750           223           750            379
                    16%          0%             0%            88%           16%            80%
domset4(12, 0.2)    729          660            304           748           205            217
n = 60, m = 995     750          750            169           750           121            140
                    0%           16%            76%           0%            100%           100%
This has been the case for other purely neural network-based approaches to SAT solving or\ncombinatorial optimization, and these kinds of studies are currently more of scienti\ufb01c than practical\ninterest. Nevertheless, we provide a comparison to WalkSAT.\nIn order to evaluate the algorithms, we sample 50 satis\ufb01able formulas from \ufb01ve problem distributions\nto obtain evaluation sets, and perform 25 search trials (with each trial starting at a random initial\nassignment) using the learned heuristics and WalkSAT. Each algorithm runs for a maximum of\n750 steps unless speci\ufb01ed otherwise and has a walk probability of 1/2. We model our evaluation\nmethodology after that of KhudaBukhsh et al. [25] and report three metrics for each evaluation set on\nTable 2: the average number of steps, the median of the median number of steps (the inner median\nis over trials on each problem and the outer median is over the problems in the evaluation sets), the\npercentage of instances considered solved (median number of steps less than the allowed number of\nsteps). For all the results reported in the next section we follow the same method. Our implementation\nis available at https://github.com/emreyolcu/sat.\n\n7.1 Results\n\nComparison to WalkSAT Table 2 summarizes the performance of the learned heuristics and\nWalkSAT. Each column on the table corresponds to a heuristic trained to solve problems from a\ncertain class. For each heuristic we follow a curriculum as explained in Section 6.2, that is, we train\non incrementally larger problems of the same class and \ufb01nally train on the distribution that we want\nto evaluate the performance on. Each row of the table corresponds to an evaluation set.\nAfter training, the learned heuristics require fewer steps than WalkSAT to solve their respective\nproblems. 
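The three metrics described in the evaluation setup above can be computed as follows (a sketch; `steps[i][j]` is assumed to hold the number of flips used by trial j on problem i, with failed trials recorded as the maximum allowed number of steps):

```python
import statistics

def summarize(steps, max_steps=750):
    """Per-evaluation-set metrics: average steps, median of per-problem
    medians, and the fraction of problems considered solved."""
    medians = [statistics.median(trials) for trials in steps]
    return {
        "average": statistics.mean(s for trials in steps for s in trials),
        "median": statistics.median(medians),   # median of medians
        "solved": statistics.mean(m < max_steps for m in medians),
    }
```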
The difference is larger for graph problems than it is for random 3-SAT, which makes sense as there is no particularly regular structure to exploit in its factor graph. While the number of steps is reduced, we should emphasize that the running time of our algorithm is much longer than that of WalkSAT. With this work, our goal has not been to produce a practical SAT solver, but to demonstrate that this approach might allow us to create specialized algorithms from scratch for specific problems.

Specialization to problem classes  As expected, the performance of each heuristic degrades on unseen classes of problems compared to the one it is specialized to. This provides evidence that the learned heuristics exploit class-specific features of the problems. Also, it is interesting to note that the performances of the learned heuristics on vertex cover and dominating set problems correlate. They are indeed similar problems, and it is not surprising to see that a good heuristic for one also performs well for the other. Their similarity is also visible from their example factor graphs in Figure 2.

Figure 3: Curriculum during training. The plots display the average number of steps taken over multiple trials during training when learning to solve SAT problems. Each plot includes two curves, one for the small distribution from which we sample training problems, and another for a fixed set of evaluation problems from a larger distribution of the same problem class.

Effect of the curriculum  Example training runs of the curriculum that we employ are shown in Figure 3. They demonstrate that the learned heuristic for a smaller distribution transfers to a slightly larger distribution from the same class, which allows us to step up to larger problems.

Generic heuristics  Each learned heuristic is expected to exploit the specific structure of its problem class as much as possible. 
Intuition suggests that when there is no discernible structure to exploit, the resulting heuristic may generalize relatively well to other classes. As an experiment, we learn a heuristic to solve satisfiable formulas from the SR distribution, created by Selsam et al. [see 42, section 4] to generate formulas that are difficult to classify as satisfiable or unsatisfiable based on crude statistics. We then evaluate this heuristic on the graph problems. Random 3-SAT near the threshold is another similarly difficult distribution, although its difficulty is an asymptotic statement and not necessarily useful for our case with small formulas. Still, we repeat the experiment with random 3-SAT for completeness. Table 3 shows that a heuristic learned on SR(10) generalizes better to the slightly larger and different problem distributions compared to a heuristic learned on rand3(10, 43).

Table 3: Generalization from SR(10) to different problem distributions. Each cell displays the average number of steps and the percentage of problems solved.

                    SR(10)   rand3(10, 43)
clique3(10, 0.1)    363      649
n = 30, m = 420     98%      4%
cover4(8, 0.5)      410      643
n = 40, m = 450     86%      16%
color4(15, 0.5)     204      390
n = 60, m = 200     98%      84%
domset4(12, 0.2)    499      653
n = 60, m = 995     70%      16%

Search behavior  Table 2 shows that the largest differences between the learned heuristics and WalkSAT are on clique detection and vertex cover problems. In order to gain an understanding of how the learned heuristics behave differently, we look at a few statistics of how they traverse the search space. Table 4 shows the ratio of the flips during the search that flip a previously flipped variable (undo), move to a solution that increases (upwards), does not change (sideways), or decreases (downwards) the number of unsatisfied clauses. 
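Statistics of this kind can be gathered with a helper along the following lines (a sketch; here "undo" is approximated as any flip of a variable that was flipped earlier in the same trajectory):

```python
def classify_flips(unsat_counts, flipped_vars):
    """Classify each flip by its effect on the number of unsatisfied
    clauses, and separately count undos (re-flips of a variable).

    unsat_counts: unsatisfied-clause counts before/after each flip,
                  i.e. a list of length len(flipped_vars) + 1
    flipped_vars: the variable index chosen at each step
    """
    seen = set()
    counts = {"undo": 0, "upwards": 0, "sideways": 0, "downwards": 0}
    for before, after, var in zip(unsat_counts, unsat_counts[1:], flipped_vars):
        if var in seen:
            counts["undo"] += 1
        seen.add(var)
        if after > before:
            counts["upwards"] += 1
        elif after == before:
            counts["sideways"] += 1
        else:
            counts["downwards"] += 1
    total = len(flipped_vars)
    return {k: v / total for k, v in counts.items()}
```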
The reported statistics are computed by taking into account only the flips that are made deterministically by the heuristics, although if a variable was previously flipped at random and the heuristic chooses to flip the same variable again, this is counted as an "undo". The striking difference is that the learned heuristics make sideways moves far less often, and instead appear to zoom in on a solution with downwards moves. It is also intriguing that they make "bad" (upwards) moves with relatively high frequency.

Table 4: Macroscopic comparison of the search behaviors of the learned heuristics and WalkSAT.

            clique3(20, 0.05)                       cover5(9, 0.5)
Heuristic   Undo   Upwards   Sideways   Downwards   Undo   Upwards   Sideways   Downwards
Learned     0.26   0.28      0.14       0.57        0.37   0.38      0.13       0.48
WalkSAT     0.33   0.16      0.54       0.29        0.39   0.24      0.40       0.35

Table 5: Performance of the learned heuristics and WalkSAT on problems with graphs from distributions unseen during training. Each cell displays the average number of steps and the percentage of problems solved.

Different random graphs The above experiments on SAT-encoded graph problems all use the Erdős–Rényi model for random graphs. Although the results in Table 2 provide evidence that each heuristic is specialized to its respective problem class, they do not allow us to conclude that the heuristics should still work for different random graphs.
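For a concrete example of what the SAT encoding of a coloring problem looks like, here is a standard CNF encoding of graph k-colorability as a sketch; the function name and the variable-numbering convention are our own, not necessarily those used in the paper.

```python
def color_to_cnf(n_vertices, edges, k):
    """Encode k-colorability of a graph as a list of CNF clauses.
    Clauses use DIMACS-style integer literals; the variable for
    "vertex v has color c" (both 0-indexed) is v * k + c + 1."""
    var = lambda v, c: v * k + c + 1
    clauses = []
    for v in range(n_vertices):
        # Each vertex receives at least one color.
        clauses.append([var(v, c) for c in range(k)])
        # Each vertex receives at most one color (pairwise exclusion).
        for c1 in range(k):
            for c2 in range(c1 + 1, k):
                clauses.append([-var(v, c1), -var(v, c2)])
    for u, w in edges:
        # Adjacent vertices receive different colors.
        for c in range(k):
            clauses.append([-var(u, c), -var(w, c)])
    return clauses
```

A satisfying assignment of the resulting formula corresponds exactly to a proper k-coloring, so a local search solver run on this CNF is implicitly solving the coloring problem.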
Ideally, the heuristic learned on graph coloring problems should rely more heavily on the fact that the formula is the SAT encoding of a coloring problem than on the fact that the encoding is for a graph sampled according to the Erdős–Rényi model.

To test whether the heuristics can generalize to different random graphs, for each graph problem we generate satisfiable instances on graphs from four distributions (random regular [26], random geometric [36], Barabási–Albert [4], Watts–Strogatz [46]) with the number of vertices varying from 10 to 15 and use these instances to re-evaluate the previously learned heuristics. Table 5 shows the results compared to WalkSAT. Each row of the table corresponds to a set of 60 problems with varying problem-specific size parameters k and graphs sampled from the four distributions. While the shift in the evaluation distribution causes some decline in the relative performance of the learned heuristics against WalkSAT, they can still find solutions in fewer steps than WalkSAT most of the time. Recall that the goal here is not necessarily to perform better than WalkSAT, but being able to do so provides evidence of the robustness of the learned heuristics.

                      Learned       WalkSAT
cliquek, 3 ≤ k ≤ 5    198 / 100%    285 / 98%
coverk, 4 ≤ k ≤ 6     306 / 90%     370 / 92%
colork, 3 ≤ k ≤ 5     162 / 100%    193 / 100%
domsetk, 2 ≤ k ≤ 4    271 / 92%     255 / 92%

Larger problems In order to show the extent to which a learned heuristic can generalize to larger problems, we run the heuristic learned only on the small problems from SR(10) to solve satisfiable problems sampled from SR(n) for much larger n with a varying number of maximum steps. Figure 4 shows that as we increase the number of maximum iterations, the percentage of problems solved scales accordingly with no sign of plateauing.
For comparison, WalkSAT results are also included.

Figure 4: Generalization of the SR(10) heuristic to larger problems. Solid and dashed lines correspond to the learned heuristic and WalkSAT, respectively.

8 Conclusion

We have shown that it is possible to use a graph neural network to learn distribution-specific variable selection heuristics for local search from scratch. As formulated, the approach does not require access to the solutions of the training problems, which makes it viable to automatically learn specialized heuristics for a variety of problem distributions. While the learned heuristics are not competitive with the state-of-the-art SAT solvers in run time, after specialization they can find solutions consistently in fewer steps than a generic heuristic.

There are a number of avenues for improvement. Arguably, the most crucial next step would be to reduce the cost of the heuristic so that it scales to larger problem instances. If a GNN is to be used for computing the heuristic, more lightweight architectures than ours may be of interest. Also, in this work we focused on stochastic local search; however, the SAT solvers used in practice perform backtracking search. Consequently, model-based algorithms such as Monte Carlo tree search [12] could provide critical improvements. As a step towards practicality, it is also important to incorporate the suite of learned heuristics into an algorithm portfolio. In this work, we have achieved promising results for learning heuristics, and we hope that this helps pave the way for automated algorithm design.

References

[1] Saeed Amizadeh, Sergiy Matusevych, and Markus Weimer. Learning to solve Circuit-SAT: An unsupervised differentiable approach.
In International Conference on Learning Representations, 2019.

[2] Carlos Ansótegui, Jesús Giráldez-Cru, and Jordi Levy. The community structure of SAT formulas. In Theory and Applications of Satisfiability Testing – SAT 2012, pages 410–423, 2012.

[3] Adrian Balint and Uwe Schöning. Choosing probability distributions for stochastic local search and the role of make versus break. In Theory and Applications of Satisfiability Testing – SAT 2012, pages 16–29, 2012.

[4] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, October 1999.

[5] Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. Relational inductive biases, deep learning, and graph networks. arXiv:1806.01261, October 2018.

[6] Roberto J. Bayardo, Jr. and Robert C. Schrag. Using CSP look-back techniques to solve real-world SAT instances. In Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Conference on Innovative Applications of Artificial Intelligence, pages 203–208, 1997.

[7] Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. arXiv:1611.09940, January 2017.

[8] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48, 2009.

[9] Yoshua Bengio, Andrea Lodi, and Antoine Prouvost.
Machine learning for combinatorial optimization: A methodological tour d’horizon. arXiv:1811.06128, November 2018.

[10] Armin Biere. Yet another local search solver and Lingeling and friends entering the SAT competition 2014. In Proceedings of SAT Competition 2014, pages 39–40, 2014.

[11] Armin Biere, Marijn J. H. Heule, Hans van Maaren, and Toby Walsh. Handbook of Satisfiability. IOS Press, 2009.

[12] Cameron B. Browne, Edward Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, March 2012.

[13] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, pages 151–158, 1971.

[14] James M. Crawford and Larry D. Auton. Experimental results on the crossover point in random 3-SAT. Artificial Intelligence, 81(1):31–57, March 1996.

[15] Niklas Eén and Niklas Sörensson. An extensible SAT-solver. In Theory and Applications of Satisfiability Testing – SAT 2004, pages 502–518, 2004.

[16] Paul Erdős and Alfréd Rényi. On random graphs. I. Publicationes Mathematicae, 6:290–297, 1959.

[17] Andreas Fröhlich, Armin Biere, Christoph Wintersteiger, and Youssef Hamadi. Stochastic local search for satisfiability modulo theories. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 1136–1143, 2015.

[18] Alex S. Fukunaga. Evolving local search heuristics for SAT using genetic programming. In Genetic and Evolutionary Computation – GECCO 2004, pages 483–494, 2004.

[19] Alex S. Fukunaga.
Automated discovery of local search heuristics for satisfiability testing. Evolutionary Computation, 16(1):31–61, March 2008.

[20] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, pages 1263–1272, 2017.

[21] Michele Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, volume 2, pages 729–734, July 2005.

[22] Marketa Illetskova, Alex R. Bertels, Joshua M. Tuggle, Adam Harter, Samuel Richter, Daniel R. Tauritz, Samuel Mulder, Denis Bueno, Michelle Leger, and William M. Siever. Improving performance of CDCL SAT solvers by automated design of variable selection heuristics. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1–8, November 2017.

[23] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, pages 448–456, 2015.

[24] Elias Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems 30, pages 6348–6358, 2017.

[25] Ashiqur R. KhudaBukhsh, Lin Xu, Holger H. Hoos, and Kevin Leyton-Brown. SATenstein: Automatically building local search SAT solvers from components. Artificial Intelligence, 232:20–42, March 2016.

[26] Jeong Han Kim and Van H. Vu. Generating random regular graphs. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, pages 213–222, 2003.

[27] Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems!
In International Conference on Learning Representations, 2019.

[28] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1097–1105, 2012.

[29] Michail G. Lagoudakis and Michael L. Littman. Learning to select branching rules in the DPLL procedure for satisfiability. Electronic Notes in Discrete Mathematics, 9:344–359, June 2001.

[30] Massimo Lauria, Jan Elffers, Jakob Nordström, and Marc Vinyals. CNFgen: A generator of crafted benchmarks. In Theory and Applications of Satisfiability Testing – SAT 2017, pages 464–473, 2017.

[31] Gil Lederman, Markus N. Rabe, Edward A. Lee, and Sanjit A. Seshia. Learning heuristics for automated reasoning through deep reinforcement learning. arXiv:1807.08058, April 2019.

[32] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Combinatorial optimization with graph convolutional networks and guided tree search. In Advances in Neural Information Processing Systems 31, pages 539–548, 2018.

[33] Jia Hui Liang, Vijay Ganesh, Pascal Poupart, and Krzysztof Czarnecki. Exponential recency weighted average branching heuristic for SAT solvers. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 3434–3440, 2016.

[34] Jia Hui Liang, Vijay Ganesh, Pascal Poupart, and Krzysztof Czarnecki. Learning rate based branching heuristic for SAT solvers. In Theory and Applications of Satisfiability Testing – SAT 2016, pages 123–140, 2016.

[35] Jia Hui Liang, Hari Govind V.K., Pascal Poupart, Krzysztof Czarnecki, and Vijay Ganesh. An empirical study of branching heuristics through the lens of global learning rate. In Theory and Applications of Satisfiability Testing – SAT 2017, pages 119–135, 2017.

[36] Mathew Penrose. Random Geometric Graphs.
Oxford University Press, May 2003.

[37] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, 3rd edition, 2009.

[38] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, January 2009.

[39] Bart Selman, Henry A. Kautz, and Bram Cohen. Local search strategies for satisfiability testing. In Cliques, Coloring, and Satisfiability, pages 521–532, 1993.

[40] Bart Selman, David G. Mitchell, and Hector J. Levesque. Generating hard satisfiability problems. Artificial Intelligence, 81(1):17–29, March 1996.

[41] Daniel Selsam and Nikolaj Bjørner. Guiding high-performance SAT solvers with unsat-core predictions. In Theory and Applications of Satisfiability Testing – SAT 2019, pages 336–353, 2019.

[42] Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill. Learning a SAT solver from single-bit supervision. In International Conference on Learning Representations, 2019.

[43] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144, December 2018.

[44] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, 1st edition, 1998.

[45] Tijmen Tieleman and Geoffrey E. Hinton. RMSProp: Divide the gradient by a running average of its recent magnitude. Slides of lecture “Neural Networks for Machine Learning”, 2012.

[46] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of ‘small-world’ networks.
Nature, 393(6684):440–442, June 1998.

[47] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229–256, May 1992.

[48] Lin Xu, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. SATzilla: Portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research, 32(1):565–606, June 2008.