{"title": "Constructing Proofs in Symmetric Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 217, "page_last": 224, "abstract": null, "full_text": "Constructing Proofs in Symmetric Networks \n\nGadi Pinkas \nComputer Science Department \nWashington University \nCampus Box 1045 \nSt. Louis, MO 63130 \n\nAbstract \n\nThis paper considers the problem of expressing predicate calculus in con(cid:173)\nnectionist networks that are based on energy minimization. Given a first(cid:173)\norder-logic knowledge base and a bound k, a symmetric network is con(cid:173)\nstructed (like a Boltzman machine or a Hopfield network) that searches \nfor a proof for a given query. If a resolution-based proof of length no \nlonger than k exists, then the global minima of the energy function that \nis associated with the network represent such proofs. The network that \nis generated is of size cubic in the bound k and linear in the knowledge \nsize. There are no restrictions on the type of logic formulas that can be \nrepresented. The network is inherently fault tolerant and can cope with \ninconsistency and nonmonotonicity. \n\n1 \n\nIntroduction \n\nThe ability to reason from acquired knowledge is undoubtedly one of the basic and \nmost important components of human intelligence. Among the major tools for \nreasoning in the area of AI are deductive proof techniques. However, traditional \nmethods are plagued by intractability, inability to learn and adjust, as well as by \ninability to cope with noise and inconsistency. A connectionist approach may be \nthe missing link: fine grain, massively parallel architecture may give us real-time \napproximation; networks are potentially trainable and adjustable; and they may be \nmade tolerant to noise as a result of their collective computation. \nMost connectionist reasoning systems that implement parts of first-order logic \n(see for examples: (Holldobler 90], [Shastri et a1. 
90]) use the spreading activation paradigm and usually trade expressiveness for time efficiency. In contrast, this paper uses the energy minimization paradigm (like [Derthick 88], [Ballard 86] and [Pinkas 91c]), representing an intractable problem, but trading time for correctness; i.e., as more time is given, the probability of converging to a correct answer increases.

Symmetric connectionist networks used for constraint satisfaction are the target platform [Hopfield 84b], [Hinton, Sejnowski 86], [Peterson, Hartman 89], [Smolensky 86]. They are characterized by a quadratic energy function that should be minimized. Some of the models in the family may be seen as performing a search for a global minimum of their energy function. The task is therefore to represent logic deduction that is bounded by a finite proof length as energy minimization (without a bound on the proof length, the problem is undecidable). When a query is clamped, the network should search for a proof that supports the query. If a proof of the query exists, then every global minimum of the energy function associated with the network represents a proof. If no proof exists, the global minima represent the lack of a proof.

The paper elaborates the propositional case; however, due to space limitations, the first-order (FOL) case is only sketched. For more details and full treatment of FOL see [Pinkas 91j].

2 Representing proofs of propositional logic

I'll start by assuming that the knowledge base is propositional.

The proof area:
A proof is a list of clauses ending with the query such that every clause used is either an original clause, a copy (or weakening) of a clause that appears earlier in the proof, or a result of a resolution step of the two clauses that appeared just earlier. 
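This definition can be sketched as a small checker. This is a toy illustration only; the set-based clause encoding and function names are mine, not the paper's network encoding. The knowledge base and proof list are those of the running example that follows.

```python
def resolve(c1, c2):
    """All clauses obtainable by resolving c1 with c2 on one canceled literal.
    A clause is a frozenset of literals; a literal is a (positive, atom) pair."""
    out = []
    for (pos, atom) in c1:
        if (not pos, atom) in c2:
            out.append((c1 - {(pos, atom)}) | (c2 - {(not pos, atom)}))
    return out

def is_valid_proof(proof, kb):
    """Check the proof-list property: each clause is an original clause,
    a copy/weakening of an earlier clause, or the resolvent of the two
    clauses that appeared just earlier; the list ends with the query."""
    for i, c in enumerate(proof):
        original = c in kb
        copied = any(c >= earlier for earlier in proof[:i])  # weakening = superset
        resolved = i >= 2 and c in resolve(proof[i - 1], proof[i - 2])
        if not (original or copied or resolved):
            return False
    return True

# The running example: KB = {A, ~A v B v C, ~B v D, ~C v D}, query D.
A, B, C, D = [(True, x) for x in "ABCD"]
nA, nB, nC = [(False, x) for x in "ABC"]
kb = [frozenset(s) for s in ([A], [nA, B, C], [nB, D], [nC, D])]
proof = [frozenset(s) for s in
         ([nA, B, C], [nB, D], [nA, C, D], [nC, D], [nA, D], [A], [D])]
```

Here the proof list is written in ordinary order (query last); the proof area described next stores the same list reversed.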
The proof emerges as an activation pattern on special unit structures called the proof area, and is represented in reverse to the common practice (the query appears first). For example: given a knowledge base of the following clauses:
1) A
2) ¬A ∨ B ∨ C
3) ¬B ∨ D
4) ¬C ∨ D
we would like to prove the query D, by generating the following list of clauses:

1) D           (obtained by resolution of clauses 2 and 3 by canceling A).
2) A           (original clause no. 1).
3) ¬A ∨ D      (obtained by resolution of clauses 4 and 5 by canceling C).
4) ¬C ∨ D      (original clause no. 4).
5) ¬A ∨ C ∨ D  (obtained by resolution of clauses 6 and 7 by canceling B).
6) ¬B ∨ D      (original clause no. 3).
7) ¬A ∨ B ∨ C  (original clause no. 2).

Each clause in the proof is either an original clause, a copy of a clause from earlier in the proof, or a resolution step.

The matrix C in figure 1 functions as a clause list. This list represents an ordered set of clauses that form the proof. The query clauses are clamped onto this area and activate hard constraints that force the rest of the units of the matrix to form a valid proof (if it exists).

[Figure 1: The proof area for a propositional case]

Variable binding is performed by dynamic allocation of instances using a technique similar to [Anandan et al. 89] and [Barnden 91]. 
In this technique, if two symbols need to be bound together, an instance is allocated from a pool of general-purpose instances, and is connected to both symbols. An instance can be connected to a literal in a clause, to a predicate type, to a constant, to a function or to a slot of another instance (for example, a constant that is bound to the first slot of a predicate).

The clauses that participate in the proof are represented using a 3-dimensional matrix (C_{±,i,j}) and a 2-dimensional matrix (P_{A,j}), as illustrated in figure 1. The rows of C represent clauses of the proof, while the rows of P represent atomic propositions. The columns of both matrices represent the pool of instances used for binding propositions to clauses.

A clause is a list of negative and positive instances that represent literals. The instance thus behaves as a two-way pointer that binds composite structures like clauses with their constituents (the atomic propositions). A row i in the matrix C represents a clause which is composed of pairs of instances. If the unit C_{+,i,j} is set, then the matrix represents a positive literal j in clause i. If P_{A,j} is also set, then C_{+,i,j} represents a positive literal of clause i that is bound to the atomic proposition A. Similarly, C_{-,i,j} represents a negative literal.

The first row of matrix C in the figure is the query clause D. It contains only one positive literal, which is bound to atomic proposition D via instance 4. For another example, consider the third row of C, which represents a clause of two literals: a positive one that is bound to D via instance 4, and a negative one bound to A via instance 1 (it is the clause ¬A ∨ D, generated as a result of a resolution step).

Participation in the proof: The vector IN represents whether clauses in C participate in the proof. 
In our example, all the clauses are in the proof; however, in the general case some of the rows of C may be meaningless. When IN_i is on, it means that clause i is in the proof and must be proved as well. Every clause that participates in the proof is either a result of a resolution step (RES_i is set), a copy of some clause (CPY_i is set), or an original clause from the knowledge base (KB_i is set). The second clause of C in figure 1, for example, is an original clause of the knowledge base. If a clause j is copied, it must be in the proof itself and therefore IN_j is set. Similarly, if clause i is a result of a resolution step, then the two resolved clauses must also be in the proof (IN_{i+1} and IN_{i+2} are set) and therefore must themselves be resolvents, copies or originals. This chain of constraints continues until all constraints are satisfied and a valid proof is generated.

Posting a query: The user posts a query by clamping its clauses onto the first rows of C and setting the appropriate IN units. This indicates that the query clauses participate in the proof and should be proved by either a resolution step, a copy step or an original clause. Figure 1 represents the complete proof for the query D. We start by allocating an instance (4) for D in the P matrix, and clamping a positive literal D in the first row of C (C_{+,1,4}); the rest of the first row's units are clamped to zero. The unit IN_1 is biased (to have the value of one), indicating that the query is in the proof; this causes a chain of constraints to be activated that are satisfied only by a valid proof. If no proof exists, the IN_1 unit will become zero; i.e., the global minimum is obtained by setting IN_1 to zero despite the bias.

Representing resolution steps: The vector RES is a structure of units that indicates which clauses in C are obtained by a resolution step. 
If RES_i is set, then the ith row is obtained by resolving row i+1 of C with row i+2. Thus, the unit RES_1 in figure 1 indicates that the clause D of the first row of C is a resolvent of the second and the third rows of C, representing A and ¬A ∨ D respectively. Two literals cancel each other if they have opposite signs and are represented by the same instance. In figure 1, literal A of the second row of C and literal ¬A of the third row cancel each other, generating the clause of the first row.

The rows of matrix R represent literals canceled by resolution steps. If row i of C is the result of a resolution step, there must be one and only one instance j such that both clause i+1 and clause i+2 include it with opposite signs. For example (figure 1): clause D in the first row of C is the result of resolving clause A with clause ¬A ∨ D, which are in the second and third rows of C respectively. Instance 1, representing atomic proposition A, is the one that is canceled; R_{1,1} is therefore set, indicating that clause 1 is obtained by a resolution step that cancels the literals of instance 1.

Copied and original clauses: The matrix D indicates which clauses are copied to other clauses in the proof area. Setting D_{i,j} means that clause i is obtained by copying (or weakening) clause j into clause i (the example does not use copy steps). The matrix K indicates which original knowledge-base clauses participate in the proof. The unit K_{i,j} indicates that clause i in the proof area is an original clause, and the syntax of the j-th clause in the knowledge base must be imposed on the units of clause i. In figure 1, for example, clause 2 in the proof (the second row in C) assumes the identity of clause number 1 in the knowledge base, and therefore K_{2,1} is set.
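The unit matrices just described can be mocked up concretely. The following Python sketch fills in the figure-1 assignment by hand; the array names follow the text, but the encoding details (atom-to-instance mapping, list-of-list layout) are my own illustration, not the paper's network.

```python
k, n_inst = 7, 4                              # proof rows and binding instances
atoms = {a: j for j, a in enumerate("ABCD")}  # instance j named by atom (as in fig. 1)

# Unit matrices, all initially off (False):
Cpos = [[False] * n_inst for _ in range(k)]   # C_{+,i,j}
Cneg = [[False] * n_inst for _ in range(k)]   # C_{-,i,j}
IN, RES, KB = [False] * k, [False] * k, [False] * k

# The seven proof rows: D; A; ~A v D; ~C v D; ~A v C v D; ~B v D; ~A v B v C
rows = [("", "D"), ("", "A"), ("A", "D"), ("C", "D"),
        ("A", "CD"), ("B", "D"), ("A", "BC")]
for i, (neg, pos) in enumerate(rows):
    IN[i] = True
    for a in neg:
        Cneg[i][atoms[a]] = True
    for a in pos:
        Cpos[i][atoms[a]] = True
for i in (0, 2, 4):
    RES[i] = True          # proof rows 1, 3, 5 of the text are resolvents
for i in (1, 3, 5, 6):
    KB[i] = True           # the other rows are original KB clauses

def resolvent_ok(i, j):
    """Check the resolution conditions for row i with canceled instance j
    (the unit R_{i,j}): opposite signs in rows i+1 and i+2, and row i copies
    every other literal of the two parents."""
    cancel = ((Cpos[i+1][j] and Cneg[i+2][j]) or
              (Cneg[i+1][j] and Cpos[i+2][j]))
    copies = all(
        Cpos[i][l] == ((Cpos[i+1][l] or Cpos[i+2][l]) and l != j) and
        Cneg[i][l] == ((Cneg[i+1][l] or Cneg[i+2][l]) and l != j)
        for l in range(n_inst))
    return cancel and copies
```

For instance, `resolvent_ok(0, atoms["A"])` confirms that the first row (D) resolves the second and third rows by canceling the instance bound to A.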


3 Constraints

We are now ready to specify the constraints that must be satisfied by the units so that a proof is found. The constraints are specified as well-formed logic formulas. For example, the formula (A ∨ B) ∧ C imposes a constraint over the units (A, B, C) such that the only possible valid assignments to those units are (011), (101), (111). A general method to implement an arbitrary logical constraint on connectionist networks is shown in [Pinkas 90b]. Most of the constraints specified in this section are hard constraints; i.e., they must be satisfied for a valid proof to emerge. Towards the end of this section, some soft constraints are presented.

In-proof constraints: If a clause participates in the proof, it must be either a result of a resolution step, a copy step or an original clause. In logic, the constraint may be expressed as: ∀i : IN_i → RES_i ∨ CPY_i ∨ KB_i. The three units (per clause i) constitute a winner-take-all (WTA) subnetwork. This means that only one of the three units is actually set. The WTA constraints may be expressed as:
∀i : RES_i → ¬CPY_i ∧ ¬KB_i
∀i : CPY_i → ¬RES_i ∧ ¬KB_i
∀i : KB_i → ¬RES_i ∧ ¬CPY_i
The WTA property may be enforced by inhibitory connections between every pair of the three units.

Copy constraints: If CPY_i is set, then clause i must be a copy of another clause j in the proof. This can be expressed as ∀i : CPY_i → ∨_j (D_{i,j} ∧ IN_j). The rows of D are WTAs, allowing i to be a copy of only one j. In addition, if clause j is copied or weakened into clause i, then every unit set in clause j must also be set in clause i. This may be specified as: ∀i,j,l : D_{i,j} → ((C_{+,i,l} ← C_{+,j,l}) ∧ (C_{-,i,l} ← C_{-,j,l})).

Resolution constraints: If a clause i is a result of resolving the two clauses i+1 and i+2, then there must be one and only one instance j that is canceled (represented by R_{i,j}), and clause i is obtained by copying the literals of both clause i+1 and clause i+2, without the instance j. 
These constraints may be expressed as:
∀i : RES_i → ∨_j R_{i,j}  (at least one instance is canceled)
∀i,j,j' ≠ j : R_{i,j} → ¬R_{i,j'}  (only one instance is canceled (WTA))
∀i,j : R_{i,j} → (C_{+,i+1,j} ∧ C_{-,i+2,j}) ∨ (C_{-,i+1,j} ∧ C_{+,i+2,j})  (canceled literals have opposite signs)
∀i : RES_i → IN_{i+1} ∧ IN_{i+2}  (the two resolvents are also in the proof)
∀i,j : RES_i → (C_{+,i,j} ↔ ((C_{+,i+1,j} ∨ C_{+,i+2,j}) ∧ ¬R_{i,j}))  (copy positive literals)
∀i,j : RES_i → (C_{-,i,j} ↔ ((C_{-,i+1,j} ∨ C_{-,i+2,j}) ∧ ¬R_{i,j}))  (copy negative literals)

Clause-instance constraints: The sign of an instance in a clause should be unique; therefore, any instance pair in the matrix C is a WTA: ∀i,j : C_{+,i,j} → ¬C_{-,i,j}. The columns of matrix P are WTAs, since an instance is allowed to represent only one atomic proposition: ∀A,i,B ≠ A : P_{A,i} → ¬P_{B,i}. The rows of P may also be WTAs: ∀A,i,j ≠ i : P_{A,i} → ¬P_{A,j} (this constraint is not imposed in the FOL case).

Knowledge base constraints: If a clause i is an original knowledge-base clause, then there must be a clause j (out of the m original clauses) whose syntax is forced upon the units of the i-th row of matrix C. This constraint can be expressed as: ∀i : KB_i → ∨_j K_{i,j}. The rows of K are WTA networks, so that only one original clause is forced on the units of clause i: ∀i,j,j' ≠ j : K_{i,j} → ¬K_{i,j'}.

The only hard constraints that are left are those that force the syntax of a particular clause from the knowledge base. Assume for example that K_{i,4} is set, meaning that clause i in C must have the syntax of the fourth clause in the knowledge base of our example (¬C ∨ D). Instances j and j' must be allocated to the atomic propositions C and D respectively, and must appear in clause i as the literals C_{-,i,j} and C_{+,i,j'}. The following constraints capture the syntax of (¬C ∨ D):
∀i : K_{i,4} → ∨_j (C_{-,i,j} ∧ P_{C,j})  (there exists a negative literal that is bound to C)
∀i : K_{i,4} → ∨_j (C_{+,i,j} ∧ P_{D,j})  (there exists a positive literal that is bound to D)

FOL extension:
In first-order predicate logic (FOL), instead of atomic propositions we must deal with predicates (see [Pinkas 91j] for details). As in the propositional case, a literal in a clause is represented by a positive or negative instance; however, the instance must now be allocated to a predicate name and may have slots to be filled by other instances (representing functions and constants). To accommodate such complexity a new matrix (NEST) is added, and the role of matrix P is revised.

The matrix P must now accommodate function names, predicate names and constant names instead of just atomic propositions. Each row of P represents a name, and the columns represent instances that are allocated to those names. The rows of P that are associated with predicates and functions may contain several different instances of the same predicate or function; thus, they are not WTA anymore. In order to represent compound terms and predicates, instances may be bound to slots of other instances. The new matrix (NEST_{j,i,p}) is capable of representing such bindings. If NEST_{j,i,p} is set, then instance i is bound to the p-th slot of instance j. The columns of NEST are WTA, allowing only one instance to be bound to a certain slot of another instance. When a clause i is forced to have the syntax of some original clause l, syntactic constraints are triggered so that the literals of clause i become instantiated by the relevant predicates, functions, constants and variables imposed by clause l.

Unification is implicitly obtained if two predicates are represented by the same instance while still satisfying all the constraints (imposed by the syntax of the two clauses). 
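The hard constraints of this section are ultimately realized as energy terms. The following toy sketch is my own illustration of the general idea only (the actual construction, which reduces higher-order terms to a quadratic energy with hidden units, is in [Pinkas 90b]): it writes the in-proof constraint IN_i → RES_i ∨ CPY_i ∨ KB_i together with its WTA triple as a penalty that is zero exactly on the satisfying assignments.

```python
from itertools import product

def penalty(IN, RES, CPY, KB):
    """Zero iff the in-proof constraint and the WTA triple are satisfied.
    The first term is quartic as written; [Pinkas 90b]'s method would reduce
    such terms to quadratic form by adding hidden units."""
    e = IN * (1 - RES) * (1 - CPY) * (1 - KB)   # violates IN -> RES v CPY v KB
    e += RES * CPY + RES * KB + CPY * KB        # pairwise inhibition (WTA)
    return e

# Global minima (penalty 0) are exactly the satisfying assignments:
minima = [s for s in product((0, 1), repeat=4) if penalty(*s) == 0]
```

Enumerating the sixteen states shows seven minima: the four states with IN off and at most one of the three role units on, plus the three states with IN on and exactly one role unit on.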
When a resolution step is needed, the network tries to allocate the same instance to the two literals that need to cancel each other. If the syntactic constraints on the literals permit such sharing of an instance, then the attempt to share the instance is successful and a unification occurs (the occur check is done implicitly, since the matrix NEST allows only finite trees to be represented).

Minimizing the violation of soft constraints: Among the valid proofs, some are preferable to others. By means of soft constraints and optimization it is possible to encourage the network to search for preferred proofs. Theorem proving is thus viewed as a constraint optimization problem. A weight may be assigned to each of the constraints [Pinkas 91c], and the network tries to minimize the weighted sum of the violated constraints, so that the set of optimized solutions is exactly the set of preferred proofs. For example, preference for proofs with most general unification is obtained by assigning small penalties (negative bias) to every binding of a function to a position of another instance (in NEST). Using similar techniques, the network can be made to prefer shorter, more parsimonious or more reliable proofs, low-cost plans, or even more specific arguments as in nonmonotonic reasoning.

4 Summary

Given a finite set T of m clauses, where n is the number of different predicates, functions and constants, and given also a bound k on the proof length, we can generate a network that searches for a proof no longer than k for a clamped query Q. If a global minimum is found, then an answer is given as to whether there exists such a proof, and the proof (with MGUs) may be extracted from the state of the visible units. Among the possible valid proofs the system prefers some "better" proofs by minimizing the violation of soft constraints.
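The weighted-sum idea can be sketched in a few lines. This is a toy, not the paper's network: the unit names and weights are hypothetical. Hard constraints get a weight large enough to dominate any sum of soft penalties, and a small per-clause cost makes shorter proofs cheaper.

```python
from itertools import product

HARD = 10.0   # dominates any possible sum of soft penalties
SOFT = 0.1    # hypothetical per-clause cost, preferring shorter proofs

def energy(IN):
    """IN[0] is the biased query unit; the hard constraint demands that a
    clause in the proof (here clause 0) be supported by clause 1."""
    e = HARD * (1 - IN[0])           # bias: the query should be in the proof
    e += HARD * IN[0] * (1 - IN[1])  # support constraint
    e += SOFT * sum(IN)              # soft: each used clause costs a little
    return e

best = min(product((0, 1), repeat=3), key=energy)   # -> (1, 1, 0)
```

The global minimum satisfies both hard constraints while leaving the unneeded third clause out of the proof, which is the sense in which the soft penalties select a preferred proof among the valid ones.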
The concept of "better" proofs may apply to applications like planning (minimize the cost), abduction (parsimony) and nonmonotonic reasoning (specificity).

In the propositional case the generated network has O(k² + km + kn) units and O(k³ + km + kn) connections. For predicate logic there are O(k³ + km + kn) units and connections, and we need to add O(l²m) connections and hidden units, where l is the complexity level of the syntactic constraints [Pinkas 91j].

The results improve on an earlier approach [Ballard 86]: there are no restrictions on the rules allowed; every proof no longer than the bound is allowed; the network is compact and the representation of bindings (unifications) is efficient; nesting of functions and multiple uses of rules are allowed; only one relaxation phase is needed; inconsistency is allowed in the knowledge base; and the query does not need to be negated and pre-wired (it can be clamped at query time).

The architecture discussed has a natural fault-tolerance capability: when a unit becomes faulty, it simply cannot assume a role in the proof, and other units are allocated instead.

Acknowledgment: I wish to thank Dana Ballard, Bill Ball, Rina Dechter, Peter Haddawy, Dan Kimura, Stan Kwasny, Ron Loui and Dave Touretzky for helpful comments.

References

[Anandan et al. 89] P. Anandan, S. Letovsky, E. Mjolsness, "Connectionist variable binding by optimization," Proceedings of the 11th Cognitive Science Society, 1989.

[Ballard 86] D. H. Ballard, "Parallel Logical Inference and Energy Minimization," Proceedings of the 5th National Conference on Artificial Intelligence, Philadelphia, pp. 203-208, 1986.

[Barnden 91] J. A. Barnden, "Encoding complex symbolic data structures with some unusual connectionist techniques," in J. A. Barnden and J. B. 
Pollack, Advances in Connectionist and Neural Computation Theory 1: High-level connectionist models, Ablex Publishing Corporation, 1991.

[Derthick 88] M. Derthick, "Mundane reasoning by parallel constraint satisfaction," PhD thesis, CMU-CS-88-182, Carnegie Mellon University, Sept. 1988.

[Hinton, Sejnowski 86] G. E. Hinton and T. J. Sejnowski, "Learning and relearning in Boltzmann Machines," in J. L. McClelland and D. E. Rumelhart, Parallel Distributed Processing: Explorations in The Microstructure of Cognition I, pp. 282-317, MIT Press, 1986.

[Holldobler 90] S. Holldobler, "CHCL, a connectionist inference system for Horn logic based on the connection method and using limited resources," International Computer Science Institute TR-90-042, 1990.

[Hopfield 84b] J. J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proceedings of the National Academy of Sciences 81, pp. 3088-3092, 1984.

[Peterson, Hartman 89] C. Peterson, E. Hartman, "Explorations of mean field theory learning algorithm," Neural Networks 2, no. 6, 1989.

[Pinkas 90b] G. Pinkas, "Energy minimization and the satisfiability of propositional calculus," Neural Computation 3, no. 2, 1991.

[Pinkas 91c] G. Pinkas, "Propositional Non-Monotonic Reasoning and Inconsistency in Symmetric Neural Networks," Proceedings of IJCAI, Sydney, 1991.

[Pinkas 91j] G. Pinkas, "First-order logic proofs using connectionist constraint relaxation," technical report, Department of Computer Science, Washington University, WUCS-91-54, 1991.

[Shastri et al. 90] L. Shastri, V. Ajjanagadde, "From simple associations to systematic reasoning: A connectionist representation of rules, variables and dynamic bindings," technical report, University of Pennsylvania, Philadelphia, MS-CIS-90-05, 1990. 


[Smolensky 86] P. Smolensky, "Information processing in dynamic systems: Foundations of harmony theory," in J. L. McClelland and D. E. Rumelhart, Parallel Distributed Processing: Explorations in The Microstructure of Cognition I, MIT Press, 1986.
", "award": [], "sourceid": 473, "authors": [{"given_name": "Gadi", "family_name": "Pinkas", "institution": null}]}