{"title": "Embedding Symbolic Knowledge into Deep Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 4233, "page_last": 4243, "abstract": "In this work, we aim to leverage prior symbolic knowledge to improve the performance of deep models. We propose a graph embedding network that projects propositional formulae (and assignments) onto a manifold via an augmented Graph Convolutional Network (GCN). To generate semantically-faithful embeddings, we develop techniques to recognize node heterogeneity, and semantic regularization that incorporate structural constraints into the embedding. Experiments show that our approach improves the performance of models trained to perform entailment checking and visual relation prediction. Interestingly, we observe a connection between the tractability of the propositional theory representation and the ease of embedding. Future exploration of this connection may elucidate the relationship between knowledge compilation and vector representation learning.", "full_text": "Embedding Symbolic Knowledge into Deep Networks\n\nYaqi Xie\u2217, Ziwei Xu\u2217, Mohan S Kankanhalli, Kuldeep S. Meel, Harold Soh\n\n{yaqixie, ziwei-xu, mohan, meel, harold}@comp.nus.edu.sg\n\nSchool of Computing\n\nNational University of Singapore\n\nAbstract\n\nIn this work, we aim to leverage prior symbolic knowledge to improve the per-\nformance of deep models. We propose a graph embedding network that projects\npropositional formulae (and assignments) onto a manifold via an augmented Graph\nConvolutional Network (GCN). To generate semantically-faithful embeddings, we\ndevelop techniques to recognize node heterogeneity, and semantic regularization\nthat incorporate structural constraints into the embedding. Experiments show that\nour approach improves the performance of models trained to perform entailment\nchecking and visual relation prediction. 
Interestingly, we observe a connection between the tractability of the propositional theory representation and the ease of embedding. Future exploration of this connection may elucidate the relationship between knowledge compilation and vector representation learning.

1 Introduction

The recent advances in design and training methodology of deep neural networks [1] have led to widespread application of machine learning in diverse domains such as medical image classification [2] and game-playing [3]. Although demonstrably effective on a variety of tasks, deep NNs have voracious appetites; obtaining a good model typically requires large amounts of labelled data, even when the learnt concepts could be described succinctly in symbolic representation. As a result, there has been a surge of interest in techniques that combine symbolic and neural reasoning [4], including a diverse set of approaches to inject existing prior domain knowledge into NNs, e.g., via knowledge distillation [5], probabilistic priors [6], or auxiliary losses [7]. However, doing so in a scalable and effective manner remains a challenging open problem. One particularly promising approach is through learned embeddings, i.e., real-vector representations of prior knowledge, that can be easily processed by NNs [8, 9, 10, 11, 12].

In this work, we focus on embedding symbolic knowledge expressed as logical rules. In sharp contrast to connectionist NN structures, logical formulae are explainable, compositional, and can be explicitly derived from human knowledge. Inspired by insights from the knowledge representation community, this paper investigates embedding alternative representation languages to improve the performance of deep networks. To this end, we focus on two languages: Conjunctive Normal Form (CNF) and decision-Deterministic Decomposable Negation Normal Form (d-DNNF) [13, 14].
Every Boolean formula can be succinctly represented in CNF, but CNF is intractable for most queries of interest such as satisfiability. On the other hand, representation of a Boolean formula in d-DNNF may lead to exponential size blowup, but d-DNNF is tractable (polytime) for most queries such as satisfiability, counting, enumeration and the like [14].

In comparison to prior work that treats logical formulae as symbol sequences, CNF and d-DNNF formulae are naturally viewed as graph structures. Thus, we utilize recent Graph Convolutional Networks (GCNs) [15] (which are robust to relabelling of nodes) to embed logic graphs. We further employ a novel method of semantic regularization to learn embeddings that are semantically consistent with d-DNNF formulae. In particular, we augment the standard GCN to recognize node heterogeneity and introduce soft constraints on the embedding structure of the children of AND and OR nodes within the logic graph. An overview of our Logic Embedding Network with Semantic Regularization (LENSR) is shown in Fig. 1.

*Equal contribution and the rest of the authors are ordered alphabetically by last name.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Figure 1: LENSR overview. Our GCN-based embedder projects logic graphs representing formulae or assignments onto a manifold where entailment is related to distance; satisfying assignments are closer to the associated formula. Such a space enables fast approximate entailment checks — we use this embedding space to form logic losses that regularize deep neural networks for a target task.

Once learnt, these logic embeddings can then be used to form a logic loss that guides NN training; the loss encourages the NN to be consistent with prior knowledge. Experiments on a synthetic model-checking dataset show that LENSR is able to learn high quality embeddings that are predictive of formula satisfiability.
As a real-world case study, we applied LENSR to the challenging task of Visual Relation Prediction (VRP), where the goal is to predict relations between objects in images. Our empirical analysis demonstrates that LENSR significantly outperforms baseline models. Furthermore, we observe that LENSR with d-DNNF achieves a significant performance improvement over LENSR with CNF embedding. We propose the notion of embeddable-demanding to capture the observed, plausible relationship between the tractability of a representation language and the ease of learning vector representations.

To summarize, this paper contributes a framework for utilizing logical formulae in NNs. Different from prior work, LENSR is able to utilize d-DNNF structure to learn semantically-constrained embeddings. To the best of our knowledge, this is also the first work to apply GCN-based embeddings to logical formulae, and experiments show the approach to be effective on both synthetic and real-world datasets. Practically, the model is straightforward to implement and use. We have made our source code available online at https://github.com/ZiweiXU/LENSR. Finally, our evaluations suggest a connection between the tractability of a normal form and its amenability to embedding; exploring this relationship may reveal deep connections between knowledge compilation [14] and vector representation learning.

2 Background and Related Work

Logical Formulae, CNF and d-DNNF. Logical statements provide a flexible declarative language for expressing structured knowledge. In this work, we focus on propositional logic, where a proposition p is a statement which is either True or False. A formula F is a compound of propositions connected by logical connectives, e.g., ¬, ∧, ∨, ⇒. An assignment τ is a function which maps propositions to True or False.
An assignment that makes a formula F True is said to satisfy F, denoted τ |= F.

A formula that is a conjunction of clauses (where a clause is a disjunction of literals) is in Conjunctive Normal Form (CNF). Let X be the set of propositional variables. A sentence in Negation Normal Form (NNF) is defined as a rooted directed acyclic graph (DAG) where each leaf node is labeled with True, False, x, or ¬x, x ∈ X; and each internal node is labeled with ∧ or ∨ and can have arbitrarily many children. Deterministic Decomposable Negation Normal Form (d-DNNF) [13, 14] further imposes that the representation is: (i) Deterministic: an NNF is deterministic if the operands of ∨ in all well-formed Boolean formulae in the NNF are mutually inconsistent; (ii) Decomposable: an NNF is decomposable if the operands of ∧ in all well-formed Boolean formulae in the NNF are expressed on a mutually disjoint set of variables.

(a) General Form (b) CNF (c) d-DNNF (d) d-DNNF DAG

Figure 2: (a)-(c) Logic graph examples of the formula (p ⇒ q) ∧ m ∧ n in (a) General form, (b) CNF, (c) d-DNNF. This formula could encode a rule for "person wearing glasses" where p denotes wear(person, glasses), q denotes in(glasses, person), m denotes exist(person) and n denotes exist(glasses). (d) An example DAG showing a more complex d-DNNF logic rule.

In contrast to CNF and more general forms, d-DNNF has many desirable tractability properties (e.g., polytime satisfiability and polytime model counting).
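To make the counting claim concrete: on a d-DNNF, decomposability lets an ∧ node multiply its children's model counts, and determinism lets an ∨ node add them, so counting is a single linear-time walk over the graph. A minimal sketch of this idea (our own illustration, not code from the paper; it assumes a smooth d-DNNF, i.e., all children of an ∨ mention the same variables):

```python
# Model counting over a smooth d-DNNF, given as nested tuples:
#   ("lit", "p") or ("lit", "-p")  -- a literal leaf
#   ("and", [child, ...])          -- decomposable conjunction
#   ("or", [child, ...])           -- deterministic disjunction
def count_models(node):
    kind = node[0]
    if kind == "lit":
        return 1                   # one model over the literal's own variable
    if kind == "and":              # children over disjoint variables: multiply
        total = 1
        for child in node[1]:
            total *= count_models(child)
        return total
    if kind == "or":               # mutually inconsistent children: add
        return sum(count_models(child) for child in node[1])
    raise ValueError("unknown node kind: %r" % kind)

# (p ^ q) v (-p ^ -q) has exactly two models over {p, q}
ddnnf = ("or", [("and", [("lit", "p"), ("lit", "q")]),
                ("and", [("lit", "-p"), ("lit", "-q")])])
```

On a CNF of the same formula, no comparable local rule exists, which is the tractability gap the text describes.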
These tractability properties make d-DNNF particularly appealing for complex AI applications [16]. Although building d-DNNFs is a difficult problem in general, practical compilation can often be performed in reasonable time. We use c2d [17], which can compile relatively large d-DNNFs; in our experiments, it took less than 2 seconds to compile a d-DNNF from a CNF with 1000 clauses and 1000 propositions on a standard workstation. Our GCN can also embed other logic forms expressible as graphs; thus, other forms (e.g., CNF) can be used when d-DNNF compilation is not possible or is prohibitively expensive.

Logic in Neural Networks. Integrating learning and reasoning remains a key problem in AI and encompasses various methods, including logic circuits [18], Logic Tensor Networks [19, 20], and knowledge distillation [5]. Our primary goal in this work is to incorporate symbolic domain knowledge into connectionist architectures. Recent work can be categorized into two general approaches. The first approach augments the training objective with an additional logic loss as a means of applying soft constraints [7, 21, 22, 23]. For example, the semantic loss used in [7] quantifies the probability of generating a satisfying assignment by randomly sampling from a predictive distribution. The second approach is via embeddings, i.e., learning vector-based representations of symbolic knowledge that can be naturally handled by neural networks. For example, the ConvNet Encoder [8] embeds formulae (sequences of symbols) using a stack of one-dimensional convolutions. TreeRNN [9] and TreeLSTM encoders [12, 24, 11] recursively encode formulae using recurrent neural networks.

This work adopts the second, embedding-based approach and adapts the Graph Convolutional Network (GCN) [15] towards embedding logical formulae expressed in d-DNNF.
The prior work discussed above has focused largely on CNF (and more general forms), and has neglected d-DNNF despite its appealing properties. Unlike the ConvNet and TreeRNN/LSTM encoders, our GCN is able to utilize semantic information inherent in the d-DNNF structure, while remaining invariant to proposition relabeling.

3 Logic Embedding Network with Semantic Regularization

In this section, we detail our approach, from logic graph creation to model training and eventual use on a target task. As a guide, Fig. 1 shows an overview of our model. LENSR specializes a GCN for d-DNNF formulae. A logical formula (and corresponding truth assignments) can be represented as a directed or undirected graph G = (V, E) with N nodes, v_i ∈ V, and edges (v_i, v_j) ∈ E. Individual nodes are either propositions (leaf nodes) or logical operators (∧, ∨, ⇒), where subjects and objects are connected to their respective operators. In addition to the above nodes, we augment the graph with a global node, which is linked to all other nodes in the graph.

As a specific example (see Fig. 2), consider an image which contains a person and a pair of glasses. We wish to determine the relation between them, e.g., whether the person is wearing the glasses. We could use spatial logic to reason about this question; if the person is wearing the glasses, the image of the glasses should be "inside" the image of the person. Expressing this notion as a logical rule, we have: (wear(person, glasses) ⇒ in(glasses, person)) ∧ exist(person) ∧ exist(glasses).
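For illustration, the rule above can be turned into exactly such a node/edge structure, including the global node. A minimal sketch in plain Python (the node labels and list representation are our own, not the released implementation):

```python
# Logic graph for (wear => in) ^ exist(person) ^ exist(glasses),
# abbreviated as (p => q) ^ m ^ n, augmented with a global node.
def build_logic_graph():
    # Internal nodes are operators; leaves are propositions.
    nodes = ["AND", "IMPLIES", "p", "q", "m", "n", "GLOBAL"]
    edges = [("AND", "IMPLIES"), ("AND", "m"), ("AND", "n"),
             ("IMPLIES", "p"), ("IMPLIES", "q")]
    # The global node is linked to all other nodes in the graph.
    edges += [("GLOBAL", v) for v in nodes if v != "GLOBAL"]
    return nodes, edges

nodes, edges = build_logic_graph()
```

A GCN then operates on the (undirected) adjacency matrix induced by `edges`, so the embedding is invariant to how propositions are named.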
Although the example rule above results in a tree structure, d-DNNF formulae are DAGs in general.

3.1 Logic Graph Embedder with Heterogeneous Nodes and Semantic Regularization

We embed logic graphs using a multi-layer Graph Convolutional Network [15], which is a first-order approximation of localized spectral filters on graphs [25, 26]. The layer-wise propagation rule is

Z^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} Z^{(l)} W^{(l)} \right)    (1)

where Z^{(l)} are the learnt latent node embeddings at the l-th layer (note that Z^{(0)} = X), and \tilde{A} = A + I_N is the adjacency matrix of the undirected graph G with added self-connections via the identity matrix I_N. \tilde{D} is a diagonal degree matrix with \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}. The layer-specific trainable weight matrices are W^{(l)}, and \sigma(\cdot) denotes the activation function. To better capture the semantics associated with the logic graphs, we propose two modifications to the standard graph embedder: heterogeneous node embeddings and semantic regularization.

Heterogeneous Node Embedder. In the default GCN embedder, all nodes share the same set of embedding parameters. However, different types of nodes have different semantics, e.g., compare an ⇒ node vs. a proposition node. Thus, learning may be improved by using distinct information propagation parameters for each node type. Here, we propose to use type-dependent logical gate weights and attributes, i.e., a different W^{(l)} for each of the five node types (leaf, global, ∧, ∨, ⇒).

Semantic Regularization. d-DNNF logic graphs possess certain structural/semantic constraints, and we propose to incorporate these constraints into the embedding structure. More precisely, we regularize the children embeddings of ∧ gates to be orthogonal.
This intuitively corresponds to the constraint that the children do not share variables (i.e., ∧ is decomposable). Likewise, we propose to constrain the ∨ gate children embeddings to sum up to a unit vector, which corresponds to the constraint that one and only one child of an ∨ gate is true (i.e., ∨ is deterministic). The resultant semantic regularizer loss is

\ell_r(F) = \sum_{v_i \in N_O} \Big\| \sum_{v_j \in C_i} q(v_j) - \mathbf{1} \Big\|_2^2 + \sum_{v_k \in N_A} \big\| V_k^T V_k - \mathrm{diag}(V_k^T V_k) \big\|_2^2,    (2)

where q is our logic embedder, the inner sum over child embeddings is element-wise, N_O is the set of ∨ nodes, N_A is the set of ∧ nodes, C_* is the set of child nodes of v_*, and V_k = [q(v_1), q(v_2), ..., q(v_l)] with v_l ∈ C_k.

3.2 Embedder Training with a Triplet Loss

As previously mentioned, LENSR minimizes distances between the embeddings of formulae and satisfying assignments in a shared latent embedding space. To achieve this, we use a triplet loss that encourages formulae embeddings to be close to satisfying assignments, and far from unsatisfying assignments.

Formally, let q(\cdot) be the embedding produced by the modified GCN embedder. Denote q(F) as the embedding of the d-DNNF logic graph for a given formula, and q(\tau_T) and q(\tau_F) as the assignment embeddings for a satisfying and unsatisfying assignment, respectively. For assignments, the logical graph structures are simple and shallow; assignments are a conjunction of propositions p ∧ q ∧ ... ∧ z and thus the pre-augmented graph is a tree with one ∧ gate. Our triplet loss is a hinge loss, where d(x, y) is the squared Euclidean distance between vectors x and y and m is the margin.
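A compact NumPy sketch of the ingredients above: one GCN propagation step (Eq. 1), the ∨/∧ semantic regularizer, and the triplet hinge loss. Function names, array shapes, and the ReLU choice are illustrative assumptions on our part, not the paper's released code:

```python
import numpy as np

def gcn_layer(A, Z, W):
    """One propagation step: relu(D^-1/2 (A+I) D^-1/2 Z W)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ Z @ W, 0.0)

def semantic_reg(or_children, and_children):
    """or_children / and_children: lists of (n_children, dim) arrays,
    one array per OR / AND gate (the columns of V_k stacked as rows)."""
    loss = 0.0
    for C in or_children:              # OR children should sum to a unit vector
        loss += np.sum((C.sum(axis=0) - 1.0) ** 2)
    for C in and_children:             # AND children should be pairwise orthogonal
        G = C @ C.T                    # Gram matrix V_k^T V_k
        loss += np.sum((G - np.diag(np.diag(G))) ** 2)
    return loss

def triplet_loss(q_formula, q_sat, q_unsat, m=1.0):
    """Hinge loss pulling satisfying assignments close, pushing others away."""
    d = lambda x, y: np.sum((x - y) ** 2)   # squared Euclidean distance
    return max(d(q_formula, q_sat) - d(q_formula, q_unsat) + m, 0.0)
```

In the actual model the per-node-type weights W^{(l)} would be selected by node type before each `gcn_layer` call; that dispatch is omitted here for brevity.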
We make use of a SAT solver, python-sat [27], to obtain the satisfying and unsatisfying assignments. The triplet loss takes the form

\ell_t(F, \tau_T, \tau_F) = \max\{ d(q(F), q(\tau_T)) - d(q(F), q(\tau_F)) + m,\ 0 \}.    (3)

Training the embedder entails optimizing a combined loss:

L_{emb} = \sum_F \sum_{\tau_T, \tau_F} \ell_t(F, \tau_T, \tau_F) + \lambda_r \ell_r(F),    (4)

where \ell_t is the triplet loss above, \ell_r is the semantic regularization term for d-DNNF formulae, and \lambda_r is a hyperparameter that controls the strength of the regularization. The summation is over formulae and associated pairs of satisfying and unsatisfying assignments in our dataset. In practice, pairs of assignments are randomly sampled for each formula during training.

Figure 3: (a) Prediction loss (on the training set) as training progressed (line shown is the average over 10 runs with shaded region representing the standard error); (b) Formulae satisfiability vs. distance in the embedding space, showing that LENSR learnt a good representation by projecting d-DNNF logic graphs; (c) Test accuracies indicate that the learned d-DNNF embeddings outperform the general form and CNF embeddings, and are more robust to increasing formula complexity.

3.3 Target Task Training with a Logic Loss

Finally, we train the target model h by augmenting the per-datum loss with a logic loss \ell_{logic}:

\ell = \ell_c + \lambda \ell_{logic},    (5)

where \ell_{logic} = \| q(F_x) - q(h(x)) \|_2^2 is the embedding distance between the formula related to the input x and the predictive distribution h(x), \ell_c is the task-specific loss (e.g., cross-entropy for classification), and \lambda is a trade-off factor. Note that the distribution h(x) may be any relevant predictive distribution produced by the network, including intermediate layers.
As such, intermediate outputs can be regularized with prior knowledge for later downstream processing. To obtain the embedding of h(x) as q(h(x)), we first compute an embedding for each predicted relationship p_i by taking an average of the relationship embeddings weighted by their predicted probabilities. Then, we construct a simple logic graph G = \bigwedge_i p_i, which is embedded using q.

4 Empirical Results: Synthetic Dataset

In this section, we focus on validating that d-DNNF formulae embeddings are more informative relative to embeddings of general form and CNF formulae. Specifically, we conduct tests using an entailment checking problem; given the embedding of a formula f and the embedding of an assignment τ, predict whether τ satisfies f.

Experiment Setup and Datasets. We trained 7 different models using general, CNF, and d-DNNF formulae (with and without heterogeneous node embedding and semantic regularization). For this test, each LENSR model comprised 3 layers, with 50 hidden units per layer. LENSR produces a 100-dimensional embedding for each input formula/assignment. The neural network used for classification is a 2-layer perceptron with 150 hidden units. We set m = 1.0 in Eqn. 3 and \lambda_r = 0.1 in Eqn. 4. We used grid search to find reasonable parameters.

To explicitly control the complexity of formulae, we synthesized our own dataset. The complexity of a formula is (coarsely) reflected by its number of variables n_v and the maximum formula depth d_m. We prepared three datasets with (n_v, d_m) = (3, 3), (3, 6), (6, 6) and label their complexity as "low", "moderate", and "high". We attempted to provide a good coverage of potential problem difficulty: the "low" case represents easy problems that all the compared methods were expected to do well on, and the "high" case represents very challenging problems.
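The paper obtains such assignments with python-sat; for small formulae the same split can be produced by brute-force enumeration, which is a convenient way to see what the training pairs look like. A toy sketch (the DIMACS-style signed-integer clause encoding is a common convention, not the paper's exact data format):

```python
from itertools import product

def split_assignments(n_vars, clauses):
    """Enumerate all 2^n assignments of variables 1..n_vars and split them
    into satisfying / unsatisfying sets for a CNF given as lists of signed
    literals (e.g. [-1, 2] means "not x1 or x2")."""
    sat, unsat = [], []
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        ok = all(any(assign[abs(l)] == (l > 0) for l in clause)
                 for clause in clauses)
        (sat if ok else unsat).append(assign)
    return sat, unsat

# (p => q) ^ m ^ n with p=1, q=2, m=3, n=4 becomes (-1 v 2) ^ (3) ^ (4)
sat, unsat = split_assignments(4, [[-1, 2], [3], [4]])
```

During embedder training, one satisfying and one unsatisfying assignment would then be sampled from these sets to form a triplet with the formula.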
For each formula, we use the python-sat package [27] to find its satisfying and unsatisfying assignments. There are 1000 formulae in each difficulty level. We take at most 5 satisfying assignments and 5 unsatisfying assignments for each formula in our dataset. We converted all formulae and assignments to CNF and d-DNNF.

Table 1: Prediction accuracy and standard error over 10 independent runs with models using different forms of formulae and regularization. Standard error shown in brackets. "HE" means the model is a heterogeneous embedder, "SR" means the model is trained with semantic regularization. "✓" denotes "with the respective property" and "-" denotes "Not Applicable". The best scores in each column are achieved by the d-DNNF model with both HE and SR (last row).

Formula Form | HE | SR | Acc. (%) Low | Acc. (%) Moderate | Acc. (%) High
General      | -  | -  | 89.63 (0.25) | 70.32 (0.89)      | 68.51 (0.53)
CNF          |    | -  | 90.02 (0.18) | 71.19 (0.93)      | 69.42 (1.03)
CNF          | ✓  | -  | 90.25 (0.15) | 73.92 (1.01)      | 68.79 (0.69)
d-DNNF       |    |    | 89.91 (0.31) | 82.49 (1.11)      | 70.56 (1.16)
d-DNNF       | ✓  |    | 90.22 (0.23) | 82.28 (1.40)      | 71.46 (1.17)
d-DNNF       |    | ✓  | 90.27 (0.55) | 81.30 (1.29)      | 70.54 (0.62)
d-DNNF       | ✓  | ✓  | 90.35 (0.32) | 83.04 (1.58)      | 71.52 (0.54)

Results and Discussion. Table 1 summarizes the classification accuracies across the models and datasets over 10 independent runs. In brief, the heterogeneous embedder with semantic regularization trained on d-DNNF formulae outperforms the alternatives.
We see that semantic regularization works best when paired with heterogeneous node embedding; this is relatively unsurprising since the AND and OR operators are regularized differently and distinct sets of parameters are required to propagate relevant information.

In our experiments, we found the d-DNNF model to converge faster than the CNF and general form models (Fig. 3a). Utilizing both semantic regularization and heterogeneous node embedding further improves the convergence rate. The resultant embedding spaces are also more informative of satisfiability; Fig. 3b shows that the distances between the formulae and associated assignments better reflect satisfiability for d-DNNF. This results in higher accuracies (Fig. 3c), particularly on the moderate complexity dataset. We posit that the differences on the low and high regimes were smaller because (i) in the low case, all the methods performed reasonably well, and (ii) in the high regime, embedding the constraints helps to a limited extent and points to avenues for future work.

Overall, these results provide empirical evidence for our conjecture that d-DNNF formulae are more amenable to embedding, compared to CNF and general form formulae.

5 Visual Relation Prediction

In this section, we show how our framework can be applied to a real-world task — Visual Relation Prediction (VRP) — to train improved models that are consistent with both training data and prior knowledge. The goal of VRP is to predict the correct relation between two objects given visual information in an input image. We evaluate our method on VRD [28]. The VRD dataset contains 5,000 images with 100 object categories and 70 annotated predicates (relations). For each image, we sample pairs of objects and induce their spatial relations.
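Inducing a spatial relation from two annotated bounding boxes is a purely geometric test. A rough sketch of the idea (the exact tests and thresholds for the paper's 10 relations in Fig. 4a are not specified in the text, so the definitions below are our own coarse guesses; image coordinates with y growing downward are assumed):

```python
def spatial_relation(sub, obj):
    """sub/obj: boxes as (x1, y1, x2, y2). Returns one coarse relation label,
    mirroring the mutual exclusivity of the spatial propositions."""
    sx1, sy1, sx2, sy2 = sub
    ox1, oy1, ox2, oy2 = obj
    if sx1 >= ox1 and sy1 >= oy1 and sx2 <= ox2 and sy2 <= oy2:
        return "in"       # subject box fully inside object box
    if sx2 <= ox1:
        return "left"     # subject entirely left of object
    if sx1 >= ox2:
        return "right"
    if sy2 <= oy1:
        return "above"    # subject's bottom above object's top (image coords)
    if sy1 >= oy2:
        return "below"
    return "overlap"      # boxes intersect without containment
```

For example, a helmet box nested inside a person box yields "in", matching the proposition in(helmet, person) used as an illustration in the paper.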
If there is no annotation for a pair of objects in the dataset, we label it as having "no-relation".

Propositions and Constraints. The logical rules for the VRP task consist of logical formulae specifying constraints. In particular, there are three types of propositions in our model:

• Existence Propositions. The existence of each object forms a proposition which is True if the object exists in the image and False otherwise. For example, the proposition p = exist(person) is True if a person is in the input image and False otherwise.
• Visual Relation Propositions. Each candidate visual relation together with its subject and object forms a proposition. For example, wear(person, glasses) is a proposition and has value True if there is a person wearing glasses in the image and False otherwise.
• Spatial Relation Propositions. In order to add spatial constraints, e.g., a person cannot wear the glasses if their bounding boxes do not overlap, we define 10 types of spatial relationships (illustrated in Fig. 4a). We assign a proposition to each spatial relation such that the proposition evaluates to True if the relation holds and False otherwise, e.g., in(glasses, person). Furthermore, exactly one spatial relation proposition for a fixed subject-object pair is True, i.e., spatial relation propositions for a fixed subject-object pair are mutually exclusive.

Figure 4: (a) The 10 spatial relations used in the Visual Relation Prediction task, and an example image illustrating the relation in(helmet, person). (b) A prediction comparison between neural networks trained w/ and w/o LENSR. A tick indicates a correct prediction. In this example, the misleading effects of the street are corrected by spatial constraints on "skate on".

Figure 5: The framework we use to train VRP models with LENSR.

The above propositions are used to form two types of logical constraints:

• Existence Constraints. The prerequisite of any relation is that the relevant objects exist in the image. Therefore p(sub, obj) ⇒ (exist(sub) ∧ exist(obj)), where p is any of the visual or spatial relations introduced above.
• Spatial Constraints. Many visual relations hold only if a given subject and object follow a spatial constraint. For example, a person cannot be wearing glasses if the bounding boxes for the person and the glasses do not overlap. This observation gives us rules such as wear(person, glasses) ⇒ in(glasses, person).

For each image i in the training set we can generate a set of clauses F_i = {c_ij}, where c_ij = \bigvee_k p_jk. Each clause c_ij represents a constraint in image i, where j is the constraint index, and each proposition p_jk represents a relation in image i, where k is the proposition index in the constraint c_ij. We obtain the relations directly from the annotations for the image and calculate the constraints based on the definitions above. Depending on the number of objects in image i, F_i can contain 50 to 1000 clauses and variables. All these clauses are combined together to form a formula F = \bigwedge_i F_i.

5.1 VRP Model Training

Using the above formulae, we train our VRP model in a two-step manner; first, we train the embedder q using only F. The embedder is a GCN with the same structure as described in Sec. 4. Then, the embedder is fixed and the target neural network h is trained to predict relations together with the logic loss (described in Sec. 3.3). The training framework is illustrated in Fig. 5. In our experiment, h is an MLP with 2 layers and 512 hidden units. To elaborate:

Embedder Training.
For each training image, we generate an intermediate formula fi that only contains propositions related to the current image I. To do so, we iterate over all clauses in f and add a clause ci into the intermediate formula if all subjects and objects of all literals in ci are in the image. The formula fi is then appended with the existence and spatial constraints defined in Sec. 5.

To obtain the vector representation of a proposition, we first convert its corresponding relation into a phrase (e.g., p = wear(person, glasses) is converted to "person wear glasses"). Then, the GLoVe embeddings [29] for each word are summed to form the embedding for the entire phrase. The formula is then either kept as CNF or converted to d-DNNF [17] depending on the embedder. Similar to Sec. 4, the assignments of fi are found and used to train the embedder using the triplet loss (Eqn. 3).

Target Model Training. After q is trained, we fix its parameters and use it to train the relation prediction network h. In our relation prediction task, we assume the objects in the images are known; we are given the object labels and bounding boxes. Although this is a strong assumption, object detection is an upstream task that is handled by other methods and is not the focus of this work. Indeed, all compared approaches are provided with exactly the same information. The input to the model is the image, and the labels and bounding boxes of all detected objects, for example: (I, [(table, [23, 78, 45, 109]), (cup, [10, 25, 22, 50])]).
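As an illustration of this input construction, here is a minimal sketch. The tiny 2-d word vectors and helper names below are hypothetical stand-ins (real GLoVe embeddings are 300-d and loaded from file, and the visual feature would come from a ResNet):

```python
import numpy as np

# Hypothetical mini "GLoVe" table; real vectors would be loaded from the
# pre-trained GLoVe files and have dimension 300.
WORD_VECS = {
    "person":  np.array([0.1, 0.3]),
    "wear":    np.array([0.5, -0.2]),
    "glasses": np.array([-0.4, 0.7]),
}

def phrase_embedding(phrase):
    """Sum the word vectors of a phrase, e.g. 'person wear glasses'."""
    return np.sum([WORD_VECS[w] for w in phrase.split()], axis=0)

def relative_bbox(box, image_wh):
    """Normalize a bounding box [x1, y1, x2, y2] by the image size."""
    w, h = image_wh
    x1, y1, x2, y2 = box
    return np.array([x1 / w, y1 / h, x2 / w, y2 / h])

# Assemble the model input [r1, b1, r2, b2, v] for the example
# (I, [(table, ...), (cup, ...)]) style annotation above.
r1 = phrase_embedding("person")
r2 = phrase_embedding("glasses")
b1 = relative_bbox([23, 78, 45, 109], (400, 300))
b2 = relative_bbox([10, 25, 22, 50], (400, 300))
v = np.zeros(4)  # placeholder for the ResNet feature of the union box
x = np.concatenate([r1, b1, r2, b2, v])
print(x.shape)  # (16,)
```

The concatenated vector `x` is what the 2-layer MLP h consumes; only the dimensions here are toys.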
The network predicts a relation based on the visual feature and the embeddings of the class labels:

    ŷ = h([r1, b1, r2, b2, v]),    (6)

where ŷ is the relation prediction, ri = GLoVe(labeli) is the GLoVe embedding of the class label of the subject (i = 1) or object (i = 2), bi is the corresponding relative bounding box position, v = ResNet(I_{bbox1 ∪ bbox2}) is the visual feature extracted from the union bounding box of the two objects, and [·] indicates concatenation of vectors. We compute the logic loss term as

    Llogic = ‖ q(f) − q(⋀i pi) ‖₂²,    (7)

where pi is the predicate for the i-th relation predicted to hold in the image, and f is the formula generated from the input information. As previously stated, our final objective function is L = Lc + λ·Llogic, where Lc is the cross-entropy loss and λ = 0.1 is a trade-off factor. We optimized this objective using Adam [30] with a learning rate of 10⁻³.

Although our framework can be trained end-to-end, we trained the logic embedder and target network separately to (i) alleviate potential loss fluctuations in joint optimization, and (ii) enable the same logic embeddings to be used with different target networks (for different tasks). The networks could be further optimized jointly to fine-tune the embeddings for a specific task, but we did not perform fine-tuning for this experiment.

5.2 Empirical Results

Table 2 summarizes our results and shows the top-5 accuracy score of the compared methods2. We clearly see that our GCN approach (with heterogeneous node embedding and semantic regularization) performs far better than the baseline model without logic embeddings. Note also that direct application of d-DNNFs via the semantic loss [7] only resulted in marginal improvement over the baseline.
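For concreteness, the logic loss of Eqn. 7 and the combined objective L = Lc + λ·Llogic can be sketched as follows. This is a minimal NumPy illustration: `q` here is a stand-in function, whereas the real embedder runs the trained GCN on a formula's graph representation:

```python
import numpy as np

def q(formula_vec):
    """Stand-in embedder: the real q maps a formula's graph to R^d via a GCN."""
    return np.tanh(formula_vec)

def logic_loss(f_vec, conj_vec):
    """|| q(f) - q(/\_i p_i) ||_2^2 : squared distance between the embedding
    of the constraint formula f and that of the conjunction of predicted
    relations (Eqn. 7)."""
    d = q(f_vec) - q(conj_vec)
    return float(np.dot(d, d))

def total_loss(cross_entropy, f_vec, conj_vec, lam=0.1):
    """L = Lc + lambda * Llogic, with the paper's trade-off lambda = 0.1."""
    return cross_entropy + lam * logic_loss(f_vec, conj_vec)

f_vec = np.array([0.2, -0.1, 0.4])
conj_vec = np.array([0.2, -0.1, 0.4])  # predictions satisfy the formula
print(total_loss(0.7, f_vec, conj_vec))  # 0.7: the logic term vanishes
```

When the predicted relations embed onto the same point as the constraint formula, only the cross-entropy term remains, which is the behavior the logic loss is designed to encourage.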
A potential reason is that the constraints in VRP are more complicated than those explored in prior work: there are thousands of propositions, and a straightforward use of d-DNNFs causes the semantic loss to rapidly approach ∞. Our embedding approach avoids this issue and is thus able to better leverage the encoded prior knowledge. Our method also outperforms the state-of-the-art TreeLSTM embedder [12]; since RNN-based embedders are not invariant to variable ordering, they may be less appropriate for symbolic expressions, especially propositional logic.

As a qualitative comparison, Fig. 4b shows an example where logic rules embedded by LENSR help the target task model. The top-3 predictions of the neural network trained with LENSR are all reasonable answers for the input image. However, the top-3 relations predicted by the baseline model are unsatisfying, and the model appears misled by the street between the subject and the object. LENSR leverages the logic rules indicating that the "skate on" relation requires the subject to be "above" or "overlap above" the object, which corrects for the effect of the street.

2 Top-5 accuracy was used as our performance metric because a given pair of objects may have multiple relationships, and reasonable relations may not have been annotated in the dataset.

Table 2: Performance of VRP under different configurations. "HE" indicates a heterogeneous node embedder; "SR" means the model uses semantic regularization. "✓" denotes "with the respective property" and "-" denotes "Not Applicable". The best scores are in bold.

    Model                          Form     HE   SR   Top-5 Acc. (%)
    without logic                  -        -    -    84.30
    with semantic loss [7]         CNF      -    -    84.76
                                   d-DNNF   -    -    85.76
    with treeLSTM embedder [12]    CNF      -    -    82.99
                                   d-DNNF   -    -    85.39
    LENSR                          CNF      -    -    85.70
                                   CNF      ✓    ✓    85.37
                                   d-DNNF   -    -    88.01
                                   d-DNNF   ✓    -    90.13
                                   d-DNNF   ✓    ✓    92.77

6 Discussion

Our experimental results show an interesting phenomenon: the usage of d-DNNF (when paired with semantic regularization) significantly improved performance compared to other forms. This raises a natural question of whether d-DNNF embeddings are easier to learn. Establishing a complete formal connection between improved learning and compiled forms is beyond the scope of this work. However, we use the size of the space of formulae as a way to argue about ease of learning, and formalize this through the concept of being embeddable-demanding.

Definition 1 (Embeddable-Demanding) Let L1, L2 be two compilation languages. L1 is at least as embeddable-demanding as L2 iff there exists a polynomial p such that for every sentence α ∈ L2, there exists β ∈ L1 such that (i) |β| ≤ p(|α|), where |α|, |β| are the sizes of α, β respectively, and β may include auxiliary variables; (ii) the transformation from α to β takes polynomial time; and (iii) there exists a bijection between the models of β and the models of α.

Theorem 1 CNF is at least as embeddable-demanding as d-DNNF, but if d-DNNF is at least as embeddable-demanding as CNF then P = PP.

The proof and detailed theorem statement are provided in the Appendix. More broadly, Theorem 1 is a first step towards a more comprehensive theory of the embeddability of different logical forms. Future work in this area could potentially yield interesting insights and new ways of leveraging symbolic knowledge in deep neural networks.

7 Conclusion

To summarize, this paper proposed LENSR, a novel framework for leveraging prior symbolic knowledge. By embedding d-DNNF formulae using an augmented GCN, LENSR boosts the performance of deep NNs on model-checking and VRP tasks.
The empirical results indicate that constraining embeddings to be semantically faithful, e.g., by allowing for node heterogeneity and through regularization, aids model training. Our work also suggests potential future benefits from a deeper examination of the relationship between tractability and embedding, and from the extension of semantic-aware embedding to alternative graph structures. Future extensions of LENSR that embed other forms of prior symbolic knowledge could enhance deep learning where data is relatively scarce (e.g., real-world interactions with humans [31] or objects [32]). To encourage further development, we have made our source code available online at https://github.com/ZiweiXU/LENSR.

Acknowledgments

This work was supported in part by a MOE Tier 1 Grant to Harold Soh and by the National Research Foundation Singapore under its AI Singapore Programme [AISG-RP-2018-005]. It was also supported by the National Research Foundation, Prime Minister's Office, Singapore under its Strategic Capability Research Centres Funding Initiative.

References

[1] Y. LeCun, Y. Bengio, and G. E. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[2] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60–88, 2017.

[3] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. R. Baker, M. Lai, A. Bolton, Y. Chen, T. P. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis, "Mastering the game of go without human knowledge," Nature, vol. 550, pp. 354–359, 2017.

[4] T. R. Besold, A. d. Garcez, S. Bader, H. Bowman, P. Domingos, P. Hitzler, K.-U. Kühnberger, L. C. Lamb, D. Lowd, P. M. V.
Lima, et al., \u201cNeural-symbolic learning and reasoning: A survey\nand interpretation,\u201d arXiv preprint arXiv:1711.03902, 2017.\n\n[5] Z. Hu, X. Ma, Z. Liu, E. H. Hovy, and E. P. Xing, \u201cHarnessing deep neural networks with logic\n\nrules,\u201d CoRR, vol. abs/1603.06318, 2016.\n\n[6] A. F. Ansari and H. Soh, \u201cHyperprior induced unsupervised disentanglement of latent representa-\ntions,\u201d in Proceedings of the AAAI Conference on Arti\ufb01cial Intelligence, vol. 33, pp. 3175\u20133182,\n2019.\n\n[7] J. Xu, Z. Zhang, T. Friedman, Y. Liang, and G. Van den Broeck, \u201cA semantic loss function for\n\ndeep learning with symbolic knowledge,\u201d in ICML, 2018.\n\n[8] Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, \u201cCharacter-aware neural language models,\u201d in\n\nAAAI, 2016.\n\n[9] M. Allamanis, P. Chanthirasegaran, P. Kohli, and C. A. Sutton, \u201cLearning continuous semantic\n\nrepresentations of symbolic expressions,\u201d in ICML, 2017.\n\n[10] R. Evans, D. Saxton, D. Amos, P. Kohli, and E. Grefenstette, \u201cCan neural networks understand\n\nlogical entailment?,\u201d in ICLR, 2018.\n\n[11] X.-D. Zhu, P. Sobhani, and H. Guo, \u201cLong short-term memory over recursive structures,\u201d in\n\nICML, 2015.\n\n[12] K. S. Tai, R. Socher, and C. D. Manning, \u201cImproved semantic representations from tree-\n\nstructured long short-term memory networks,\u201d in ACL, 2015.\n\n[13] A. Darwiche, \u201cOn the tractability of counting theory models and its application to belief revision\n\nand truth maintenance,\u201d Journal of Applied Non-classical Logics - JANCL, vol. 11, 01 2001.\n\n[14] A. Darwiche and P. Marquis, \u201cA knowledge compilation map,\u201d J. Artif. Int. Res., vol. 17,\n\npp. 229\u2013264, Sept. 2002.\n\n[15] T. N. Kipf and M. Welling, \u201cSemi-supervised classi\ufb01cation with graph convolutional networks,\u201d\n\nin ICLR, 2017.\n\n[16] A. Darwiche, \u201cDecomposable negation normal form,\u201d J. ACM, vol. 48, pp. 
608–647, July 2001.

[17] A. Darwiche, "New advances in compiling CNF into decomposable negation normal form," in ECAI, 2004.

[18] Y. Liang and G. Van den Broeck, "Learning logistic circuits," in AAAI, 2019.

[19] I. Donadello, L. Serafini, and A. S. d'Avila Garcez, "Logic tensor networks for semantic image interpretation," in IJCAI, 2017.

[20] L. Serafini and A. S. d'Avila Garcez, "Logic tensor networks: Deep learning and logical reasoning from data and knowledge," in NeSy@HLAI, 2016.

[21] R. Stewart and S. Ermon, "Label-free supervision of neural networks with physics and domain knowledge," in AAAI, 2017.

[22] T. Rocktäschel, S. Singh, and S. Riedel, "Injecting logical background knowledge into embeddings for relation extraction," in HLT-NAACL, 2015.

[23] T. Demeester, T. Rocktäschel, and S. Riedel, "Lifted rule injection for relation embeddings," in EMNLP, 2016.

[24] P. Le and W. Zuidema, "Compositional distributional semantics with long short term memory," in *SEM@NAACL-HLT, 2015.

[25] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.

[26] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in NIPS, 2016.

[27] A. Ignatiev, A. Morgado, and J. Marques-Silva, "PySAT: A Python toolkit for prototyping with SAT oracles," in SAT, pp. 428–437, 2018.

[28] C. Lu, R. Krishna, M. Bernstein, and L. Fei-Fei, "Visual relationship detection with language priors," in ECCV, pp. 852–869, Springer, 2016.

[29] J. Pennington, R. Socher, and C.
Manning, "Glove: Global vectors for word representation," in EMNLP, 2014.

[30] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," CoRR, vol. abs/1412.6980, 2015.

[31] H. Soh and Y. Demiris, "Learning assistance by demonstration: Smart mobility with shared control and paired haptic controllers," J. Hum.-Robot Interact., vol. 4, pp. 76–100, Dec. 2015.

[32] T. Taunyazov, H. F. Koh, Y. Wu, C. Cai, and H. Soh, "Towards effective tactile identification of textures using a hybrid touch approach," in 2019 International Conference on Robotics and Automation (ICRA), pp. 4269–4275, IEEE, 2019.