{"title": "Discriminative Gaifman Models", "book": "Advances in Neural Information Processing Systems", "page_first": 3405, "page_last": 3413, "abstract": "We present discriminative Gaifman models, a novel family of relational machine learning models. Gaifman models learn feature representations bottom up from representations of locally connected and bounded-size regions of knowledge bases (KBs).  Considering local and bounded-size neighborhoods of knowledge bases renders logical inference and learning tractable, mitigates the problem of overfitting, and facilitates weight sharing. Gaifman models sample neighborhoods of knowledge bases so as to make the learned relational models more robust to missing objects and relations which is a common situation in open-world KBs. We present the core ideas of Gaifman models and apply them to large-scale relational learning problems. We also discuss the ways in which Gaifman models relate to some existing relational machine learning approaches.", "full_text": "Discriminative Gaifman Models\n\nMathias Niepert\nNEC Labs Europe\n\nHeidelberg, Germany\n\nmathias.niepert@neclabs.eu\n\nAbstract\n\nWe present discriminative Gaifman models, a novel family of relational machine\nlearning models. Gaifman models learn feature representations bottom up from\nrepresentations of locally connected and bounded-size regions of knowledge bases\n(KBs). Considering local and bounded-size neighborhoods of knowledge bases\nrenders logical inference and learning tractable, mitigates the problem of over-\n\ufb01tting, and facilitates weight sharing. Gaifman models sample neighborhoods\nof knowledge bases so as to make the learned relational models more robust to\nmissing objects and relations which is a common situation in open-world KBs. We\npresent the core ideas of Gaifman models and apply them to large-scale relational\nlearning problems. 
We also discuss the ways in which Gaifman models relate to\nsome existing relational machine learning approaches.\n\n1\n\nIntroduction\n\nKnowledge bases are attracting considerable interest both from industry and academia [2, 6, 15, 10].\nInstances of knowledge bases are the web graph, social and citation networks, and multi-relational\nknowledge graphs such as Freebase [2] and YAGO [11]. Large knowledge bases motivate the\ndevelopment of scalable machine learning models that can reason about objects as well as their\nproperties and relationships. Research in statistical relational learning (SRL) has focused on particular\nformalisms such as Markov logic [22] and PROBLOG [8] and is often concerned with improving the\nef\ufb01ciency of inference and learning [14, 28]. The scalability problems of these statistical relational\nlanguages, however, remain an obstacle and have prevented a wider adoption. Another line of work\nfocuses on ef\ufb01cient relational machine learning models that perform well on a particular task such\nas knowledge base completion and relation extraction. Examples are knowledge base factorization\nand embedding approaches [5, 21, 23, 26] and random-walk based ML models [15, 10]. We aim to\nadvance the state of the art in relational machine learning by developing ef\ufb01cient models that learn\nknowledge base embeddings that are effective for probabilistic query answering on the one hand, and\ninterpretable and widely applicable on the other.\nGaifman\u2019s locality theorem [9] is a result in the area of \ufb01nite model theory [16]. The Gaifman graph\nof a knowledge base is the undirected graph whose nodes correspond to objects and in which two\nnodes are connected if the corresponding objects co-occur as arguments of some relation. Gaifman\u2019s\nlocality theorem states that every \ufb01rst-order sentence is equivalent to a Boolean combination of\nsentences whose quanti\ufb01ers range over local neighborhoods of the Gaifman graph. 
With this paper,\nwe aim to explore Gaifman locality from a machine learning perspective. If every first-order sentence\nis equivalent to a Boolean combination of sentences whose quantifiers range over local neighborhoods\nonly, we ought to be able to develop models that learn effective representations from these local\nneighborhoods. There is increasing evidence that learning representations that are built up from\nlocal structures can be highly successful. Convolutional neural networks, for instance, learn features\nover locally connected regions of images. The aim of this work is to investigate the effectiveness\nand efficiency of machine learning models that perform learning and inference within and across\nlocally connected regions of knowledge bases. This is achieved by combining relational features that\nare often used in statistical relational learning with novel ideas from the area of deep learning. The\nfollowing problem motivates Gaifman models.\nProblem 1. Given a knowledge base (relational structure, mega-example, knowledge graph) or a\ncollection of knowledge bases, learn a relational machine learning model that supports complex\nrelational queries. The model learns a probability for each tuple in the query answer.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\nNote that this is a more general problem than knowledge base completion since it includes the\nlearning of a probability distribution for a complex relational query. The query corresponding to\nknowledge base completion is r(x, y) for logical variables x and y, and relation r. 
The problem also\ntouches on the problem of open-world probabilistic KBs [7] since tuples whose prior probability is\nzero will often have a non-zero probability in the query answer.\n\n2 Background\n\nWe first review some important concepts and notation in first-order logic.\n\n2.1 Relational First-order Logic\n\nAn atom r(t1, ..., tn) consists of predicate r of arity n followed by n arguments, which are either\nelements from a finite domain D = {a, b, ...} or logical variables {x, y, ...}. We use the terms domain\nelement and object synonymously. A ground atom is an atom without logical variables. Formulas are\nbuilt from atoms using the usual Boolean connectives and existential and universal quantification. A\nfree variable in a first-order formula is a variable x not in the scope of a quantifier. We write \u03d5(x, y)\nto denote that x, y are free in \u03d5, and free(\u03d5) to refer to the free variables of \u03d5. A substitution\nreplaces all occurrences of logical variable x by t in some formula \u03d5 and is denoted by \u03d5[x/t].\nA vocabulary consists of a finite set of predicates R and a domain D. Every predicate r is associated\nwith a positive integer called the arity of r. An R-structure (or knowledge base) D consists of the\ndomain D, a set of predicates R, and an interpretation. The Herbrand base of D is the set of all\nground atoms that can be constructed from R and D. The interpretation assigns a truth value to\nevery atom in the Herbrand base by specifying r^D \u2286 D^n for each n-ary predicate r \u2208 R. For a\nformula \u03d5(x1, ..., xn) and a structure D, we write D |= \u03d5(d1, ..., dn) to say that D satisfies \u03d5 if\nthe variables x1, ..., xn are substituted with the domain elements d1, ..., dn. We define \u03d5(D) :=\n{(d1, ..., dn) \u2208 D^n | D |= \u03d5(d1, ..., dn)}. For the R-structure D and C \u2286 D, \u27e8C\u27e9^D denotes the\nsubstructure induced by C on D, that is, the R-structure C with domain C and r^C := r^D \u2229 C^n for\nevery n-ary r \u2208 R.\n\n2.2 Gaifman\u2019s Locality Theorem\n\nThe Gaifman graph of an R-structure D is the graph GD with vertex set D and an edge between\ntwo vertices d, d\u2032 \u2208 D if and only if there exists an r \u2208 R and a tuple (d1, ..., dk) \u2208 r^D such that\nd, d\u2032 \u2208 {d1, ..., dk}. Figure 1 depicts a fragment of a knowledge base and the corresponding Gaifman\ngraph. The distance dD(d1, d2) between two elements d1, d2 \u2208 D of a structure D is the length of\nthe shortest path in GD connecting d1 and d2. For r \u2265 1 and d \u2208 D, we define the r-neighborhood\nof d to be Nr(d) := {x \u2208 D | dD(d, x) \u2264 r}. We refer to r also as the depth of the neighborhood.\nLet d = (d1, ..., dn) \u2208 D^n. The r-neighborhood of d is defined as\n\nNr(d) = \u22c3_{i=1}^{n} Nr(di).\n\nFigure 1: A knowledge base fragment for the pair\n(d1, d2) and the corresponding Gaifman graph.\n\nFigure 2: The degree distribution of the Gaifman\ngraph for the Freebase fragment FB15K.\n\nFor the Gaifman graph in Figure 1, we have that N1(d4) = {d1, d2, d4, d5} and N1((d1, d2)) =\n{d1, ..., d6}. \u03d5^Nr(x) is the formula obtained from \u03d5(x) by relativizing all quantifiers to Nr(x), that\nis, by replacing every subformula of the form \u2203y\u03c8(x, y, z) by \u2203y(dD(x, y) \u2264 r \u2227 \u03c8(x, y, z)) and\nevery subformula of the form \u2200y\u03c8(x, y, z) by \u2200y(dD(x, y) \u2264 r \u2192 \u03c8(x, y, z)). A formula \u03c8(x) of\nthe form \u03d5^Nr(x), for some \u03d5(x), is called r-local. Whether an r-local formula \u03c8(x) holds depends\nonly on the r-neighborhood of x, that is, for every structure D and every d \u2208 D we have D |= \u03c8(d)\nif and only if \u27e8Nr(d)\u27e9 |= \u03c8(d). 
For r, k \u2265 1 and \u03c8(x) being r-local, a local sentence is of the form\n\n\u2203x1 \u00b7\u00b7\u00b7 \u2203xk ( \u22c0_{1 \u2264 i < j \u2264 k} dD(xi, xj) > 2r \u2227 \u22c0_{1 \u2264 i \u2264 k} \u03c8(xi) ).\n\nWe can now state Gaifman\u2019s locality theorem.\nTheorem 1. [9] Every first-order sentence is equivalent to a Boolean combination of local sentences.\n\nGaifman\u2019s locality theorem states that any first-order sentence can be expressed as a Boolean\ncombination of r-local sentences defined for neighborhoods of objects that are mutually far apart\n(have distance at least 2r + 1). Now, a novel approach to (statistical) relational learning would be to\nconsider a large set of objects (or tuples of objects) and learn models from their local neighborhoods\nin the Gaifman graphs. It is this observation that motivates Gaifman models.\n\n3 Learning Gaifman Models\n\nInstead of taking the costly approach of applying relational learning and inference directly to entire\nknowledge bases, the representations of Gaifman models are learned bottom up, by performing\ninference and learning within bounded-size, locally connected regions of Gaifman graphs. Each\nGaifman model specifies the data generating process from a given knowledge base (or collection of\nknowledge bases), a set of relational features, and a ML model class used for learning.\nDefinition 1. 
Given an R-structure D, a discriminative Gaifman model for D is a tuple (q, r, k, \u03a6, M)\nas follows:\n\n\u2022 q is a first-order formula called the target query with at least one free variable;\n\u2022 r is the depth of the Gaifman neighborhoods;\n\u2022 k is the size-bound of the Gaifman neighborhoods;\n\u2022 \u03a6 is a set of first-order formulas (the relational features);\n\u2022 M is the base model class (loss, hyper-parameters, etc.).\n\nThroughout the rest of the paper, we will provide detailed explanations of the different parameters of\nGaifman models and their interaction with data generation, learning, and inference.\n\n[Figure 1 content: a knowledge base fragment with the facts locatedIn(d6, d5), livesIn(d2, d5), worksAt(d2, d6), studentOf(d1, d2), studentAt(d1, d6), bornIn(d1, d3), studentOf(d4, d2), introducedBy(d1, d4, d2), livesIn(d4, d5).]\n[Figure 2 axes: degree (horizontal) versus number of nodes (vertical), both on logarithmic scales.]\n\nDuring the training of Gaifman models, neighborhoods are generated for tuples of objects d \u2208 D^n\nbased on the parameters r and k. We first describe the procedure for arbitrary tuples d of objects\nand will later explain where these tuples come from. For a given tuple d the r-neighborhood of d\nwithin the Gaifman graph is computed. This results in the set of objects Nr(d). Now, from this\nneighborhood we sample w neighborhoods consisting of at most k objects. Sampling bounded-size\nsub-neighborhoods from Nr(d) is motivated as follows:\n\n1. The degree distribution of Gaifman graphs is often skewed (see Figure 2), that is, the\nnumber of other objects a domain element is related to varies heavily. Generating smaller,\nbounded-size neighborhoods allows the transfer of learned representations between more\nand less connected objects. Moreover, the sampling strategy makes Gaifman models more\nrobust to object uncertainty [19]. We show empirically that larger values for k reduce the\neffectiveness of the learned models for some knowledge bases.\n\n2. 
Relational learning and inference are performed within the generated neighborhoods. Nr(d)\ncan be very large, even for r = 1 (see Figure 2), and we want full control over the complexity\nof the computational problems.\n\n3. Even for a single object tuple d we can generate a large number of training examples if\n|Nr(d)| > k. This mitigates the risk of overfitting. The number of training examples per\ntuple strongly influences the models\u2019 accuracy.\n\nWe can now define the set of (r, k)-neighborhoods generated from an r-neighborhood.\n\nNr,k(d) := {N | N \u2286 Nr(d) and |N| = k}   if |Nr(d)| \u2265 k,\nNr,k(d) := {Nr(d)}   otherwise.\n\nFor a given tuple of objects d, Algorithm 1 returns a set of w neighborhoods drawn from Nr,k(d)\nsuch that the number of objects for each di is the same in expectation.\nThe formulas in the set \u03a6 are indexed and of the form \u03d5i(s1, ..., sn, u1, ..., um) with sj \u2208 free(q)\nand uj \u2209 free(q). For every tuple d = (d1, ..., dn), generated neighborhood N \u2208 Nr,k(d),\nand \u03d5i \u2208 \u03a6, we perform the substitution [s1/d1, ..., sn/dn] and relativize \u03d5i\u2019s quantifiers to N,\nresulting in \u03d5^N_i[s1/d1, ..., sn/dn], which we write as \u03d5^N_i[s/d]. Let \u27e8N\u27e9 be the substructure induced\nby N on D. For every formula \u03d5i(s1, ..., sn, u1, ..., um) and every n \u2208 N^m, we now have that\nD |= \u03d5^N_i[s/d, u/n] if and only if \u27e8N\u27e9 |= \u03d5^N_i[s/d, u/n]. In other words, satisfaction is now checked\nlocally within the neighborhoods N, by deciding whether \u27e8N\u27e9 |= \u03d5^N_i[s/d, u/n]. The relational\nsemantics of Gaifman models is based on the set of formulas \u03a6. The feature vector v = (v1, ..., v|\u03a6|)\nfor tuple d, and neighborhood N \u2208 Nr,k(d), written as vN, is constructed as follows\n\nvi := |\u03d5^N_i[s/d](\u27e8N\u27e9)|   if |free(\u03d5^N_i[s/d])| > 0,\nvi := 1   if \u27e8N\u27e9 |= \u03d5^N_i[s/d],\nvi := 0   otherwise.\n\nThat is, if \u03d5^N_i[s/d] has free variables, vi is equal to the number of groundings of \u03d5i[s/d] that are\nsatisfied within the neighborhood substructure \u27e8N\u27e9; if \u03d5i[s/d] has no free variables, vi = 1 if\nand only if \u03d5i[s/d] is satisfied within the neighborhood substructure \u27e8N\u27e9; and vi = 0 otherwise.\nThe neighborhood representations v capture r-local formulas and help the model learn formula\ncombinations that are associated with negative and positive examples. For the right choices of the\nparameters r and k, the neighborhood representations of Gaifman models capture the relational\nstructure associated with positive and negative examples.\nDeciding D |= \u03d5 for a structure D and a first-order formula \u03d5 is referred to as model checking and\ncomputing \u03d5(D) is called \u03d5-counting. The combined complexity of model checking is PSPACE-complete\n[29] and there exists an algorithm with running time ||D||^O(||\u03d5||) for both problems, where ||\u00b7|| is the size of an\nencoding. Clearly, for most real-world KBs this is not feasible. For Gaifman models, however, where\nthe neighborhoods are bounded-size, typically 10 \u2264 |N| = k \u2264 100, the above representation can\nbe computed very efficiently for a large class of relational features. We can now state the following\ncomplexity result.\nTheorem 2. 
Let D be a relational structure (knowledge base), let d be the size of the largest r-neighborhood\nof D\u2019s Gaifman graph, and let s be the greatest encoding size of any formula in \u03a6.\nFor a Gaifman model with parameters r and k, the worst-case complexity for computing the feature\nrepresentations of N neighborhoods is O(N (d + |\u03a6|k^s)).\nExisting SRL approaches could be applied to the generated neighborhoods, treating each as a possible\nworld for structure and parameter learning. However, our goal is to learn relational models that utilize\nembeddings computed by multi-layered neural networks.\n\nAlgorithm 1 GENNEIGHS: Computes a list of w\nneighborhoods of size k for an input tuple d.\ninput: tuple d \u2208 D^n, parameters r, k, and w\n\ud835\udcae = [ ]\nwhile |\ud835\udcae| < w do\n    S = \u2205\n    N = Nr(d)\n    for all i \u2208 {1, ..., n} do\n        U = min(\u230ak/n\u230b, |Nr(di)|) elements sampled uniformly from Nr(di)\n        N = N \\ U\n        S = S \u222a U\n    U = min(k \u2212 |S|, |N|) elements sampled uniformly from N\n    S = S \u222a U\n    \ud835\udcae = \ud835\udcae + S\nreturn \ud835\udcae\n\nFigure 3: Learning of a Gaifman model.\n\nFigure 4: Inference with a Gaifman model.\n\n3.1 Learning Distributions for Relational Queries\n\nLet q be a first-order formula (the relational query) and S(q) the result set of the query, that is, all\ngroundings that render the formula satisfied in the knowledge base. The feature representations\ngenerated for tuples of objects d \u2208 S(q) serve as positive training examples. The Gaifman models\u2019\naim is to learn neighborhood embeddings that capture local structure of tuples for which we know\nthat the target query evaluates to true. Similar to previous work, we generate negative examples by\ncorrupting tuples that correspond to positive examples. 
The corruption mechanism takes a positive\ninput tuple d = (d1, ..., dn) and substitutes, for each i \u2208 {1, ..., n}, the domain element di with\nobjects sampled from D while keeping the rest of the tuple fixed.\nThe discriminative Gaifman model performs the following steps.\n\n1. Evaluate the target query q and compute the result set S(q)\n2. For each tuple d in the result set S(q):\n\n\u2022 Compute \ud835\udca9, a multiset of w neighborhoods N \u2208 Nr,k(d) with Algorithm 1; each\nsuch neighborhood serves as a positive training example\n\u2022 Compute \ud835\udca9\u0303, a multiset of w\u0303 neighborhoods \u00d1 \u2208 Nr,k(d\u0303) for corrupted versions d\u0303 of d\nwith Algorithm 1; each such neighborhood serves as a negative training example\n\u2022 Perform model checking and counting within the neighborhoods to compute the feature\nrepresentations vN and v\u00d1 for each N \u2208 \ud835\udca9 and \u00d1 \u2208 \ud835\udca9\u0303, respectively\n\n3. Learn a ML model with the generated positive and negative training examples.\n\nLearning the final Gaifman model depends on the base ML model class M and its loss function.\nWe obtained state-of-the-art results with neural networks, gradient-based learning, and categorical\ncross-entropy as loss function\n\nL = \u2212[ \u2211_{N \u2208 \ud835\udca9} log pM(vN) + \u2211_{\u00d1 \u2208 \ud835\udca9\u0303} log(1 \u2212 pM(v\u00d1)) ],\n\nwhere pM(vN) is the probability the model returns on input vN. However, other loss functions are\npossible. The probability of a particular substitution of the target query to be true is now\n\nP(q[s/d] = True) = E_{N \u2208 Nr,k(d)} [pM(vN)],\n\nthat is, the expected probability of a representation of a neighborhood drawn uniformly at random from\nNr,k(d). It is now possible to generate several neighborhoods N and their representations vN to\nestimate P(q[s/d] = True), simply by averaging the neighborhoods\u2019 probabilities. 
We have found\nexperimentally that a single neighborhood already leads to highly accurate results but also that more\nneighborhood samples further improve the accuracy.\nLet us emphasize again the novel semantics of Gaifman models. Gaifman models generate a large\nnumber of small, bounded-size structures from a large structure, learn a representation for these\nbounded-size structures, and use the resulting representation to answer queries concerning the\noriginal structure as a whole. The advantages are model weight sharing across a large number of\nneighborhoods and efficiency of the computational problems. Figure 3 and Figure 4 illustrate learning\nfrom bounded-size neighborhood structures and inference in Gaifman models.\n\n3.2 Structure Learning\n\nStructure learning is the problem of determining the set of relational features \u03a6. We provide some\ndirections and leave the problem to future work. Given a collection of bounded-size neighborhoods\nof the Gaifman graph, the goal is to determine suitable relational features for the problem at hand.\nThere is a set of features which we found to be highly effective, for example, formulas of the form\n\u2203x r(s1, x), \u2203x r(s1, x) \u2227 r(x, s2), and \u2203x, y r1(s1, x) \u2227 r2(x, y) \u2227 r3(y, s2) for all relations. The\nlatter formulas capture fixed-length paths between s1 and s2 in the neighborhoods. Hence, Path\nRanking type features [15] can be used in Gaifman models as a particular relational feature class. For\npath formulas with several different relations we cannot include all |R|^3 combinations and, hence,\nwe have to determine a subset occurring in the training data. Fortunately, since the neighborhood\nsize is bounded, it is computationally feasible to compute frequent paths in the neighborhoods and\nto use these as features. The complexity of this learning problem depends on the number of elements\nin the neighborhood and not on the number of all objects in the knowledge base. 
Relation paths\nthat do not occur in the data can be discarded. Gaifman models can also use features of the form\n\u2200x, y r(x, y) \u21d2 r(y, x), \u2203x, y r(x, y), and \u2200x, y, z r(x, y) \u2227 r(y, z) \u21d2 r(x, z), to name but a few.\nMoreover, features with free variables, such as r(s1, x) are counting features (here: the r out-degree\nof s1). It is even computationally feasible to include speci\ufb01c second-order features (for instance,\nquanti\ufb01ers ranging over R) and aggregations of feature values.\n\n3.3 Prior Con\ufb01dence Values, Types, and Numerical Attributes\n\nNumerous existing knowledge bases assign con\ufb01dence values (probabilities, weights, etc.) to their\nstatements. Gaifman models can incorporate con\ufb01dence values during the sampling and learning\nprocess. Instead of adding random noise to the representations, which we have found to be bene\ufb01cial,\nnoise can be added inversely proportional to the con\ufb01dence values. Statements for which the prior\ncon\ufb01dence values are lower are more likely to be dropped out during training than statements with\nhigher con\ufb01dence values. Furthermore, Gaifman models can directly incorporate object types such as\nActor and Action Movie as well as numerical features such as location and elevation. One simply\nhas to specify a \ufb01xed position in the neighborhood representation v for each object position within\nthe input tuples d.\n\n4 Related Work\n\nRecent work on relational machine learning for knowledge graphs is surveyed in [20]. We focus on\na select few methods we deem most related to Gaifman models and refer the interested reader to\nthe above article. A large body of work exists on learning inference rules from knowledge bases.\nExamples include [31] and [1] where inference rules of length one are learned; and [25] where general\ninference rules are learned by applying a support threshold. 
Their method does not scale to large KBs\nand depends on predetermined thresholds. Lao et al. [15] train a logistic regression classifier with\npath features to perform KB completion. The idea is to perform a random walk between objects and\nto exploit the discovered paths as features. SFE [10] improves PRA by making the generation of\nrandom walks more efficient. More recent embedding methods have combined paths in KBs with\nKB embedding methods [17]. Gaifman models support a much broader class of relational features\nsubsuming path features. For instance, Gaifman models incorporate counting features that have\nbeen shown to be beneficial for relational models.\n\nLatent feature models learn features for objects and relations that are not directly observed in\nthe data. Examples of latent feature models are tensor factorization [21, 23, 26] and embedding\nmodels [5, 3, 4, 18, 13, 27]. The majority of these models can be understood as more or less complex\nneural networks operating on object and relation representations. Gaifman models can also be used\nto learn knowledge base embeddings. Indeed, one can show that they generalize or complement\nexisting approaches. For instance, the universal schema [23] considers pairs of objects where relation\nmembership variables comprise the model\u2019s features. We have the following interesting relationship\nbetween universal schemas [23] and Gaifman models. Given a knowledge base D, the Gaifman\nmodel for D with r = 0, k = 2, \u03a6 = \u22c3_{r\u2208R} {r(s1, s2), r(s2, s1)}, w = 1 and w\u0303 = 0 is equivalent\nto the Universal Schema [23] for D up to the base model class M. More recent methods combine\nembedding methods and inference-based logical approaches for relation extraction [24]. Contrary\nto most existing multi-relational ML models [20], Gaifman models natively support higher-arity\nrelations, functional and type constraints, numerical features, and complex target queries.\n\n5 Experiments\n\nDataset   |D|      |R|     # train   # test\nWN18      40,943   18      141,442   5,000\nFB15k     14,951   1,345   483,142   59,071\n\nTable 1: The statistics of the data sets.\n\nThe aim of the experiments is to understand the\nefficiency and effectiveness of Gaifman models for\ntypical knowledge base inference problems. We\nevaluate the proposed class of models with two data\nsets derived from the knowledge bases WORDNET\nand FREEBASE [2]. Both data sets consist of a list of statements r(d1, d2) that are known to be true.\nFor a detailed description of the data sets, whose statistics are listed in Table 1, we refer the reader to\nprevious work [4].\nAfter training the models, we perform entity prediction as follows. For each statement r(d1, d2) in\nthe test set, d2 is replaced by each of the KB\u2019s objects in turn. The probabilities of the resulting\nstatements are predicted and sorted in descending order. Finally, the rank of the correct statement\nwithin this ordered list is determined. The same process is repeated now with replacements of d1. We\ncompare Gaifman models with q = r(x, y) to state-of-the-art knowledge base completion approaches\nwhich are listed in Table 2. We trained Gaifman models with r = 1 and different values for k, w,\nand w\u0303. We use a neural network architecture with two hidden layers, each having 100 units and\nsigmoid activations, dropout of 0.2 on the input layer, and a softmax layer. Dropout makes the model\nmore robust to missing relations between objects. We trained one model per relation and left the\nhyper-parameters fixed across models. 
We did not perform structure learning and instead used the\nfollowing set of relational features\n\n\u03a6 := \u22c3_{r\u2208R, i\u2208{1,2}} { r(s1, s2), r(s2, s1), \u2203x r(x, si), \u2203x r(si, x), \u2203x r(s1, x) \u2227 r(x, s2), \u2203x r(s2, x) \u2227 r(x, s1) }.\n\nTo compute the probabilities, we averaged the probabilities of N = 1, 2, or 3 generated\n(r, k)-neighborhoods.\nWe performed runtime experiments to evaluate the\nmodels\u2019 efficiency. Embedding models have the\nadvantage that one dot product for every candidate\nobject is sufficient to compute the score for the\ncorresponding statement and we need to assess the\nperformance of Gaifman models in this context.\nAll experiments were run on commodity hardware\nwith 64G RAM and a single 2.8 GHz CPU.\nTable 2 lists the experimental results for different\nparameter settings [N, k, w, w\u0303]. The Gaifman models\nachieve the highest hits@10 and hits@1 values\nfor both data sets. As expected, the more neighborhood\nsamples are used to compute the probability\nestimate (N = 1, 2, 3) the better the result. When\nthe entire 1-neighborhood is considered (k = \u221e),\nthe performance for WN18 does not deteriorate as it does for FB15k. 
This is due to the fact that\nobjects in WN18 have on average few neighbors. FB15k has more variance in the Gaifman graph\u2019s\ndegree distribution (see Figure 2) which is reflected in the better performance for smaller k values.\nThe experiments also show that it is beneficial to generate a large number of representations (both\npositive and negative ones). The performance improves with larger number of training examples.\n\nTable 2: Results of the entity prediction experiments.\n\nData Set                  WN18                            FB15K\nMetric                    Mean rank  Hits@10  Hits@1      Mean rank  Hits@10  Hits@1\nRESCAL[21]                1,163      52.8     -           683        44.1     -\nSE[5]                     985        80.5     -           162        39.8     -\nLFM[12]                   456        81.6     -           164        33.1     -\nTransE[4]                 251        89.2     8.9         51         71.5     28.1\nTransR[18]                219        91.7     -           78         65.5     -\nDistMult[30]              902        93.7     76.1        97         82.8     44.3\nGaifman [1, \u221e, 1, 5]      298        93.9     75.8        124        78.1     59.8\nGaifman [1, 20, 1, 2]     357        88.1     66.8        114        79.2     60.1\nGaifman [1, 20, 5, 25]    392        93.6     76.4        97         82.1     65.6\nGaifman [2, 20, 5, 25]    378        93.9     76.7        84         83.4     68.5\nGaifman [3, 20, 5, 25]    352        93.9     76.1        75         84.2     69.2\n\nFigure 5: Query answers per second rates for\ndifferent values of the parameter k.\n[Figure 5 axes: k in {5, 10, 20, 50, 100, inf} (horizontal) versus query answers per second, \u00d710^4 (vertical); series WN18 and FB15k.]\n\nThe runtime experiments demonstrate that Gaifman models perform inference very efficiently for\nk \u2264 20. Figure 5 depicts the number of query answers the Gaifman models are able to serve per\nsecond, averaged over relation types. A query answer returns the probability for one object pair.\nThese numbers include neighborhood generation and network inference. The results are promising\nwith about 5000 query answers per second (averaged across relation types) as long as k remains\nsmall. 
Since most object pairs of WN18 have a 1-neighborhood whose size is smaller than 20, the\nanswers-per-second rate for k > 20 is not reduced as drastically as for FB15k.\n\n6 Conclusion and Future Work\n\nGaifman models are a novel family of relational machine learning models that perform learning\nand inference within and across locally connected regions of relational structures. Future directions\nof research include structure learning, more sophisticated base model classes, and application of\nGaifman models to additional relational ML problems.\n\nAcknowledgements\n\nMany thanks to Alberto Garc\u00eda-Dur\u00e1n, Mohamed Ahmed, and Kristian Kersting for their helpful\nfeedback.\n\nReferences\n\n[1] J. Berant, I. Dagan, and J. Goldberger. Global learning of typed entailment rules. In Annual Meeting of the\nAssociation for Computational Linguistics, pages 610\u2013619, 2011.\n\n[2] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: A collaboratively created graph\ndatabase for structuring human knowledge. In SIGMOD, pages 1247\u20131250, 2008.\n\n[3] A. Bordes, X. Glorot, J. Weston, and Y. Bengio. Joint learning of words and meaning representations for\nopen-text semantic parsing. In Conference on Artificial Intelligence and Statistics, pages 127\u2013135, 2012.\n\n[4] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Translating embeddings for\nmodeling multi-relational data. In Neural Information Processing Systems, pages 2787\u20132795, 2013.\n\n[5] A. Bordes, J. Weston, R. Collobert, and Y. Bengio. Learning structured embeddings of knowledge bases.\nIn AAAI Conference on Artificial Intelligence, 2011.\n\n[6] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka, and T. M. Mitchell. Toward an architecture\nfor never-ending language learning. In Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.\n\n[7] I. I. Ceylan, A. Darwiche, and G. Van den Broeck. 
Open-world probabilistic databases. In Proceedings of the 15th International Conference on Principles of Knowledge Representation and Reasoning (KR), 2016.\n\n[8] A. Dries, A. Kimmig, W. Meert, J. Renkens, G. Van den Broeck, J. Vlasselaer, and L. De Raedt. ProbLog2: Probabilistic logic programming. Lecture Notes in Computer Science, 9286:312\u2013315, 2015.\n\n[9] H. Gaifman. On local and non-local properties. In Proceedings of the Herbrand Symposium, Logic Colloquium, volume 81, pages 105\u2013135, 1982.\n\n[10] M. Gardner and T. M. Mitchell. Efficient and expressive knowledge base completion using subgraph feature extraction. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1488\u20131498, 2015.\n\n[11] J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artif. Intell., 194:28\u201361, 2013.\n\n[12] R. Jenatton, N. L. Roux, A. Bordes, and G. R. Obozinski. A latent factor model for highly multi-relational data. In Neural Information Processing Systems, pages 3167\u20133175, 2012.\n\n[13] G. Ji, K. Liu, S. He, and J. Zhao. Knowledge graph completion with adaptive sparse transfer matrix. In D. Schuurmans and M. P. Wellman, editors, Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 985\u2013991, 2016.\n\n[14] K. Kersting. Lifted probabilistic inference. In European Conference on Artificial Intelligence, pages 33\u201338, 2012.\n\n[15] N. Lao, T. Mitchell, and W. W. Cohen. Random walk inference and learning in a large scale knowledge base. In Empirical Methods in Natural Language Processing, pages 529\u2013539, 2011.\n\n[16] L. Libkin. Elements of Finite Model Theory. Springer-Verlag, 2004.\n\n[17] Y. Lin, Z. Liu, H. Luan, M. Sun, S. Rao, and S. Liu. Modeling relation paths for representation learning of knowledge bases. 
In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 705\u2013714, 2015.\n\n[18] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI Conference on Artificial Intelligence, pages 2181\u20132187, 2015.\n\n[19] B. C. Milch. Probabilistic Models with Unknown Objects. PhD thesis, 2006.\n\n[20] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11\u201333, 2016.\n\n[21] M. Nickel, V. Tresp, and H.-P. Kriegel. A three-way model for collective learning on multi-relational data. In International Conference on Machine Learning (ICML), pages 809\u2013816, 2011.\n\n[22] M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1-2):107\u2013136, 2006.\n\n[23] S. Riedel, L. Yao, B. M. Marlin, and A. McCallum. Relation extraction with matrix factorization and universal schemas. In HLT-NAACL, 2013.\n\n[24] T. Rockt\u00e4schel, S. Singh, and S. Riedel. Injecting logical background knowledge into embeddings for relation extraction. In Conference of the North American Chapter of the ACL (NAACL), 2015.\n\n[25] S. Schoenmackers, O. Etzioni, D. S. Weld, and J. Davis. Learning first-order Horn clauses from web text. In Conference on Empirical Methods in Natural Language Processing, pages 1088\u20131098, 2010.\n\n[26] R. Socher, D. Chen, C. D. Manning, and A. Ng. Reasoning with neural tensor networks for knowledge base completion. In Neural Information Processing Systems, pages 926\u2013934, 2013.\n\n[27] T. Trouillon, J. Welbl, S. Riedel, \u00c9. Gaussier, and G. Bouchard. Complex embeddings for simple link prediction. In Proceedings of the 33rd International Conference on Machine Learning, volume 48, pages 2071\u20132080, 2016.\n\n[28] G. Van den Broeck. 
Lifted inference and learning in statistical relational models. 2013.\n\n[29] M. Y. Vardi. The complexity of relational query languages. In ACM Symposium on Theory of Computing, pages 137\u2013146, 1982.\n\n[30] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng. Embedding entities and relations for learning and inference in knowledge bases. In International Conference on Learning Representations, 2015.\n\n[31] A. Yates and O. Etzioni. Unsupervised resolution of objects and relations on the web. In Conference of the North American Chapter of the Association for Computational Linguistics, 2007.", "award": [], "sourceid": 1689, "authors": [{"given_name": "Mathias", "family_name": "Niepert", "institution": "NEC Labs Europe"}]}