{"title": "Abstraction and Relational learning", "book": "Advances in Neural Information Processing Systems", "page_first": 934, "page_last": 942, "abstract": "Many categories are better described by providing relational information than listing characteristic features.  We present a hierarchical generative model that helps to explain how relational categories are learned and used. Our model learns abstract schemata that specify the relational similarities shared by members of a category, and our emphasis on abstraction departs from previous theoretical proposals that focus instead on comparison of concrete instances. Our first experiment suggests that our abstraction-based account can address some of the tasks that have previously been used to support comparison-based approaches. Our second experiment focuses on one-shot schema learning, a problem that raises challenges for comparison-based approaches but is handled naturally by our abstraction-based account.", "full_text": "Abstraction and relational learning\n\nCharles Kemp & Alan Jern\nDepartment of Psychology\nCarnegie Mellon University\n{ckemp,ajern}@cmu.edu\n\nAbstract\n\nMost models of categorization learn categories de\ufb01ned by characteristic features\nbut some categories are described more naturally in terms of relations. We present\na generative model that helps to explain how relational categories are learned and\nused. Our model learns abstract schemata that specify the relational similarities\nshared by instances of a category, and our emphasis on abstraction departs from\nprevious theoretical proposals that focus instead on comparison of concrete in-\nstances. Our \ufb01rst experiment suggests that abstraction can help to explain some\nof the \ufb01ndings that have previously been used to support comparison-based ap-\nproaches. 
Our second experiment focuses on one-shot schema learning, a problem\nthat raises challenges for comparison-based approaches but is handled naturally by\nour abstraction-based account.\n\nCategories such as family, sonnet, above, betray, and imitate differ in many respects but all of them\ndepend critically on relational information. Members of a family are typically related by blood or\nmarriage, and the lines that make up a sonnet must rhyme with each other according to a certain\npattern. A pair of objects will demonstrate \u201caboveness\u201d only if a certain spatial relationship is\npresent, and an event will qualify as an instance of betrayal or imitation only if its participants relate\nto each other in certain ways. All of the cases just described are examples of relational categories.\nThis paper develops a computational approach that helps to explain how simple relational categories\nare acquired.\nOur approach highlights the role of abstraction in relational learning. Given several instances of\na relational category, it is often possible to infer an abstract representation that captures what the\ninstances have in common. We refer to these abstract representations as schemata, although others\nmay prefer to call them rules or theories. For example, a sonnet schema might specify the number\nof lines that a sonnet should include and the rhyming pattern that the lines should follow. Once a\nschema has been acquired it can support several kinds of inferences. A schema can be used to make\npredictions about hidden aspects of the examples already observed\u2014if the \ufb01nal word in a sonnet is\nillegible, the rhyming pattern can help to predict the identity of this word. A schema can be used\nto decide whether new examples (e.g. new poems) qualify as members of the category. Finally, a\nschema can be used to generate novel examples of a category (e.g. 
novel sonnets).\nMost researchers would agree that abstraction plays some role in relational learning, but Gentner [1]\nand other psychologists have emphasized the role of comparison instead [2, 3]. Given one example\nof a sonnet and the task of deciding whether a second poem is also a sonnet, a comparison-based\napproach might attempt to establish an alignment or mapping between the two. Approaches that rely\non comparison or mapping are especially prominent in the literature on analogical reasoning [4, 5],\nand many of these approaches can be viewed as accounts of relational categorization [6]. For example,\nthe problem of deciding whether two systems are analogous can be formalized as the problem\nof deciding whether these systems are instances of the same relational category. Despite some notable\nexceptions [6, 7], most accounts of analogy focus on comparison rather than abstraction, and\nsuggest that \u201canalogy passes from one instance of a generalization to another without pausing for\nexplicit induction of the generalization\u201d (p. 95) [8].\n\n[Figure 1 diagram, top to bottom: Schema s: \u2200Q \u2200x \u2200y Q(x) < Q(y) \u2194 D1(x) < D1(y); Group g; Observation o]\n\nFigure 1: A hierarchical generative model for learning and using relational categories. The schema\ns at the top level is a logical sentence that specifies which groups are valid instances of the category.\nThe group g at the second level is randomly sampled from the set of valid instances, and the\nobservation o is a partially observed version of group g.\n\nResearchers who focus on comparison sometimes discuss abstraction, but typically suggest that\nabstractions emerge as a consequence of comparing two or more concrete instances of a category\n[3, 5, 9, 10]. This view, however, will not account for one-shot inferences, or inferences\nbased on a single instance of a relational category. Consider a learner who is shown one instance of\na sonnet then asked to create a second instance. 
Since only one instance is provided, it is hard to\nsee how comparisons between instances could account for success on the task. A single instance,\nhowever, will sometimes provide enough information for a schema to be learned, and this schema\nshould allow subsequent instances to be generated [11]. Here we develop a formal framework for\nexploring relational learning in general and one-shot schema learning in particular.\nOur framework relies on the hierarchical Bayesian approach, which provides a natural way to combine\nabstraction and probabilistic inference [12]. The hierarchical Bayesian approach supports representations\nat multiple levels of abstraction, and helps to explain how abstract representations (e.g.\na sonnet schema) can be acquired given observations of concrete instances (e.g. individual sonnets).\nThe schemata we consider are represented as sentences in a logical language, and our approach\ntherefore builds on previous probabilistic methods for learning and using logical theories [13, 14].\nFollowing previous authors, we propose that logical representations can help to capture the content\nof human knowledge, and that Bayesian inference helps to explain how these representations are\nacquired and how they support inductive inference.\nThe following sections introduce our framework then evaluate it using two behavioral experiments.\nOur first experiment uses a standard classification task where participants are shown one example\nof a category then asked to decide which of two alternatives is more likely to belong to the same\ncategory. Tasks of this kind have previously been used to argue for the importance of comparison,\nbut we suggest that these tasks can be handled by accounts that focus on abstraction. Our second\nexperiment uses a less standard generation task [15, 16] where participants are shown a single example\nof a category then asked to generate additional examples. 
As predicted by our abstraction-based\naccount, we \ufb01nd that people are able to learn relational categories on the basis of a single example.\n\n1 A generative approach to relational learning\n\nOur examples so far have used real-world relational categories such as family and sonnet but we now\nturn to a very simple domain where relational categorization can be studied. Each element in the\ndomain is a group of components that vary along a number of dimensions\u2014in Figure 1, the compo-\nnents are \ufb01gures that vary along the dimensions of size, color, and circle position. The groups can\nbe organized into categories\u2014one such category includes groups where every component is black.\nAlthough our domain is rather basic it allows some simple relational regularities to be explored. We\ncan consider categories, for example, where all components in a group must be the same along some\ndimension, and categories where all components must be different along some dimension. We can\nalso consider categories de\ufb01ned by relationships between dimensions\u2014for example, the category\nthat includes all groups where the size and color dimensions are correlated.\nEach category is associated with a schema, or an abstract representation that speci\ufb01es which groups\nare valid instances of the category. 
Here we consider schemata that correspond to rules formulated\nin a logical language. The language includes three binary connectives: and (\u2227), or (\u2228), and if\nand only if (\u2194). Four binary relations (=, \u2260, <, and >) are available for comparing values along\ndimensions. Universal quantification (\u2200x) and existential quantification (\u2203x) are both permitted,\nand the language includes quantification over objects (\u2200x) and dimensions (\u2200Q).\n\n1  {\u2200x, \u2203x} Di(x) {=, \u2260, <, >} vk\n2  {\u2200x, \u2203x} {\u2200y x \u2260 y \u2192, \u2203y x \u2260 y \u2227} Di(x) {=, \u2260, <, >} Di(y)\n3  \u2200x Di(x) {=, \u2260} vk {\u2227, \u2228, \u2194} Dj(x) {=, \u2260} vl\n4  \u2200x \u2200y x \u2260 y \u2192 (Di(x) {=, \u2260, <, >} Di(y) {\u2227, \u2228, \u2194} Dj(x) {=, \u2260, <, >} Dj(y))\n5  {\u2200Q, \u2203Q} {\u2200x, \u2203x} {\u2200y x \u2260 y \u2192, \u2203y x \u2260 y \u2227} Q(x) {=, \u2260, <, >} Q(y)\n6  {\u2200Q Q \u2260 Di \u2192, \u2203Q Q \u2260 Di \u2227} \u2200x \u2200y x \u2260 y \u2192 (Q(x) {=, \u2260, <, >} Q(y) {\u2227, \u2228, \u2194} Di(x) {=, \u2260, <, >} Di(y))\n7  {\u2200Q, \u2203Q} {\u2200R Q \u2260 R \u2192, \u2203R Q \u2260 R \u2227} \u2200x \u2200y x \u2260 y \u2192 (Q(x) {=, \u2260, <, >} Q(y) {\u2227, \u2228, \u2194} R(x) {=, \u2260, <, >} R(y))\n\nTable 1: Templates used to construct a hypothesis space of logical schemata. An instance of a given\ntemplate can be created by choosing an element from each set enclosed in braces (all sets are laid\nout horizontally here), replacing each occurrence of Di or Dj with a dimension (e.g. D1)\nand replacing each occurrence of vk or vl with a value (e.g. 1).\n\nFor example, the\nschema in Figure 1 states that all dimensions are aligned. 
More precisely, if D1 is the dimension\nof size, the schema states that for all dimensions Q, a component x is smaller than a component y\nalong dimension Q if and only if x is smaller in size than y. It follows that all three dimensions must\nincrease or decrease together.\nTo explain how rules in this logical language are learned we work with the hierarchical generative\nmodel in Figure 1. The representation at the top level is a schema s, and we assume that one or\nmore groups g are generated from a distribution P(g|s). Following a standard approach to category\nlearning [17, 18], we assume that g is uniformly sampled from all groups consistent with s:\n\nP(g|s) \u221d 1 if g is consistent with s, and 0 otherwise.    (1)\n\nFor all applications in this paper, we assume that the number of components in a group is known\nand fixed in advance.\nThe bottom level of the hierarchy specifies observations o that are generated from a distribution\nP(o|g). In most cases we assume that g can be directly observed, and that P(o|g) = 1 if o = g and\n0 otherwise. We also consider the setting shown in Figure 1 where o is generated by concealing a\ncomponent of g chosen uniformly at random. Note that the observation o in Figure 1 includes only\nfour of the components in group g, and is roughly analogous to our earlier example of a sonnet with\nan illegible final word.\nTo convert Figure 1 into a fully-specified probabilistic model it remains to define a prior distribution\nP(s) over schemata. An appealing approach is to consider all of the infinitely many sentences in\nthe logical language already mentioned, and to define a prior favoring schemata which correspond\nto simple (i.e. short) sentences. We approximate this approach by considering a large but finite\nspace of sentences that includes all instances of the templates in Table 1 and all conjunctions of\nthese instances. 
When instantiating one of these templates, each occurrence of Di or Dj should be\nreplaced by one of the dimensions in the domain. For example, the schema in Figure 1 is a simplified\ninstance of template 6 where Di is replaced by D1. Similarly, each occurrence of vk or vl should be\nreplaced by a value along one of the dimensions. Our first experiment considers a problem where\nthere are three dimensions and three possible values along each dimension (i.e. vk = 1, 2, or\n3). As a result there are 1568 distinct instances of the templates in Table 1 and roughly one million\nconjunctions of these instances. Our second experiment uses three dimensions with five values along\neach dimension, which leads to 2768 template instances and roughly three million conjunctions of\nthese instances.\nThe templates in Table 1 capture most of the simple regularities that can be formulated in our logical\nlanguage. Template 1 generates all rules that include quantification over a single object variable and\nno binary connectives. Template 3 is similar but includes a single binary connective. Templates\n2 and 4 are similar to 1 and 3 respectively, but include two object variables (x and y) rather than\none. Templates 5, 6 and 7 add quantification over dimensions to Templates 2 and 4. Although the\ntemplates in Table 1 capture a large class of regularities, several kinds of templates are not included.\nSince we do not assume that the dimensions are commensurable, values along different dimensions\ncannot be directly compared (\u2203x D1(x) = D2(x) is not permitted). For the same reason, comparisons\nto a dimension value must involve a concrete dimension (\u2200x D1(x) = 1 is permitted) rather\nthan a dimension variable (\u2200Q \u2200x Q(x) = 1 is not permitted). 
Finally, we exclude all schemata\nwhere quantification over objects precedes quantification over dimensions, and as a result there are\nsome simple schemata that our implementation cannot learn (e.g. \u2203x\u2200y\u2203Q Q(x) = Q(y)).\nThe extension of each schema is a set of groups, and schemata with the same extension can be\nassigned to the same equivalence class. For example, \u2200x D1(x) = v1 (an instance of template 1)\nand \u2200x D1(x) = v1 \u2227 D1(x) = v1 (an instance of template 3) end up in the same equivalence class.\nEach equivalence class can be represented by the shortest sentence that it contains, and we define\nour prior P(s) over a set that includes a single representative for each equivalence class. The prior\nprobability P(s) of each sentence decreases exponentially with its length: P(s) \u221d \u03bb^|s|, where |s| is\nthe length of schema s and \u03bb is a constant between 0 and 1. For all applications in this paper we set\n\u03bb = 0.8.\nThe generative model in Figure 1 can be used for several purposes, including schema learning (inferring\na schema s given one or more instances generated from the schema), classification (deciding\nwhether group gnew belongs to a category given one or more instances of the category) and generation\n(generating a group gnew that belongs to the same category as one or more instances). Our first\nexperiment explores all three of these problems.\n\n2 Experiment 1: Relational classification\n\nOur first experiment is organized around a triad task where participants are shown one example of a\ncategory then asked to decide which of two choice examples is more likely to belong to the category.\nTriad tasks are regularly used by studies of relational categorization, and have been used to argue\nfor the importance of comparison [1]. 
A comparison-based approach to this task, for instance, might\ncompare the example object to each of the choice objects in order to decide which is the better\nmatch. Our \ufb01rst experiment is intended in part to explore whether a schema-learning approach can\nalso account for inferences about triad tasks.\nMaterials and Method. 18 adults participated for course credit and interacted with a custom-built\ncomputer interface. The stimuli were groups of \ufb01gures that varied along three dimensions (color,\nsize, and ball position, as in Figure 1). Each shape was displayed on a single card, and all groups in\nExperiment 1 included exactly three cards. The cards in Figure 1 show \ufb01ve different values along\neach dimension, but Experiment 1 used only three values along each dimension.\nThe experiment included inferences about 10 triads. Participants were told that aliens from a certain\nplanet \u201cenjoy organizing cards into groups,\u201d and that \u201cany group of cards will probably be liked\nby some aliens and disliked by others.\u201d The ten triad tasks were framed as questions about the\npreferences of 10 aliens. Participants were shown a group that Mr X likes (different names were\nused for the ten triads), then shown two choice groups and told that \u201cMr X likes one of these groups\nbut not the other.\u201d Participants were asked to select one of the choice groups, then asked to generate\nanother 3-card group that Mr X would probably like. Cards could be added to the screen using an\n\u201cAdd Card\u201d button, and there were three pairs of buttons that allowed each card to be increased or\ndecreased along the three dimensions. Finally, participants were asked to explain in writing \u201cwhat\nkind of groups Mr X likes.\u201d\nThe ten triads used are shown in Figure 2. Each group is represented as a 3 by 3 matrix where\nrows represent cards and columns show values along the three dimensions. 
Triad 1 (Figure 2a), for example,\nhas an example group including three cards that each take value 3 along D1. The first choice group\nis consistent with this regularity but the second choice group is not. The cards in each group were\narrayed vertically on screen, and were initially sorted as shown in Figure 2 (i.e. first by D3, then by\nD2 and then by D1). The cards could be dragged around on screen, and participants were invited\nto move them around in order to help them understand each group. The mapping between the three\ndimensions in each matrix and the three dimensions in the experiment (color, position, and size) was\nrandomized across participants, and the order in which triads were presented was also randomized.\n\n[Figure 2 panels: (a) D1 value always 3; (b) D2 uniform; (c) D2 and D3 aligned; (d) D1 and D3 anti-aligned; (e) Two dimensions aligned; (f) Two dimensions anti-aligned; (g) All dimensions uniform; (h) Some dimension uniform; (i) All dimensions have no repeats; (j) Some dimension has no repeats]\n\nFigure 2: Human responses and model predictions for the ten triads in Experiment 1. The plot at the\nleft of each panel shows model predictions (white bars) and human preferences (black bars) for the\ntwo choice groups in each triad. The plots at the right of each panel summarize the groups created\nduring the generation phase. The 23 elements along the x-axis correspond to the regularities listed\nin Table 2.\n\n1 All dimensions aligned\n2 Two dimensions aligned\n3 D1 and D2 aligned\n4 D1 and D3 aligned\n5 D2 and D3 aligned\n6 All dimensions aligned or anti-aligned\n7 Two dimensions anti-aligned\n8 D1 and D2 anti-aligned\n9 D1 and D3 anti-aligned\n10 D2 and D3 anti-aligned\n11 All dimensions have no repeats\n12 Two dimensions have no repeats\n13 One dimension has no repeats\n14 D1 has no repeats\n15 D2 has no repeats\n16 D3 has no repeats\n17 All dimensions uniform\n18 Two dimensions uniform\n19 One dimension uniform\n20 D1 uniform\n21 D2 uniform\n22 D3 uniform\n23 D1 value is always 3\n\nTable 2: Regularities used to code responses to the generation tasks in Experiments 1 and 2\n\nModel predictions and results. Let ge be the example group presented in the triad task and g1\nand g2 be the two choice groups. We use our model to compute the relative probability of two\nhypotheses: h1 which states that ge and g1 are generated from the same schema and that g2 is sampled\nrandomly from all possible groups, and h2 which states that ge and g2 are generated from the\nsame schema. 
We set P (h1) = P (h2) = 0.5, and compute posterior probabilities P (h1|ge, g1, g2)\nand P (h2|ge, g1, g2) by integrating over all schemata in the hypothesis space already described.\nOur model assumes that two groups are considered similar to the extent that they appear to have\nbeen generated by the same underlying schema, and is consistent with the generative approach to\nsimilarity described by Kemp et al. [19].\nModel predictions for the ten triads are shown in Figure 2. In each case, the choice probabilities\nplotted (white bars) are the posterior probabilities of hypotheses h1 and h2. In nine out of ten cases\nthe best choice according to the model is the most common human response. Responses to triads 2c\nand 2d support the idea that people are sensitive to relationships between dimensions (i.e. alignment\nand anti-alignment). Triads 2e and 2f are similar to triads studied by Kotovsky and Gentner [1], and\nwe replicate their \ufb01nding that people are sensitive to relationships between dimensions even when\nthe dimensions involved vary from group to group. The one case where human responses diverge\nfrom model predictions is shown in Figure 2h. Note that the schema for this triad involves existential\nquanti\ufb01cation over dimensions (some dimension is uniform), and according to our prior P (s) this\nkind of quanti\ufb01cation is no more complex than other kinds of quanti\ufb01cation. 
Future applications of\nour approach can explore the idea that existential quantification over dimensions (\u2203Q) is psychologically\nmore complex than universal quantification over dimensions (\u2200Q) or existential quantification\nover cards (\u2203x), and can consider logical languages that incorporate this inductive bias.\nTo model the generation phase of the experiment we computed the posterior distribution\n\nP(gnew|ge, g1, g2) = \u2211_{s,h} P(gnew|s) P(s|h, ge, g1, g2) P(h|ge, g1, g2)\n\nwhere P(h|ge, g1, g2) is the distribution used to model selections in the triad task. Since the space\nof possible groups is large, we visualize this distribution using a profile that shows the posterior\nprobability assigned to groups consistent with the 23 regularities shown in Table 2. The white bar\nplots in Figure 2 show profiles predicted by the model, and the black plots immediately above show\nprofiles computed over the groups generated by our 18 participants.\nIn many of the 10 cases the model accurately predicts regularities in the groups generated by people.\nIn case 2c, for example, the model correctly predicts that generated groups will tend to have no\nrepeats along dimensions D2 and D3 (regularities 15 and 16) and that these two dimensions will be\naligned (regularities 2 and 5). There are, however, some departures from the model\u2019s predictions,\nand a notable example occurs in case 2d. Here the model detects the regularity that dimensions D1\nand D3 are anti-aligned (regularity 9). 
Some groups generated by participants are consistent with\nthis regularity, but people also regularly generate groups where two dimensions are aligned rather\nthan anti-aligned (regularity 2). This result may indicate that some participants are sensitive to\nrelationships between dimensions but do not consider the difference between a positive relationship\n(alignment) and an inverse relationship (anti-alignment) especially important.\nKotovsky and Gentner [1] suggest that comparison can explain how people respond to triad tasks,\nalthough they do not provide a computational model that can be compared with our approach. It is\nless clear how comparison might account for our generation data, and our next experiment considers\na one-shot generation task that raises even greater challenges for a comparison-based approach.\n\n3 Experiment 2: One-shot schema learning\n\nAs described already, comparison involves constructing mappings between pairs of category instances.\nIn some settings, however, learners make confident inferences given a single instance of a\ncategory [15, 20], and it is difficult to see how comparison could play a major role when only one\ninstance is available. Models that rely on abstraction, however, can naturally account for one-shot\nrelational learning, and we designed a second experiment to evaluate this aspect of our approach.\n\n[Figure 3 panels and 4-card stimuli: (a) All dimensions aligned (111, 333, 444, 555 and 121, 232, 443, 555); (b) D2 and D3 aligned (311, 322, 333, 355 and 311, 322, 333, 354); (c) D1 has no repeats, D2 and D3 uniform (124, 224, 324, 524); (d) D2 uniform (431, 433, 135, 335); (e) All dimensions uniform (314, 314, 314, 314); (f) All dimensions have no repeats (251, 532, 314, 145)]\n\nFigure 3: Human responses and model predictions for the six cases in Experiment 2. In (a) and (b),\nthe 4 cards used for the completion and generation phases are shown on either side of the dashed line\n(completion cards on the left). In the remaining cases, the same 4 cards were used for both phases.\nThe plots at the right of each panel show model predictions (white bars) and human responses (black\nbars) for the generation task. In each case, the 23 elements along each x-axis correspond to the\nregularities listed in Table 2. The remaining plots show responses to the completion task. There are\n125 possible responses, and the four responses shown always include the top two human responses\nand the top two model predictions.\n\nSeveral previous studies have explored one-shot relational learning. Holyoak and Thagard [21]\ndeveloped a study of analogical reasoning using stories as stimuli and found little evidence of one-shot\nschema learning. Ahn et al. [11] demonstrated, however, that one-shot learning can be achieved\nwith complex materials such as stories, and modeled this result using explanation-based learning.\nHere we use much simpler stimuli and explore a probabilistic approach to one-shot learning.\nMaterials and Method. 18 adults participated for course credit. The same individuals completed\nExperiments 1 and 2, and Experiment 2 was always run before Experiment 1. 
The same computer\ninterface was used in both experiments, and the only important difference was that the figures in\nExperiment 2 could now take five values along each dimension rather than three.\nThe experiment included two phases. During the generation phase, participants saw a 4-card group\nthat Mr X liked and were asked to generate two 5-card groups that Mr X would probably like.\nDuring the completion phase, participants were shown four members of a 5-card group and were\nasked to generate the missing card. The stimuli used in each phase are shown in Figure 3. In the\nfirst two cases, slightly different stimuli were used in the generation and completion phases, and in\nall remaining cases the same set of four cards was used in both phases. All participants responded to\nthe six generation questions before answering the six completion questions.\nModel predictions and results. The generation phase is modeled as in Experiment 1, but now the\nposterior distribution P(gnew|ge) is computed after observing a single instance of a category. The\nhuman responses in Figure 3 (black bars) are consistent with the model in all cases, and confirm that\na single example can provide sufficient evidence for learners to acquire a relational category. For\nexample, the most common response in case 3a was the 5-card group shown in Figure 1, a group\nwith all three dimensions aligned.\nTo model the completion phase, let oe represent a partial observation of group ge. Our model\ninfers which card is missing from ge by computing the posterior distribution P(ge|oe) \u221d\nP(oe|ge) \u2211_s P(ge|s) P(s), where P(oe|ge) captures the idea that oe is generated by randomly concealing\none component of ge. 
The white bars in Figure 3 show model predictions, and in \ufb01ve out of\nsix cases the best response according to the model is the same as the most common human response.\nIn the remaining case (Figure 3d) the model generates a diffuse distribution over all cards with value\n3 on dimension 2, and all human responses satisfy this regularity.\n\n4 Conclusion\nWe presented a generative model that helps to explain how relational categories are learned and\nused. Our approach captures relational regularities using a logical language, and helps to explain\nhow schemata formulated in this language can be learned from observed data. Our approach differs\nin several respects from previous accounts of relational categorization [1, 5, 10, 22]. First, we focus\non abstraction rather than comparison. Second, we consider tasks where participants must generate\nexamples of categories [16] rather than simply classify existing examples. Finally, we provide a\nformal account that helps to explain how relational categories can be learned from a single instance.\nOur approach can be developed and extended in several ways. For simplicity, we implemented our\nmodel by working with a \ufb01nite space of several million schemata, but future work can consider\nhypothesis spaces that assign non-zero probability to all regularities that can be formulated in the\nlanguage we described. The speci\ufb01c logical language used here is only a starting point, and future\nwork can aim to develop languages that provide a more faithful account of human inductive biases.\nFinally, we worked with a domain that provides one of the simplest ways to address core questions\nsuch as one-shot learning. Future applications of our general approach can consider domains that\ninclude more than three dimensions and a richer space of relational regularities.\nRelational learning and analogical reasoning are tightly linked, and hierarchical generative models\nprovide a promising approach to both problems. 
We focused here on relational categorization, but\nfuture studies can explore whether probabilistic accounts of schema learning can help to explain\nthe inductive inferences typically considered by studies of analogical reasoning. Although there are\nmany models of analogical reasoning, there are few that pursue a principled probabilistic approach,\nand the hierarchical Bayesian approach may help to fill this gap in the literature.\nAcknowledgments. We thank Maureen Satyshur for running the experiments. This work was supported in part\nby NSF grant CDI-0835797.\n\nReferences\n[1] L. Kotovsky and D. Gentner. Comparison and categorization in the development of relational similarity. Child Development, 67:2797\u20132822, 1996.\n[2] D. Gentner and A. B. Markman. Structure mapping in analogy and similarity. American Psychologist, 52:45\u201356, 1997.\n[3] D. Gentner and J. Medina. Similarity and the development of rules. Cognition, 65:263\u2013297, 1998.\n[4] B. Falkenhainer, K. D. Forbus, and D. Gentner. The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41:1\u201363, 1989.\n[5] J. E. Hummel and K. J. Holyoak. A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110:220\u2013264, 2003.\n[6] M. Mitchell. Analogy-making as perception: a computer model. MIT Press, Cambridge, MA, 1993.\n[7] D. R. Hofstadter and the Fluid Analogies Research Group. Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought. 1995.\n[8] W. V. O. Quine and J. Ullian. The Web of Belief. Random House, New York, 1978.\n[9] J. Skorstad, D. Gentner, and D. Medin. Abstraction processes during concept learning: a structural view. In Proceedings of the 10th Annual Conference of the Cognitive Science Society, pages 419\u2013425. 1988.\n[10] D. Gentner and J. Loewenstein. Relational language and relational thought. In E. Amsel and J. P. 
Byrnes, editors, Language, literacy and cognitive development: the development and consequences of symbolic communication, pages 87\u2013120. 2002.\n[11] W. Ahn, W. F. Brewer, and R. J. Mooney. Schema acquisition from a single example. Journal of Experimental Psychology: Learning, Memory and Cognition, 18(2):391\u2013412, 1992.\n[12] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman & Hall, New York, 2nd edition, 2003.\n[13] C. Kemp, N. D. Goodman, and J. B. Tenenbaum. Learning and using relational theories. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 753\u2013760. MIT Press, Cambridge, MA, 2008.\n[14] S. Kok and P. Domingos. Learning the structure of Markov logic networks. In Proceedings of the 22nd International Conference on Machine Learning, 2005.\n[15] J. Feldman. The structure of perceptual categories. Journal of Mathematical Psychology, 41:145\u2013170, 1997.\n[16] A. Jern and C. Kemp. Category generation. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, pages 130\u2013135. Cognitive Science Society, Austin, TX, 2009.\n[17] D. Conklin and I. H. Witten. Complexity-based induction. Machine Learning, 16(3):203\u2013225, 1994.\n[18] J. B. Tenenbaum and T. L. Griffiths. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24:629\u2013641, 2001.\n[19] C. Kemp, A. Bernstein, and J. B. Tenenbaum. A generative theory of similarity. In B. G. Bara, L. Barsalou, and M. Bucciarelli, editors, Proceedings of the 27th Annual Conference of the Cognitive Science Society, pages 1132\u20131137. Lawrence Erlbaum Associates, 2005.\n[20] C. Kemp, N. D. Goodman, and J. B. Tenenbaum. Theory acquisition and the language of thought. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, pages 1606\u20131611. 
Cognitive Science Society, Austin, TX, 2008.\n[21] K. J. Holyoak and P. Thagard. Analogical mapping by constraint satisfaction. Cognitive Science, 13(3):295\u2013355, 1989.\n[22] L. A. A. Doumas, J. E. Hummel, and C. M. Sandhofer. A theory of the discovery and predication of relational concepts. Psychological Review, 115(1):1\u201343, 2008.\n[23] M. L. Gick and K. J. Holyoak. Schema induction and analogical transfer. Cognitive Psychology, 15:1\u201338, 1983.", "award": [], "sourceid": 177, "authors": [{"given_name": "Charles", "family_name": "Kemp", "institution": null}, {"given_name": "Alan", "family_name": "Jern", "institution": null}]}