{"title": "A Knowledge-Based Model of Geometry Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 887, "page_last": 894, "abstract": null, "full_text": "A Knowledge-Based Model of Geometry Learning \n\nGeoffrey Towell \n\nSiemens Corporate Research \n\n755 College Road East \nPrinceton, NJ 08540 \n\ntowell@learning.siemens.com \n\nRichard Lehrer \n\nEducational Psychology \nUniversity of Wisconsin \n1025 West Johnson St. \nMadison, WI 53706 \n\nlehrer@vms.macc.wisc.edu \n\nAbstract \n\nWe propose a model of the development of geometric reasoning in children that \nexplicitly involves learning. The model uses a neural network that is initialized \nwith an understanding of geometry similar to that of second-grade children. \nThrough the presentation of a series of examples, the model is shown to develop \nan understanding of geometry similar to that of fifth-grade children who were \ntrained using similar materials. \n\n1 Introduction \n\nOne of the principal problems in instructing children is to develop sequences of examples \nthat help children acquire useful concepts. In this endeavor it is often useful to have a \nmodel of how children learn the material, for a good model can guide an instructor towards \nparticularly effective examples. In short, good models of learning help a teacher maximize \nthe utility of the examples presented. \n\nThe particular problem with which we are concerned is learning about conventional \nconcepts in geometry, like those involved in identifying, and recognizing similarities and \ndifferences among, shapes. This is a difficult subject to teach because children (and adults) \nhave a complex set of informal rules for geometry (that are often at odds with conventional \nrules). Hence, instruction must supplant this informal geometry with a common formalism. \nTo be efficient in their instruction, teachers need a model of geometric learning which, at \nthe very least: \n\n1. 
can represent children's understanding of geometry prior to instruction, \n2. can describe how understanding changes as a result of instruction, \n3. can predict the effect of differing instructional sequences. \n\nIn this paper we describe a neural network based model that has these properties. \n\nAn extant model of geometry learning, the \"van Hiele model\" [6], represents children's \nunderstanding as purely perceptual -- appearances dominate reasoning. However, our \nresearch suggests that children's reasoning is better characterized as a mix of perception \nand rules. Moreover, unlike the model we propose, the van Hiele model can neither be used \nto test the effectiveness of instruction prior to trying that instruction on children nor can it \nbe used to describe how understanding changes as a result of a specific type of instruction. \n\nBriefly, our model uses a set of rules derived from interviews with first and second \ngrade children [1, 2] to produce a stereotypical informal conception of geometry. These \nrules, described in more detail in Section 2.1, give our model an explicit representation \nof pre-instructional geometry understanding. The rules are then translated into a neural \nnetwork using the KBANN algorithm [3]. As a neural network, our model can test the effect \nof differing instructional sequences by simply training two instances with different sets of \nexamples. The experiments in Section 3 take advantage of this ability of our model; they \nshow that it is able to accurately model the effect of two different sets of instruction. \n\n2 A New Model \n\nThis section describes the initial state of our model and its implementation as a neural \nnetwork. The initial state of the model is intended to reproduce the decision processes \nof a typical child prior to instruction. The methodology used to derive this information, \nand a brief description of the information itself, are given in the first subsection. 
In addition, \nthis subsection contains a small experiment that shows the accuracy of the initial state of \nthe model. In the next subsection, we briefly describe the translation of those rules into a \nneural network. \n\n2.1 The initial state of the model \n\nOur model is based upon interviews with children in first and second grade [1, 2]. In these \ninterviews, children were presented with sets of three figures such as the triad in Figure 1. \nThey were asked which pair of the three figures is the most similar and why they made \ntheir decision. These interviews revealed that, prior to instruction, children base judgments \nof similarity upon the seven attributes in Table 1. \n\nFor the triad discrimination task, children find ways in which a pair is similar that are not \nshared by the other two pairs. For instance, B and C in Figure 1.2 are both pointy but A \nis not. As a result, the modal response of children prior to instruction is that {B C} is the \nmost similar pair. This decision making process is described by the rules in Table 2. \n\nIn addition to the rules in Table 2, we include in our initial model a set of rules that describe \ntemplates for standard geometric shapes. This addition is based upon interviews with \nchildren which suggest that they know the names of shapes such as triangles and squares, \nand that they associate with each name a small set of templates. Initially, children treat \nthese shape names as having no more importance than any of the attributes in Table 1. So, \nour model initially treats shape names exactly as one of those attributes. Over time children \nlearn that the names of shapes are very important because they are diagnostic (the name \nindicates properties). Our hope was that the model would make a similar transition so that \nthe shape names would become sufficient for similarity determination. \n\nNote that the rules in Table 2 do not always yield a unique decision. 
Rather, there are \ntriads for which these rules cannot decide which pair is most similar. This is not often \nthe case for a particular child, who usually finds one attribute more salient than another. \nYet, frequently when the rules cannot uniquely identify the most similar pair, a classroom \nof children is equally divided. Hence, the model may not accurately predict an individual \nresponse, but it is usually correct at identifying the modal response. \n\nTable 1: Attributes used by children prior to instruction. \n\nTilt: 0, 10, 20, 30, 40 \nArea: small, medium, large \nPointy: yes, no \n2 long & short: yes, no \nSlant: yes, no \nShape: skinny, medium, fat \nDirection: +-, -, j, l \n\nTable 2: Rules for similarity judgment in the triad discrimination task. \n\n1. IF fig-val(fig1?, att?) = fig-val(fig2?, att?) THEN \nsame-att-value(fig1?, fig2?, att?). \n\n2. IF not (same-att-value(fig1?, fig3?, att?)) AND fig1? ≠ fig3? \nAND fig2? ≠ fig3? THEN unq-sim(fig1?, fig2?, att?). \n\n3. IF c(unq-sim(fig1?, fig2?, att?)) > c(unq-sim(fig1?, fig3?, att?)) AND \nc(unq-sim(fig1?, fig2?, att?)) > c(unq-sim(fig2?, fig3?, att?)) \nAND fig1? ≠ fig3? AND fig2? ≠ fig3? THEN \nmost-similar(fig1?, fig2?). \n\nLabels followed by a '?' indicate variables. \nfig-val(fig?, att?) returns the value of att? in fig?. \nc() counts the number of instances. \n\nFigure 1: Triads used to test learning. \n\nTo verify the accuracy of the initial state of our model, we used the set of nine testing triads \nshown in Figure 1, which were developed for the interviews with children. 
As shown in \nTable 3, the model matches very nicely the responses obtained from a separate sample of 48 \nsecond grade children. Thus, we believe that we have a valid point from which to start. \n\n2.2 The translation of rule sets into a neural network \n\nWe translate rule sets into neural networks using the KBANN algorithm [3], which uses a \nset of hierarchically-structured rules in the form of propositional Horn clauses to set the \ntopology and initial weights of an artificial neural network. Because the rules in Table 2 are \nnot in propositional form, they must be expanded before they can be accepted by KBANN. \nThe expansion turns a simple set of three rules into an ugly set of approximately 100 rules. \n\nTable 3: Initial responses by the model. \n\nTriad Number:          1   2   3   4   5   6      7   8      9 \nInitial Model:         BC  BC  AC  AC  BC  AB|BC  AC  AB|BC  AC|BC \nSecond Grade Children: BC  BC  AC  AC  BC  AB|BC  AC  AB     AC|BC \n\nAnswers in the \"Initial Model\" row indicate the responses generated by the initial rules. \nMore than one response in a column indicates that the rules could not differentiate between two \npairs. \n\nAnswers in the \"Second Grade Children\" row are the modal responses of second grade children. More \nthan one answer in a column indicates that equal numbers of children judged the pairs most \nsimilar. \n\nTable 4: Properties used to describe figures. \n\nConvex: Yes, No \n# Sides: 3, 4, 5, 6, 8 \n# Angles: 3, 4, 5, 6, 8 \nAll Sides Equal: Yes, No \n# Right Angles: 0, 1, 2, 3, 4 \nAll Angles Equal: Yes, No \n# Equal Angles: 0, 2, 3, 4, 5, 6, 8 \n# Pairs Equal Opposite Angles: 0, 1, 2, 3, 4 \n# Pairs Opposite Sides Equal: 0, 1, 2, 3, 4 \n# Pairs Parallel Sides: 0, 1, 2, 3, 4 \nAdjacent Angles = 180: Yes, No \n# Lines of Symmetry: 0, 1, 2, 3, 4, 5, 6, 8 \n# Equal Sides: 0, 2, 3, 4, 5, 6, 8 \n\nFigure 2 is a high-level view of the structure of the neural network that results from the \nrules. 
In this implementation we present all three figures at the same time and all decisions \nare made in parallel. Hence, the rules described above must be repeated at least three \ntimes. In the neural network that results from the rule translation, these repeated rules are \nnot independent. Rather, they are linked so that modifications of the rules are shared across \nevery pairing. Thus, the network cannot learn a rule which applies only to one pair. \n\nFinally, the model begins with the set of 13 properties listed in Table 4 in addition to the \nattributes of Table 1. (Note that we use \"attribute\" to refer to the informal, visual features \nin Table 1 and \"property\" to refer to the symbolic features in Table 4.) As a result, each \nfigure is described to the model as a 74 position vector (18 positions encode the attributes; \nthe remaining 56 positions encode the properties). \n\n3 An Experiment Using the Model \n\nOne of the points we made in the introduction is that a useful model of geometry learning \nshould be able to predict the effect of instruction. The experiment reported in this section \ntests this facet of our model. Briefly, this experiment trains two instances of our model \nusing different sets of data. We then compare the instances to children who have been \ntrained using a set of problems similar to one of those used to train the model. Our results \nshow that the two instances learn quite different things. Moreover, the instance trained \nwith material similar to the children predicts the children's responses on test problems with \na high level of accuracy. \n\nBoxes indicate one or more units. \nDashed boxes indicate units associated with duplicated rules. \nDashed lines indicate one or more negatively weighted links. 
\nSolid lines indicate one or more positively weighted links. \n\nFigure 2: The structure of the neural network for our model. \n\n3.1 Training the model \n\nFor this experiment, we developed two sets of training shapes. One set contains every \npolygon in a fifth-grade math textbook [4] (Figure 3). The other set consists of 81 items \nwhich might be produced by a child using a modified version of LOGO (Figure 4). Here \nwe assume that one of the effects of learning geometry with a tool like LOGO is simply to \nincrease the extent and range of possible examples. A collection of 33 triads were selected \nfrom each set to train the model. 1 Training consisted of repeated presentations of each of \nthe 33 triads until the network correctly identified the most similar pair for each triad. \n\n3.2 Tests of the model \n\nIn this section, we test the ability of the model to accurately predict the effects of instruction. \nWe do this by comparing the two trained instances of the model to the modal responses \nof fifth graders who had used LOGO for two weeks. In those two weeks, the children had \ngenerated many (but not all) of the figures in Figure 4. Hence, we expected that the instance \nof the model trained using triads drawn from Figure 4 would better predict the responses \nof these children than the other instance of the model. \n\n1 In choosing the same number of triads for each training set, we are being very generous to the \ntextbook. In reality, not only do children see more figures when using LOGO, they are also able to \nmake many more contrasts between figures. Hence, it might be more accurate to make the LOGO \ntraining set much larger than the textbook training set. \n\nFigure 3: Representative textbook shapes. \n\nFigure 4: Representative shapes encountered using a modified version of LOGO. \n\nClearly, the results in Table 5 verify our expectations. 
The LOGO-trained model agrees with \nthe modal responses of children on an average of six examples while the textbook-trained \nmodel agrees on an average of three examples. The respective binomial probabilities of six \nand three matches are 0.024 and 0.249. These probabilities suggest that the match between \nthe LOGO-trained model and the children is unlikely to have occurred by chance. On \nthe other hand, the instance of the model trained by the textbook examples has the most \nprobable outcome from simply random guessing. Thus, we conclude that the LOGO-trained \nmodel is a good predictor of children's learning when using LOGO. \n\nIn addition, whereas the textbook-trained model was no better than chance at estimating \nthe conventional response, the LOGO-trained model matched convention on an average \nof seven triads. Interestingly, on both triads where the LOGO-trained model did not \nmatch convention, it could not due to lack of appropriate information. For triad 3, \nconvention matches the trapezoid with the parallelogram rather than either of these with the \nquadrilateral because the trapezoid and the parallelogram both have some pairs of parallel \nlines. The model, however, has only information about the number of pairs of parallel \nlines. On the basis of this feature, the three figures are equally dissimilar. For triad 7, the \nother triad for which the LOGO-trained model did not match convention, the conventional \npairing matches two obtuse triangles. However, the model has no information about angles \nother than number and number of right angles. Hence, it could not possibly get this triad \ncorrect (at least not for the right reason). We expect that correcting these minor weaknesses \nwill improve the model's ability to make the conventional response. \n\nTable 5: Responses after learning by trained instances of the model and children. 
\n\nTriad Number:          1      2   3      4   5   6      7   8      9 \nTextbook Trained:      AB|BC  BC  AC     AC  BC  AB     AC  AB     AC \nLOGO Trained:          AB/BC  AB  ??     BC  AB  AB     AB  AB     BC \nFifth Grade Children:  AB|BC  AB  AC/AB  BC  AB  AB|BC  AC  AB|BC  BC \nConvention:            BC     AB  AC     BC  AB  AB     AC  AB     BC \n\nResponses by the model are the modal responses over 500 trials. \n?? indicates that the model was unable to select among the pairings. \n\nThe success of our model in the prediction experiment led us to investigate the reasons \nunderlying the answers generated by its two instances. In so doing we hoped to gain \nan understanding of the networks' reasoning processes. Such an understanding would \nbe invaluable in the design of instruction for it would allow the selection of examples \nthat fill specific learning deficits. Unfortunately, trained neural networks are often nearly \nimpossible to comprehend. However, using tools such as those described by Towell and \nShavlik [5], we believe that we developed a reasonably clear understanding of the effects \nof each set of training examples. \n\nThe LOGO-trained model made comprehensive adjustments of its initial conditions. Of the \neight attributes, it attends to only size and 2 long & short after training. While learning \nto ignore most of the attributes, the model also learned to pay attention to several of the \nproperties. In particular, number of angles, number of sides, all angles equal, all sides \nequal, and number of pairs of opposite sides parallel all were important to the network \nafter training. Thus, the LOGO-trained instance of the model made a significant transition in \nits basis for geometric reasoning. Sadly, in making this transition, the declarative clarity of \nthe initial rules was lost. Hence, it is impossible to precisely state the rules that the trained \nmodel used to make its final decisions. 
\n\nBy contrast, the textbook-trained instance of the model failed to learn that most of \nthe attributes were unimportant. Instead, the model simply learned that several of the \nproperties were also important. As a result, reasons for answers on the test set often seemed \nschizophrenic. For instance, in responding BC on test triad 2, the network attributed \nthe decision to similarities in: area, pointiness, point-direction, number of sides, number \nof angles, number of right angles, and all angles equal. Given this combination, it is \nnot surprising that the example is answered incorrectly. This result suggests that typical \ntextbooks may accentuate the importance of conventional properties, but they provide little \ngrist for abandoning the mill of informal attributes. \n\n3.3 Discussion \n\nThis experiment demonstrated the utility of our model in several ways. First, it showed \nthat the model is sensitive to differences in the training set. Of itself, this is neither a surprising \nnor an interesting conclusion. What is important about the difference in learning is that the \nmodel trained in a manner similar to a classroom of fifth grade children made responses to \nthe test set that were quite similar to those of fifth grade children. \n\nIn addition to making different responses to the test set, the two trained instances of the \nmodel appeared to learn different things. In particular, the LOGO-trained instance essentially \nreplaced its initial knowledge with something much more like the formal geometry. On \nthe other hand, the textbook-trained instance simply added several concepts from formal \ngeometry to the informal concepts with which it was initialized. An improved transition from \ninformal to formal geometry is one of the advantages claimed for LOGO based instruction \n[2]. Hence, the difference between the two instances of the model agrees with observations \nof children. 
\n\nThis result suggests that our model is able to predict the effect of differing instructional \nsequences. A further test of this hypothesis would be to use our model to design a \nset of instructional materials. This could be done by starting with an apparently good set of \nmaterials, training the model, examining its deficiencies, and revising the training materials \nappropriately. Our hypothesis is that a set of materials so constructed would be superior to \nthe materials normally used in classrooms. Testing of this hypothesis is one of our major \ndirections for future research. \n\n4 Conclusions \n\nIn this paper we have described a model of the initial stages of geometry learning by \nelementary school children. This model is initialized using a set of rules based upon \ninterviews with first and second grade children. This set of rules is shown to accurately \npredict the responses of second grade children on a hard set of similarity determination \nproblems. \n\nGiven that we have a valid starting point for our model, we test it by training those rules, \nafter re-representing them in a neural network, with two different sets of training materials. \nEach instance of the model is analyzed in two ways. First, they are compared, on an \nindependent set of testing examples, to fifth grade children who had been trained using \nmaterials similar to one of the model's training sets. This comparison showed that the \nmodel trained with materials similar to the children accurately reproduced the responses of \nthe children. The second analysis involved examining the model after training to determine \nwhat it had learned. Both instances of the model learned to attend to properties that were \nnot mentioned in the initial rules. The model trained with the richer (LOGO-based) training \nset also learned that the informal attributes were relatively unimportant. 
Conversely, the \nmodel trained with the textbook-based training examples merely added information about \nproperties to the pre-existing information. Therefore, we believe that the model we have \ndescribed has the potential to become a valuable tool for teachers. \n\nReferences \n\n[1] R. Lehrer, W. Knight, M. Love, and L. Sancilio. Software to link action and description in \npre-proof geometry. Presented at the Annual Meeting of the American Educational Research \nAssociation, 1989. \n\n[2] R. Lehrer, L. Randle, and L. Sancilio. Learning preproof geometry with LOGO. Cognition and \nInstruction, 6:159-184, 1989. \n\n[3] M. O. Noordewier, G. G. Towell, and J. W. Shavlik. Training knowledge-based neural networks \nto recognize genes in DNA sequences. In Advances in Neural Information Processing Systems, \nvolume 3, pages 530-536, Denver, CO, 1991. Morgan Kaufmann. \n\n[4] M. A. Sobel, editor. Mathematics. McGraw-Hill, New York, 1987. \n\n[5] G. G. Towell and J. W. Shavlik. Interpretation of artificial neural networks: Mapping knowledge-based \nneural networks into rules. In Advances in Neural Information Processing Systems, \nvolume 4, pages 977-984, Denver, CO, 1991. Morgan Kaufmann. \n\n[6] P. M. van Hiele. Structure and Insight. Academic Press, New York, 1986. \n", "award": [], "sourceid": 707, "authors": [{"given_name": "Geoffrey", "family_name": "Towell", "institution": null}, {"given_name": "Richard", "family_name": "Lehrer", "institution": null}]}