{"title": "A Model for Associative Multiplication", "book": "Advances in Neural Information Processing Systems", "page_first": 17, "page_last": 23, "abstract": null, "full_text": "A Model for Associative Multiplication \n\nG. Bjorn Christianson* \nDepartment of Psychology \nMcMaster University \nHamilton, Ont. L8S 4K1 \nbjorn@caltech.edu \n\nSuzanna Becker \nDepartment of Psychology \nMcMaster University \nHamilton, Ont. L8S 4K1 \nbecker@mcmaster.ca \n\nAbstract \n\nDespite the fact that mental arithmetic is based on only a few hundred basic facts and some simple algorithms, humans have a difficult time mastering the subject, and even experienced individuals make mistakes. Associative multiplication, the process of doing multiplication by memory without the use of rules or algorithms, is especially problematic. Humans exhibit certain characteristic phenomena in performing associative multiplications, both in the type of error and in the error frequency. We propose a model for the process of associative multiplication, and compare its performance on both these phenomena with data from normal humans and from the model proposed by Anderson et al (1994). \n\n1 INTRODUCTION \n\nAssociative multiplication is defined as multiplication done without recourse to computational algorithms, and as such is mainly concerned with recalling the basic times table. Learning up to the ten times table requires learning at most 121 facts; in fact, if we assume that normal humans use only four simple rules, the number of facts to be learned reduces to 39. In theory, associative multiplication is therefore a simple problem. In reality, school children find it difficult to learn, and even trained adults have a relatively high rate of error, especially in comparison to performance on associative addition, which is superficially a similar problem. 
\nThere has been surprisingly little work done on the methods by which humans perform basic multiplication problems; an excellent review of the current literature is provided by McCloskey et al (1991). \n\n* Author to whom correspondence should be addressed. Current address: Computation and Neural Systems, California Institute of Technology 139-74, Pasadena, CA 91125. \n\nIf a model is to be considered plausible, it must have error characteristics similar to those of humans at the same task. In arithmetic, this entails accounting for, at a minimum, two phenomena. The first is the problem-size effect, noted in various studies (e.g. Stazyk et al, 1982), whereby response times and error rates increase for problems with larger operands. Secondly, humans have a characteristic distribution in the types of errors made. Specifically, errors can be classified as one of the following five types, as suggested by Campbell and Graham (1985), Siegler (1988), McCloskey et al (1991), and Girelli et al (1996): operand, where the given answer is correct with one of the operands replaced (e.g. 4 x 7 = 21; this category accounts for 66.4% of all errors made by normal adults); close-miss, where the result is within ten percent of the correct response (4 x 7 = 29; 20.0%); table, where the result is correct for a problem with both operands replaced (4 x 7 = 25; 3.9%); non-table, where the result is not on the times table (4 x 7 = 17; 6.7%); or operation, where the answer would have been correct for a different arithmetic operation, such as addition (4 x 7 = 11; 3.0%).^1 \n\nIt is reasonable to assume that humans use at least two distinct representations when dealing with numbers. The work by Mandler and Shebo (1982) on modeling the performance of various species (including humans, monkeys, and pigeons) on numerosity judgment tasks suggests that in such cases a coarse coding is used. 
On the other hand, humans are capable of dealing with numbers as abstract symbolic concepts, suggesting the use of a precise localist coding. Previous work has either used only one of these coding ideas (for example, Sokol et al, 1991) or a single representation which combined aspects of both (Anderson et al, 1994). \n\nWarrington (1982) documented DRC, a patient who suffered dyscalculia following a stroke. DRC retained normal intelligence and a grasp of numerical and arithmetic concepts. When presented with an arithmetic problem, DRC was capable of rapidly providing an approximate answer. However, when pressed for a precise answer, he was incapable of doing so without resorting to an explicit computational algorithm such as counting. One possible interpretation of this case study is that DRC retained the ability to work with numbers in a magnitude-related fashion, but had lost the ability to treat numbers as symbolic concepts. This suggests the hypothesis that humans may use two separate, concurrent representations for numbers, a coarse coding and a more symbolic, precise coding, in the course of doing associative arithmetic in general and multiplication in particular, switching between the codings at various points in the process. This hypothesis will form the basis of our modeling work. To guide the placement of these transitions between representations, we assume the further constraint that the coarse coding is the preferred coding (as it is conserved across a wide variety of species) and will tend to be expressed before the precise coding. 
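The five error categories defined above lend themselves to a mechanical test. The following Python sketch is ours, not part of the original model; in particular, the precedence among categories when an answer fits more than one is our assumption.

```python
# Illustrative classifier for the five error types described in the text
# (operand, close-miss, table, non-table, operation). The checking order
# below is an assumption, not taken from the paper.

def classify_error(a, b, answer):
    """Classify a response to the single-digit problem a x b."""
    correct = a * b
    if answer == correct:
        return "correct"
    # Operation error: correct for a different operation, e.g. addition.
    if answer == a + b:
        return "operation"
    # Operand error: correct if one operand is replaced (4 x 7 = 21 = 3 x 7).
    for k in range(10):
        if answer in (k * b, a * k):
            return "operand"
    # Close-miss: within ten percent of the correct response.
    if abs(answer - correct) <= 0.1 * correct:
        return "close-miss"
    # Table error: answer appears elsewhere on the single-digit times table.
    if any(answer == i * j for i in range(10) for j in range(10)):
        return "table"
    # Non-table error: answer is not on the times table at all.
    return "non-table"
```

Applied to the worked examples in the text, this sketch labels 4 x 7 = 21 as operand, 29 as close-miss, 25 as table, 17 as non-table, and 11 as operation.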
\n\nFigure 1: The coarse coding for digits. Numbers along the left are the digit; numbers along the bottom are position numbers. Blank regions in the grid represent zero activity. \n\n^1 Data taken from Girelli et al (1996). \n\n2 METHODOLOGY \n\nFollowing the work of Mandler and Shebo (1982), our coarse coding consists of a 54-dimensional vector, with a sliding \"bump\" of ones corresponding to the magnitude of the digit represented. The size of the bump decreases and the degree of overlap increases as the magnitude of the digit increases (Figure 1). Noise in this representation is simulated by the probability that a given bit will be in the wrong state. The precise representation, intended for symbolic manipulation of numbers, consists of a 10-dimensional vector with the value of the coded digit given by the dimension of greatest activity. Both of these representations are digit-based: each vector codes only for a number between 0 and 9, with concatenations of vectors used for numbers greater than 9. \n\nFigure 2: Schematic of the network architecture. (A) The coarse coding. (B) The winner-take-all network. (C) The precise coding. (D) The feed-forward look-up table. See text for details. \n\nThe model is trained in three distinct phases. A simple one-layer perceptron trained by a winner-take-all competitive learning algorithm is used to map the input operands from the original coarse coding into the precise representation. 
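As an illustration of the two codings, the Python sketch below generates a noisy coarse-coded digit and its precise counterpart. The exact bump start positions and widths are our guesses; the paper specifies only a 54-dimensional vector whose bump shrinks, and overlaps its neighbours more, as magnitude grows.

```python
import numpy as np

N = 54  # dimensionality of the coarse code (after Mandler & Shebo, 1982)

def coarse_code(digit, rng=None, noise=0.0):
    """Coarse-code a digit 0-9 as a sliding bump of ones in a 54-bit vector.

    The bump layout (start = 5*digit, width = 9 - digit//2) is an assumed
    scheme chosen only to satisfy the qualitative description in the text.
    """
    width = 9 - digit // 2          # bump narrows as magnitude grows (assumed)
    start = 5 * digit               # neighbouring bumps overlap (assumed)
    v = np.zeros(N)
    v[start:start + width] = 1.0
    if noise > 0.0:                 # each bit flipped with probability `noise`
        if rng is None:
            rng = np.random.default_rng()
        flips = rng.random(N) < noise
        v = np.abs(v - flips)       # XOR: flip the selected bits
    return v

def precise_code(digit):
    """10-dimensional localist code: value = index of the most active unit."""
    v = np.zeros(10)
    v[digit] = 1.0
    return v
```

The noise parameter corresponds to the bit-error probability used in training (5%) and testing (7.5%) below.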
\nThe network was trained for 10 epochs, each with a different set of 5 samples of noisy coarse-coded digits. At the end of training, the winner-take-all network performed at near-perfect levels. The translated operands are then presented to a two-layer feed-forward network with a logistic activation function trained by backpropagation. The number of hidden units was equal to the number of problems in the training set (in this case, 32) to force look-up table behaviour. The look-up table was trained independently for varying numbers of iterations, using a learning rate constant of 0.01. The output of the look-up table is coarse coded as in Figure 1. In the final phase, the table output is translated by the winner-take-all network to provide the final answer in the precise coding. A schematic of the network architecture is given in Figure 2. The operand vectors used for training of both networks had a noise parameter of 5%, while the vectors used in the analysis had 7.5% noise. Both the training and the testing problem set consisted of ten copies of each of the problems listed in Table 2, which are the problems used in Anderson et al (1994). Simulations were done in MATLAB v5.1 (Mathworks, Inc., 24 Prime Park Way, Natick MA, 01760-1500). \n\n3 RESULTS \n\nFigure 3: Error distributions for human data (Girelli et al 1996), the model of Anderson et al (1994), and our model after 200, 400, and 600 iterations of training. 
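The winner-take-all translation stages described in the Methodology can be approximated by nearest-template matching: after competitive learning, each output unit's weight vector resembles one digit's coarse template, so decoding amounts to picking the best-matching template. This is our simplified stand-in, and the bump layout below is assumed, not taken from the paper.

```python
import numpy as np

# Stand-in for the trained winner-take-all stage. One prototype row per
# digit; the contiguous-bump template scheme is our assumption.
N = 54
TEMPLATES = np.zeros((10, N))
for d in range(10):
    TEMPLATES[d, 5 * d: 5 * d + 9 - d // 2] = 1.0   # assumed bump layout

def wta_decode(noisy_vector):
    """Return the digit whose template best matches the input (dot product)."""
    return int(np.argmax(TEMPLATES @ noisy_vector))
```

A clean template decodes to its own digit, and a few flipped bits generally leave the winner unchanged, which is consistent with the near-perfect performance reported for the trained network.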
\n\nOnce a model has been trained, its errors on the training data can be categorized according to the error types listed in the Introduction; a summary of the performance of our model is presented in Table 1. For comparison, we plot data generated by our model, the model of Anderson et al (1994), and human data from Girelli et al (1996) in Figure 3. In no case did the model generate an operation error. This is to be expected: as the model was trained only on multiplication, it has no way to make an operation error other than by coincidence. A full set of results obtained from the model with 400 training iterations is presented in Table 2.^2 \n\nTable 1: Error rates generated by our model. A column for operation errors is not included, as in no instance did our model generate an operation error. \n\nIterations | Errors in 320 trials | Operand (%) | Close-miss (%) | Table (%) | Non-table (%) \n200 | 114 | 61.4 | 21.0 | 8.8 | 8.8 \n400 | 85 | 65.9 | 20.0 | 7.1 | 7.1 \n600 | 65 | 63.7 | 16.9 | 9.2 | 10.8 \n\n^2 As in Anderson et al (1994), we have deliberately set 8 x 9 = 67 so that it is not the only problem with an answer greater than 70. \n\nTable 2: Results from ten trials run with the model after 400 training iterations. Errors are marked in boldface. 
\n\nI Problem II 1 I 2 I 3 I 4 I 5Trta~ I 7 I 8 I 9 I 10 I \n\n2x2 \n2x4 \n2 x 5 \n3x7 \n3 x 8 \n3 x 9 \n4 x 2 \n4x5 \n4x6 \n4x8 \n4x9 \n5 x 2 \n5 x 7 \n5 x 8 \n6 x 3 \n6x4 \n6x5 \n6 x 6 \n6 x 7 \n6x8 \n7 x 3 \n7x4 \n7 x 5 \n7x6 \n7x7 \n7x8 \n8x3 \n8x4 \n8 x 6 \n8 x 7 \n8 x 8 \n8 x 9 \n\n4 \n8 \n10 \n21 \n24 \n27 \n8 \n20 \n24 \n32 \n36 \n30 \n30 \n\n4 \n8 \n10 \n21 \n64 \n27 \n8 \n20 \n20 \n32 \n36 \n10 \n35 \n\n4 \n8 \n10 \n21 \n24 \n27 \n8 \n20 \n24 \n32 \n36 \n10 \n35 \n\n12 \n24 \n30 \n36 \n42 \n\n4 \n8 \n10 \n21 \n24 \n27 \n8 \n30 \n20 \n22 \n21 \n10 \n35 \n\n4 \n4 \n4 \n4 \n4 \n4 \n8 \n8 \n8 \n8 \n8 \n8 \n10 \n10 \n10 \n10 \n10 \n10 \n21 \n21 \n21 \n21 \n21 \n21 \n24 \n24 \n24 \n24 \n21 \n21 \n27 \n27 \n27 \n27 \n27 \n21 \n8 \n8 \n8 \n10 \n8 \n8 \n20 \n20 \n20 \n20 \n20 \n20 \n24 \n24 \n24 \n24 \n35 \n20 \n32 \n32 \n32 \n32 \n32 \n32 \n36 \n36 \n36 \n36 \n36 \n30 \n10 \n10 \n10 \n10 \n10 \n10 \n35 \n35 \n30 42 \n30 30 \n40 \n34 \n30 30 30 35 30 34 30 30 \n18 \n18 \n24 \n24 \n24 \n24 \n24 \n24 \n18 18 \n30 \n30 \n30 \n30 \n30 \n36 \n36 \n36 \n36 \n42 \n42 \n42 \n42 \n42 \n32 \n48 \n40 44 \n64 49 \n21 \n21 \n21 \n24 \n24 \n28 \n28 \n28 \n32 \n22 \n35 \n35 \n35 \n35 \n35 \n42 \n42 \n42 \n42 \n49 \n49 \n49 \n42 \n29 \n42 \n56 \n56 \n64 \n64 64 \n24 \n24 \n24 \n24 \n24 \n32 \n32 \n32 \n32 \n32 \n49 44 56 \n44 49 \n56 \n56 \n52 \n64 \n64 \n64 \n67 \n67 \n67 \n\n18 \n28 \n24 \n24 \n30 \n30 \n36 \n36 \n42 \n42 \n44 44 64 \n21 \n21 \n21 \n28 \n28 \n28 \n35 \n35 \n30 \n42 \n42 \n42 \n49 \n49 \n52 \n56 \n56 \n64 \n24 \n24 \n34 \n32 \n32 \n64 \n44 46 \n42 \n62 \n54 \n67 \n\n18 \n24 \n24 \n18 \n30 \n30 \n36 \n36 \n42 \n49 \n49 \n42 \n21 \n21 \n28 \n28 \n35 \n35 \n42 \n42 \n49 \n49 \n64 \n56 \n24 \n21 \n32 \n32 \n49 44 \n56 \n49 \n64 \n64 \n67 \n67 \n\n46 64 64 49 \n64 \n64 \n67 \n67 \n\n64 \n67 \n\n64 \n67 \n\nThe convention in the current arithmetic literature is to test for the existence of a \nproblem-size effect by fitting a line to the errors made versus the sum of 
operands in the problem. Positive slopes to such fits would demonstrate the existence of a problem-size effect. The results of this analysis are shown in Figure 4. The model had a problem-size effect in all instances. Note that no claims are made of the appropriateness of a linear model for the given data, nor should any conclusions be drawn from the specific parameters of the fit, especially given the sparsity of the data. The sole point of this analysis is to highlight a generally increasing trend. \n\n4 DISCUSSION \n\nAs noted in the Results section above, our model demonstrates the problem-size effect in the number of errors made (see Figure 4), though the chosen architecture does not permit a response-time effect. The presence of this effect is hardly surprising, as all models which use a representation similar to our coarse coding (Mandler & Shebo, 1982; Anderson et al, 1994) display a problem-size effect. \n\nFigure 4: Demonstration of the problem size effect. The data plotted here is for the model trained for 400 iterations (best-fit line y = 3.6x - 13), as it proved the best fit to the distribution of errors in humans (Figure 3); a similar analysis gives a best-fit slope of 1.9 for 200 training iterations and 1.1 for 600 training iterations. \n\nIt has been suggested by a few researchers (e.g. Campbell & Graham, 1985) that the problem-size effect is simply a frequency effect, as humans encounter problems involving smaller operands more often in real life. While there is some evidence to the contrary (Hamman and Ashcraft, 1986), it remains a possibility. 
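The problem-size analysis described above amounts to an ordinary least-squares fit of error counts against the sum of the operands. A minimal sketch, using made-up illustrative error counts rather than the model's data:

```python
import numpy as np

# Regress error counts on the sum of the operands and check for a
# positive slope, as in the problem-size analysis in the text.
# These error counts are fabricated for illustration only.
operand_sums = np.array([4, 6, 8, 10, 12, 14, 16, 18])
error_counts = np.array([1, 3, 10, 22, 30, 41, 44, 52])

slope, intercept = np.polyfit(operand_sums, error_counts, 1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
assert slope > 0  # a positive slope indicates a problem-size effect
```

As the text cautions, the fitted parameters themselves carry little weight given the sparsity of the data; only the sign of the slope is of interest.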
\nIt is immediately apparent from Figure 3 that our model has much the same distribution of errors as seen in normal humans, and is superior to the model of Anderson et al (1994) in this regard. That model, implemented as an auto-associative network using a Brain State in a Box (BSB) architecture (Anderson et al, 1994; Anderson 1995), generates too many operand errors, and no table, non-table or operation errors. These deficiencies can be predicted from the attractor nature of an auto-associative network. It is the process of translating between representations for digits, and the possibility for error in doing so, which we believe allows our model to produce its various categories of errors. \n\nAn interesting aspect of our model is revealed by Figure 3 and Table 1. While increased training of the look-up table improves the overall performance of the model, the error distribution remains relatively constant across the lengths of training studied. This suggests that in this model, the error distribution is an inherent feature of the architecture, and not a training artifact. This corresponds with data from normal humans, in which the error distribution remains relatively constant across individuals (Girelli et al, 1996). As noted above, the design of our model should permit the occurrence of all the various error types, save for operation errors. However, at this point, we do not have a clear understanding of the exact architectural features that generate the error distribution itself. \n\nDefining a model for associative multiplication is only a single step towards the goal of understanding how humans perform general arithmetic. Rumelhart et al (1986) proposed a mechanism for multi-digit arithmetic operations given a mechanism for single-digit operations, which addresses part of the issue; this model has been implemented for addition by Cottrell and T'sung (1991). 
The fact that humans make operation errors suggests that there might be interactions between the mechanisms of associative multiplication and associative addition; conversely, errors on these tasks may occur on entirely different processing levels. \n\nIn summary, this model, despite several outstanding questions, shows great potential as a description of the associative multiplication process. Eventually, we expect it to form the basis for a more complete model of arithmetic in human cognition. \n\nAcknowledgements \n\nThe first author acknowledges financial support from McMaster University and Industry Canada. The second author acknowledges financial support from the Natural Sciences and Engineering Research Council of Canada. We would like to thank J. Linden, D. Meeker, J. Pezaris, and M. Sahani for their feedback and comments on this work. \n\nReferences \n\nAnderson J.A. et al. (1994) In Neural Networks for Knowledge Inference and Representation, Levine D.S. & Aparicio M., Eds. (Lawrence Erlbaum Associates, Hillsdale NJ) pp. 311-335. \n\nAnderson J.A. (1995) An Introduction to Neural Networks. (MIT Press/Bradford, Cambridge MA) pp. 493-544. \n\nCampbell J.I.D. & Graham D.J. (1985) Canadian Journal of Psychology. 39 338. \n\nCottrell G.W. & T'sung F.S. (1991) In Advances in Connectionist and Neural Computation Theory, Barnden J.A. & Pollack J.B., Eds. (Ablex Publishing Co., Norwood NJ) pp. 305-321. \n\nGirelli L. et al. (1996) Cortex. 32 49. \n\nHamman M.S. & Ashcraft M.H. (1986) Cognition and Instruction. 3 173. \n\nMandler G. & Shebo B.J. (1982) Journal of Experimental Psychology: General. 111 1. \n\nMcCloskey M. et al. (1991) Journal of Experimental Psychology: Learning, Memory, and Cognition. 17 377. \n\nRumelhart D.E. et al. (1986) In Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 
2: Psychological and biological models, McClelland J.L., Rumelhart D.E., & the PDP Research Group, Eds. (MIT Press/Bradford, Cambridge MA) pp. 7-57. \n\nSiegler R. (1988) Journal of Experimental Psychology: General. 117 258. \n\nStazyk E.H. et al. (1982) Journal of Experimental Psychology: Learning, Memory, and Cognition. 8 355. \n\nWarrington E.K. (1982) Quarterly Journal of Experimental Psychology. 34A 31. \n", "award": [], "sourceid": 1488, "authors": [{"given_name": "G.", "family_name": "Christianson", "institution": null}, {"given_name": "Suzanna", "family_name": "Becker", "institution": null}]}