{"title": "Analysis of Unstandardized Contributions in Cross Connected Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 601, "page_last": 608, "abstract": null, "full_text": "Analysis of Unstandardized Contributions \n\nin Cross Connected Networks \n\nThomas R. Shultz \nshultz@psych.mcgill.ca \n\nYuriko Oshima-Takane \n\nyuriko@psych.mcgill.ca \n\nYoshio Takane \n\ntakane@psych.mcgill.ca \n\nDepartment of Psychology \n\nMcGill University \n\nMontreal, Quebec, Canada H3A 1B1 \n\nAbstract \n\nUnderstanding knowledge representations in neural nets has been a \ndifficult problem. Principal components analysis (PCA) of \ncontributions (products of sending activations and connection weights) \nhas yielded valuable insights into knowledge representations, but much \nof this work has focused on the correlation matrix of contributions. The \npresent work shows that analyzing the variance-covariance matrix of \ncontributions yields more valid insights by taking account of weights. \n\n1 INTRODUCTION \nThe knowledge representations learned by neural networks are usually difficult to \nunderstand because of the non-linear properties of these nets and the fact that knowledge is \noften distributed across many units. Standard network analysis techniques, based on a \nnetwork's connection weights or on its hidden unit activations, have been limited. Weight \ndiagrams are typically complex, and weights vary across multiple networks trained on the \nsame problem. Analysis of activation patterns on hidden units is limited to nets with a \nsingle layer of hidden units without cross connections. \nCross connections are direct connections that bypass intervening hidden unit layers.
They \nincrease learning speed in static networks by focusing on linear relations (Lang & \nWitbrock, 1988) and are a standard feature of generative algorithms such as cascade-correlation \n(Fahlman & Lebiere, 1990). Because such cross connections do so much of \nthe work, analyses that are restricted to hidden unit activations furnish only a partial \npicture of the network's knowledge. \nContribution analysis has been shown to be a useful technique for multi-layer, cross \nconnected nets. Sanger (1989) defined a contribution as the product of an output weight, \nthe activation of a sending unit, and the sign of the output target for that input. Such \ncontributions are potentially more informative than either weights alone or hidden unit \nactivations alone, since they take account of both weight and sending activation. Shultz \nand Elman (1994) used PCA to reduce the dimensionality of such contributions in several \ndifferent types of cascade-correlation nets. Shultz and Oshima-Takane (1994) demonstrated \nthat PCA of unscaled contributions produced even better insights into cascade-correlation \nsolutions than did comparable analyses of contributions scaled by the sign of output \ntargets. Sanger (1989) had recommended scaling contributions by the signs of output \ntargets in order to determine whether the contributions helped or hindered the network's \nsolution. But since the signs of output targets are only available to networks during error \ncorrection learning, it is more natural to use unscaled contributions in analyzing \nknowledge representations. \nA further choice in PCA is whether to analyze the correlation matrix or the variance-covariance \nmatrix.
The correlation matrix contains 1s on the diagonal and Pearson \ncorrelation coefficients between contributions off the diagonal. This has the effect of \nstandardizing the variables (contributions) so that each has a mean of 0 and a standard \ndeviation of 1. Because a given contribution is a sending activation multiplied by a single \nconnection weight, standardizing it divides out the magnitude of that weight, leaving only \nits sign. Effectively, the PCA of a correlation matrix therefore exploits \nvariation in input activation patterns but ignores variation in connection weights. \nHere, we report on work that investigates whether more useful insights into network \nknowledge structures can be revealed by PCA of unstandardized contributions. To do this, \nwe apply PCA to the variance-covariance matrix of contributions. The variance-covariance \nmatrix has contribution variances along the diagonal and covariances between \ncontributions off the diagonal. Taking explicit account of the variation in connection \nweights in this way may produce a more valid picture of the network's knowledge. \nWe use some of the same networks and problems employed in our earlier work (Shultz & \nElman, 1994; Shultz & Oshima-Takane, 1994) to facilitate comparison of results. The \nproblems include continuous XOR, arithmetic comparisons involving addition and \nmultiplication, and distinguishing between two interlocking spirals. All of the nets were \ngenerated with the cascade-correlation algorithm (Fahlman & Lebiere, 1990). \nCascade-correlation begins as a perceptron and recruits hidden units into the network as it \nneeds them in order to reduce error. The recruited hidden unit is the one whose activations \ncorrelate best with the network's current error. Recruited units are installed in a cascade, \neach on a separate layer and receiving input from the input units and from any previously \nexisting hidden units.
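The effect of standardization noted earlier in this introduction can be illustrated with a small numerical sketch. It assumes NumPy. The helper name pca, the random sending activations, and both sets of output weights are hypothetical and do not come from the networks reported here. Each column of the contribution matrix is one sending activation multiplied by one output weight, and each row is one input pattern.

```python
import numpy as np


def pca(contributions, use_correlation):
    # Rows are input patterns; columns are unscaled contributions
    # (one sending activation times one output weight).
    if use_correlation:
        matrix = np.corrcoef(contributions, rowvar=False)
    else:
        matrix = np.cov(contributions, rowvar=False)
    values, vectors = np.linalg.eigh(matrix)  # eigh returns ascending eigenvalues
    order = np.argsort(values)[::-1]          # put the largest component first
    return values[order], vectors[:, order]


rng = np.random.default_rng(0)
# Hypothetical sending activations: 100 input patterns x 3 sending units.
activations = rng.uniform(-0.5, 0.5, size=(100, 3))

# Two hypothetical sets of output weights applied to the same activations.
equal_weights = np.array([1.0, 1.0, 1.0])
unequal_weights = np.array([3.0, 1.0, 0.2])
contrib_equal = activations * equal_weights
contrib_unequal = activations * unequal_weights

corr_equal, _ = pca(contrib_equal, use_correlation=True)
corr_unequal, _ = pca(contrib_unequal, use_correlation=True)
cov_unequal, cov_vectors = pca(contrib_unequal, use_correlation=False)

# Correlation PCA cannot tell the two weight sets apart.
print('correlation eigenvalues, equal weights:  ', np.round(corr_equal, 3))
print('correlation eigenvalues, unequal weights:', np.round(corr_unequal, 3))
# Covariance PCA lets the heavily weighted contribution dominate component 1.
print('covariance variance proportions:', np.round(cov_unequal / cov_unequal.sum(), 3))
print('component 1 loadings:', np.round(cov_vectors[:, 0], 3))
```

Because every weight in this sketch is positive, the two correlation matrices are identical. A negative weight would only flip the signs of that contribution's correlations, so the magnitudes of the weights would still be invisible to correlation-based PCA.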
We used the default values for all cascade-correlation parameters. \nThe goal of understanding knowledge representations learned by networks ought to be \nuseful in a variety of contexts. One such context is cognitive modeling, where the ability \nof nets to merely simulate psychological phenomena is not sufficient (McCloskey, \n1991). In addition, it is important to determine whether the network representations bear \nany systematic relation to the representations employed by human subjects. \n\n2 PCA OF CONTRIBUTIONS \nSanger's (1989) original contribution analysis began with a three-dimensional array of \ncontributions (output unit x hidden unit x input pattern). In contrast, we start with a two-dimensional \narray of contributions (output weight x input pattern), in which each column corresponds to one \nconnection from a sending unit (input or hidden) to an output unit and each row corresponds to one input pattern. \nThis is more efficient \nthan the slicing technique used by Sanger to focus on particular output or hidden units and \nstill allows identification of the roles of specific contributions (Shultz & Elman, 1994; \nShultz & Oshima-Takane, 1994). \nWe subject the variance-covariance matrix of contributions to PCA in order to identify the \nmain dimensions of variation in the contributions (Jolliffe, 1986). A component is a line \nof best fit to a set of data points in multi-dimensional space. The goal of PCA is to \nsummarize a multivariate data set with a relatively small number of components by \ncapitalizing on covariance among the variables (in this case, contributions). \nWe use the scree test (Cattell, 1966) to determine how many components are useful to \ninclude in the analysis. Varimax rotation is applied to improve the interpretability of the \nsolution. Component scores are plotted to identify the function of each component. \n\n3 APPLICATION TO CONTINUOUS XOR \nWith only four possible input patterns, the classical binary XOR problem offers too few training patterns to make \ncontribution analysis worthwhile.
However, we constructed a continuous version of the \nXOR problem by dividing the input space into four quadrants. Starting from 0.1, input \nvalues were incremented in steps of 0.1 up to 1.0 on each input, producing 100 (10 x 10) x, y input pairs that can be \npartitioned into four quadrants of 25 pairs each. Quadrant a had values of x less than \n0.55 combined with values of y above 0.55. Quadrant b had values of x and y greater than \n0.55. Quadrant c had values of x and y less than 0.55. Quadrant d had values of x greater \nthan 0.55 combined with values of y below 0.55. As in binary XOR, problems from \nquadrants a and d had a positive output target (0.5) for the net, whereas problems from \nquadrants b and c had a negative output target (-0.5). There was a single output unit with \na sigmoid activation. \nThree cascade-correlation nets were trained on continuous XOR. Each of these nets \ngenerated a unique solution, recruiting five or six hidden units and taking from 541 to 765 \nepochs to learn to correctly classify all of the input patterns. Generalization to test \npatterns not in the training set was excellent. PCA of unscaled, unstandardized \ncontributions yielded three components. A plot of rotated component scores for the 100 \ntraining patterns of net 1 is shown in Figure 1. The component scores are labeled \naccording to their respective quadrant in the input space. The three components together account for \n96.0% of the variance in the contributions. \nFigure 1 shows that component 1, with 44.3% of the variance in contributions, has the \nrole of distinguishing those quadrants with a positive output target (a and d) from those \nwith a negative output target (b and c).
This is indicated by the fact that the black shapes \nare at the top of the component space cube in Figure 1 and the white shapes are at the \nbottom. Components 2 and 3 represent variation along the x and y input dimensions, \nrespectively. Component 2 accounted for 26.1% of the variance in contributions, and \ncomponent 3 accounted for 25.6% of the variance in contributions. Input pairs from \nquadrants b and d (square shapes), whose x values exceed 0.55, are concentrated on the negative end of component 2, \nwhereas input pairs from quadrants a and c (circle shapes), whose x values are below 0.55, are concentrated on the positive \nend of component 2. Similarly, input pairs from quadrants a and b, whose y values exceed 0.55, cluster on the negative \nend of component 3, and input pairs from quadrants c and d cluster on the positive end of \ncomponent 3. Although the network was not explicitly trained to represent the x and y \ninput dimensions, it did so as an incidental feature of learning the distinction between \nquadrants a and d vs. quadrants b and c. Similar results were obtained from the other two \nnets learning the continuous XOR problem. \nIn contrast, PCA of the correlation matrix from these nets had yielded a somewhat less \nclear picture, with the third component separating quadrants a and d from quadrants b and c, \nand the first two components representing variation along the x and y input dimensions \n(Shultz & Oshima-Takane, 1994). PCA of the correlation matrix of scaled contributions \nhad performed even worse, with plots of component scores indicating interactive \nseparation of the four quadrants, but with no clear roles for the individual components \n(Shultz & Elman, 1994). \nStandardized, rotated component loadings for net 1 are plotted in Figure 2. Such plots can \nbe examined to determine the role played by each contribution in the network.
For \nexample, hidden units 2, 3, and 4 all play a major role in the job done by component 1, \ndistinguishing positive from negative outputs. \n\n4 APPLICATION TO COMPARATIVE ARITHMETIC \nArithmetic comparison requires a net to conclude whether a sum or a product of two \nintegers is greater than, less than, or equal to a comparison integer. Several psychological \nsimulations have used neural nets to make additive and multiplicative comparisons, which \nhas enhanced interest in this type of problem (McClelland, 1989; Shultz, Schmidt, \nBuckingham, & Mareschal, in press). \nThe first input unit coded the type of arithmetic operation to be performed: 0 for addition \nand 1 for multiplication. Three additional linear input units encoded the integers. Two of \nthese input units each coded a randomly selected integer in the range of 0 to 9, inclusive; \nanother input unit coded a randomly selected comparison integer. For addition problems, \ncomparison integers ranged from 0 to 19, inclusive; for multiplication, comparison \nintegers ranged from 0 to 82, inclusive. Two sigmoid output units coded the results of \nthe comparison operation. Target outputs of (0.5, -0.5) represented a greater than result, \ntargets of (-0.5, 0.5) represented less than, and targets of (0.5, 0.5) represented equal to. \n\n[Figure 1: three-dimensional plot of rotated component scores (axes: Component 1, Component 2, Component 3)] \n\nFigure 1. Rotated component scores for a continuous XOR net. Component scores for the \nx, y input pairs in quadrant a are labeled with black circles, those from quadrant b with \nwhite squares, those from quadrant c with white circles, and those from quadrant d with \nblack squares. The network's task is to distinguish pairs from quadrants a and d (the black \nshapes) from pairs from quadrants b and c (the white shapes).
Some of the white shapes \nappear black because they are so densely packed, but all of the truly black shapes are \nrelatively high in the cube. \n\n[Figure 2: bar chart of loadings (vertical axis: Contribution, for Input1, Input2, and Hidden1 to Hidden6; horizontal axis: Loading, from -1.0 to 1.0; legend: Components 1 to 3)] \n\nFigure 2. Standardized, rotated component loadings for a continuous XOR net. Rotated \nloadings were standardized by dividing them by the standard deviation of the respective \ncontribution scores. \n\nThe training patterns had 100 addition and 100 multiplication problems, randomly \nselected, with the restriction that 45 of each had correct answers of greater than, 45 of each \nhad correct answers of less than, and 10 of each had correct answers of equal to. These \nconstraints were designed to reduce the natural skew of comparative values in the high \ndirection on multiplication problems. \nWe ran three nets for 1000 epochs each, at which point they were very close to mastering \nthe training patterns. Either seven or eight hidden units were recruited along the way. \nGeneralization to previously unseen test problems was very accurate. Four components \nwere sufficient to account for most of the variance in unstandardized contributions (88.9% in \nthe case of net 1). \nFigure 3 displays the rotated component scores for the first two components of net 1. \nComponent 1, accounting for 51.1% of the variance, separated problems with greater than \nanswers from problems with less than answers, and located problems with equal to \nanswers in the middle, at least for addition problems. Component 2, with 20.2% of the \nvariance, clearly separated multiplication from addition. Contributions from the first input \nunit, which coded the operation, were strongly associated with component 2.
Similar results were obtained for the other \ntwo nets. \nComponents 3 and 4, with 10.6% and 7.0% of the variance, were sensitive to variation in \nthe second and third inputs, respectively. This is supported by an examination of the \nmean input values of the 20 patterns with the most extreme scores at each end of these two components. \nRecall that the second and third inputs coded the two integers to be added or multiplied. \nThe negative end of component 3 had a mean second input value of 8.25; the positive end \nof this component had a mean second input value of 0.55. Component 4 had a mean third \ninput value of 2.00 on the negative end and 7.55 on the positive end. \nIn contrast, PCA of the correlation matrix for these nets had yielded a far more clouded \npicture, with the largest components focusing on input variation and lesser components \neach performing only part of the separation of answer types and operations, in an interactive \nmanner (Shultz & Oshima-Takane, 1994). Problems with equal to answers were not \nisolated by any of the components. PCA of scaled contributions had produced three \ncomponents that interactively separated the three answer types and operations, but failed \nto represent variation in input integers (Shultz & Elman, 1994). Essentially similar \nadvantages for using the variance-covariance matrix were found for nets learning either \naddition alone or multiplication alone. \n\n5 APPLICATION TO THE TWO-SPIRALS PROBLEM \nThe two-spirals problem requires a particularly difficult discrimination and a large number \nof hidden units. The input space is defined by two interlocking spirals that wrap around \ntheir origin three times. There are two sets of 97 real-valued x, y pairs, with each set \nrepresenting one of the spirals, and a single sigmoid output unit coded for the identity of \nthe spiral.
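The paper states only that each spiral has 97 real-valued points and wraps around the origin three times. The generator below is a minimal sketch that assumes the formulation commonly distributed with the two-spirals benchmark of Lang and Witbrock (1988). In that formulation the angle advances by pi/16 per point and the second spiral is the point reflection of the first through the origin. The radius constants and the function name two_spirals are assumptions, not details reported here.

```python
import math


def two_spirals(points_per_spiral=97):
    # Assumed benchmark-style construction, not the authors' stated procedure.
    spiral_1, spiral_2 = [], []
    for i in range(points_per_spiral):
        angle = i * math.pi / 16.0          # 96 steps of pi/16 give three full turns
        radius = 6.5 * (104 - i) / 104.0    # radius shrinks as the spiral winds inward
        x = radius * math.sin(angle)
        y = radius * math.cos(angle)
        spiral_1.append((x, y))
        spiral_2.append((-x, -y))           # mirror point (-x, -y) lies on the other spiral
    return spiral_1, spiral_2


spiral_1, spiral_2 = two_spirals()
print(len(spiral_1), len(spiral_2))  # 97 97
```

Under this construction, every point on one spiral has its negation on the other. This is the (-x, -y) symmetry discussed later in this section.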
Our three nets took between 1313 and 1723 epochs to master the distinction, \nand recruited from 12 to 16 hidden units. All three nets generalized well to previously \nunseen input pairs on the paths of the two spirals. \nPCA of the variance-covariance matrix for net 1 revealed that six components accounted \nfor a total of 97.9% of the variance in contributions. The second and fourth of these \ncomponents together distinguished one spiral from the other, with 20.7% and 9.8% of the \nvariance, respectively. Rotated component scores for these two components are plotted in \nFigure 4. A diagonal line drawn on Figure 4 from coordinates (-2, 2) to (2, -2) indicates that \n11 points from each spiral were misclassified by components 2 and 4. These 22 points are only 11.3% \nof the 194 data points in the training patterns. The fact that the net learned all of the training \npatterns implies that these exceptions were picked up by other components. \nComponents 1 and 6, with 40.7% and 6.4% of the variance, were sensitive to variation in \nthe x and y inputs, respectively. Again, this was confirmed by the mean input values of \nthe 20 most extreme component scores on these two components. On component 1, the \nnegative end had a mean x value of 3.55 and the positive end had a mean y value of -3.55. \n\n[Figure 3: scatter plot of rotated component scores (axes: Component 1 and Component 2)] \n\nFigure 3. Rotated component scores for an arithmetic comparison net. Greater than \nproblems are symbolized by circles, less than problems by squares, addition by white \nshapes, and multiplication by black shapes. For equal to problems only, addition is \nrepresented by + and multiplication by X.
Although some densely packed white shapes \nmay appear black, they have no overlap with truly black shapes. All of the black squares \nare concentrated around coordinates (-1, -1). \n\n[Figure 4: scatter plot of rotated component scores (axes: Component 2 and Component 4), with spiral 1 labeled] \n\nFigure 4. Rotated component scores for a two-spirals net. Squares represent data points \nfrom spiral 1, and circles represent data points from spiral 2. \n\nOn component 6, the negative end had a mean x value of 2.75 and the positive end had a \nmean y value of -2.75. The skew-symmetry of these means is indicative of the perfectly \nsymmetrical representations that cascade-correlation nets achieve on this highly \nsymmetrical problem. For every data point (x, y), the mirror-image point (-x, -y) has a component score \nof the same magnitude but opposite sign on every component, and this mirror-image \npoint is always on the other spiral. Other components concentrated on particular \nregions of the spirals. The other two nets yielded essentially similar results. \nThese results can be contrasted with our previous analyses of the two-spirals problem, \nnone of which succeeded in showing a clear separation of the two spirals. PCAs based on \nscaled (Shultz & Elman, 1994) or unscaled (Shultz & Oshima-Takane, 1994) correlation \nmatrices showed extensive symmetries but never a distinction between one spiral and \nanother.[1] Thus, although it was clear that the nets had encoded the problem's inherent \nsymmetries, it was still unclear from previous work how the nets used this or other \ninformation to distinguish points on one spiral from points on the other spiral.
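Sections 4 and 5 check a component's role by averaging the input values of the patterns that score most extremely on that component. That check can be sketched as follows. This is a minimal sketch that assumes NumPy. The function name extreme_end_means and the synthetic inputs and scores are hypothetical, and taking k = 20 patterns at each end of the component is our reading of the procedure.

```python
import numpy as np


def extreme_end_means(scores, inputs, k=20):
    # scores: one component score per training pattern, shape (n,).
    # inputs: raw input values, shape (n, number_of_inputs).
    # Returns mean inputs of the k lowest-scoring (negative end) and
    # k highest-scoring (positive end) patterns.
    order = np.argsort(scores)
    negative_end = inputs[order[:k]].mean(axis=0)
    positive_end = inputs[order[-k:]].mean(axis=0)
    return negative_end, positive_end


rng = np.random.default_rng(1)
# Hypothetical data: 194 patterns with x, y inputs, and a component whose
# scores fall as x rises (plus noise), as a component sensitive to x might.
inputs = rng.uniform(-6.5, 6.5, size=(194, 2))
scores = -inputs[:, 0] + rng.normal(0.0, 0.5, size=194)

negative_end, positive_end = extreme_end_means(scores, inputs)
print('negative end mean (x, y):', np.round(negative_end, 2))
print('positive end mean (x, y):', np.round(positive_end, 2))
```

This would identify a component as coding the x dimension if the two ends show means of x that are large and opposite in sign while the means of y stay near zero.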
\n\n6 DISCUSSION \nOn each of these problems, there was considerable variation among network solutions, as \nrevealed, for example, by variation in numbers of hidden units recruited and in signs and \nsizes of connection weights. In spite of such variation, the present technique of applying \nPCA to the variance-covariance matrix of contributions yielded results that are sufficiently \nabstract to characterize different nets learning the same problem. The knowledge \nrepresentations revealed by this analysis clearly identify the essential information that the \nnet is being trained to utilize as well as more incidental features of the training patterns, \nsuch as the nature of the input space. \nThis research strengthens earlier conclusions that PCA of network contributions is a \nuseful technique for understanding network performance (Sanger, 1989), including \nrelatively intractable multi-level cross connected nets (Shultz & Elman, 1994; Shultz & \nOshima-Takane, 1994). However, the current study underscores the point that there are \nseveral ways to prepare a contribution matrix for PCA, not all of which yield equally \nvalid or useful results. Rather than starting with a three-dimensional matrix of output unit \nx hidden unit x input pattern and focusing on either one output unit at a time or one \nhidden unit at a time (Sanger, 1989), it is preferable to collapse contributions into a two-dimensional \nmatrix of output weight x input pattern. The latter is not only more \nefficient but also yields more valid results that characterize the network as a whole, rather than \nsmall parts of the network. \nAlso, rather than scaling contributions by the sign of the output target (Sanger, 1989), it \nis better to use unscaled contributions.
Unscaled contributions are not only more realistic, \nsince the network has no knowledge of output targets during its feed-forward phase, but \nalso produce clearer interpretations of the net's knowledge representations (Shultz & \nOshima-Takane, 1994). The latter claim is particularly true in terms of sensitivity to \ninput dimensions and to operational distinctions between adding and multiplying. Plots of \ncomponent scores based on unscaled contributions are typically not as dense as those \nbased on scaled contributions but are more revealing of the network's knowledge. \nFinally, rather than applying PCA to the correlation matrix of contributions, it makes \nmore sense to apply it to the variance-covariance matrix. As noted in the introduction, \nusing the correlation matrix effectively standardizes the contributions to have identical \nmeans and variances, thus obscuring the role of network connection weights. The present \nresults show much clearer knowledge representations when the variance-covariance \nmatrix is used, because connection weight information is explicitly retained. Differences between \nthe two matrices were especially marked on the more difficult problems, such as two-spirals, \nwhere the only PCAs to reveal how nets distinguished the spirals were those based on \nvariance-covariance matrices. But the relative advantages of using the variance-covariance \nmatrix were evident on the easier problems too. \n\n[1] Results from unscaled contributions on the two-spirals problem were not actually \npresented in Shultz & Oshima-Takane (1994) since they were not very clear. \n\nThere has been recent rapid progress in the study of the knowledge representations learned \nby neural nets. Feed-forward nets can be viewed as function approximators for relating \ninputs to outputs.
Analysis of their knowledge representations should reveal how inputs \nare encoded and transformed to produce the correct outputs. PCA of network contributions \nsheds light on how these function approximations are done. Components emerging from \nPCA are orthonormalized ingredients of the transformations of inputs that produce the \ncorrect outputs. Thus, PCA helps to identify the nature of the required transformations. \nFurther progress might be expected from combining PCA with other matrix \ndecomposition techniques. Constrained PCA uses external information to decompose \nmultivariate data matrices before applying PCA (Takane & Shibayama, 1991). \nAnalysis techniques emerging from this research will be useful in understanding and \napplying neural networks. Component loadings, for example, could be used to predict \nthe results of lesioning experiments with neural nets. Once the role of a hidden unit has \nbeen identified by virtue of its association with a particular component, one could \npredict that lesioning this unit would impair the function served by the component. \n\nAcknowledgments \nThis research was supported by the Natural Sciences and Engineering Research Council of \nCanada. \n\nReferences \nCattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral \nResearch, 1, 245-276. \n\nFahlman, S. E., & Lebiere, C. (1990). The Cascade-Correlation learning architecture. In \nD. Touretzky (Ed.), Advances in neural information processing systems 2 (pp. 524-532). \nMountain View, CA: Morgan Kaufmann. \n\nJolliffe, I. T. (1986). Principal component analysis. Berlin: Springer-Verlag. \n\nLang, K. J., & Witbrock, M. J. (1988). Learning to tell two spirals apart. In D. \nTouretzky, G. Hinton, & T. Sejnowski (Eds.), Proceedings of the Connectionist \nModels Summer School (pp. 52-59). Mountain View, CA: Morgan Kaufmann. \n\nMcClelland, J. L. (1989).
Parallel distributed processing: Implications for cognition and \ndevelopment. In R. G. M. Morris (Ed.), Parallel distributed processing: Implications \nfor psychology and neurobiology (pp. 8-45). Oxford University Press. \n\nMcCloskey, M. (1991). Networks and theories: The place of connectionism in cognitive \nscience. Psychological Science, 2, 387-395. \n\nSanger, D. (1989). Contribution analysis: A technique for assigning responsibilities to \nhidden units in connectionist networks. Connection Science, 1, 115-138. \n\nShultz, T. R., & Elman, J. L. (1994). Analyzing cross connected networks. In J. D. \nCowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information \nprocessing systems 6. San Francisco, CA: Morgan Kaufmann. \n\nShultz, T. R., & Oshima-Takane, Y. (1994). Analysis of unscaled contributions in cross \nconnected networks. In Proceedings of the World Congress on Neural Networks (Vol. \n3, pp. 690-695). Hillsdale, NJ: Lawrence Erlbaum. \n\nShultz, T. R., Schmidt, W. C., Buckingham, D., & Mareschal, D. (in press). Modeling \ncognitive development with a generative connectionist algorithm. In G. Halford & T. \nSimon (Eds.), Developing cognitive competence: New approaches to process \nmodeling. Hillsdale, NJ: Erlbaum. \n\nTakane, Y., & Shibayama, T. (1991). Principal component analysis with external \ninformation on both subjects and variables. Psychometrika, 56, 97-120. \n", "award": [], "sourceid": 881, "authors": [{"given_name": "Thomas", "family_name": "Shultz", "institution": null}, {"given_name": "Yuriko", "family_name": "Oshima-Takane", "institution": null}, {"given_name": "Yoshio", "family_name": "Takane", "institution": null}]}