{"title": "Putting It All Together: Methods for Combining Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 1188, "page_last": 1189, "abstract": "", "full_text": "Putting It All Together: Methods for Combining Neural Networks\n\nMichael P. Perrone\n\nInstitute for Brain and Neural Systems\n\nBrown University\n\nProvidence, RI\n\nmpp@cns.brown.edu\n\nThe past several years have seen a tremendous growth in the complexity of the recognition, estimation and control tasks expected of neural networks. In solving these tasks, one is faced with a large variety of learning algorithms and a vast selection of possible network architectures. After all the training, how does one know which is the best network? This decision is further complicated by the fact that standard techniques can be severely limited by problems such as over-fitting, data sparsity and local optima. The usual solution to these problems is a winner-take-all cross-validatory model selection. However, recent experimental and theoretical work indicates that we can improve performance by considering methods for combining neural networks.\n\nThis workshop examined current neural network optimization methods based on combining estimates and task decomposition, including Boosting, Competing Experts, Ensemble Averaging, Metropolis algorithms, Stacked Generalization and Stacked Regression. The issues covered included Bayesian considerations, the role of complexity, the role of cross-validation, incorporation of a priori knowledge, error orthogonality, task decomposition, network selection techniques, over-fitting, data sparsity and local optima. Highlights of each talk are given below. To obtain the workshop proceedings, please contact the author or Norma Caccia (norma_caccia@brown.edu) and ask for IBNS ONR technical report #69.\n\nM. 
Perrone (Brown University, \"Averaging Methods: Theoretical Issues and Real World Examples\") presented weighted averaging schemes [7], discussed their theoretical foundation [6], and showed that averaging can improve performance whenever the cost function is (positive or negative) convex, which includes Mean Square Error, a general class of Lp-norm cost functions, Maximum Likelihood Estimation, Maximum Entropy, Maximum Mutual Information, the Kullback-Leibler Information (Cross Entropy), Penalized Maximum Likelihood Estimation and Smoothing Splines [6]. Averaging was shown to improve performance on the NIST OCR data, a human face recognition task and a time series prediction task [5].\nJ. Friedman (Stanford, \"A New Approach to Multiple Outputs Using Stacking\") presented a detailed analysis of a method for averaging estimators and noted that simulations showed averaging with a positivity constraint was better than cross-validation estimator selection [1].\nS. Nowlan (Synaptics, \"Competing Experts\") emphasized the distinctions between static and dynamic algorithms and between averaged and stacked algorithms, and presented results of the mixture of experts algorithm [3] on a vowel recognition task and a hand tracking task.\nH. Drucker (AT&T, \"Boosting Compared to Other Ensemble Methods\") reviewed the boosting algorithm [2] and showed how it can improve performance for OCR data.\nJ. Moody (OGI, \"Predicting the U.S. Index of Industrial Production\") showed that neural networks make better predictions for the US IP index than standard models [4] and that averaging these estimates improves prediction performance further.\nW. Buntine (NASA Ames Research Center, \"Averaging and Probabilistic Networks: Automating the Process\") discussed placing combination techniques within the Bayesian framework. 
\nD. Wolpert (Santa Fe Institute, \"Inferring a Function vs. Inferring an Inference Algorithm\") argued that theory cannot, in general, identify the optimal network, so one must make assumptions in order to improve performance.\nH. Thodberg (Danish Meat Research Institute, \"Error Bars on Predictions from Deviations among Committee Members (within Bayesian Backprop)\") raised the provocative (and contentious) point that Bayesian arguments support averaging while Occam's Razor (seemingly?) does not.\nS. Hashem (Purdue University, \"Merits of Combining Neural Networks: Potential Benefits and Risks\") emphasized the importance of dealing with collinearity when using averaging methods.\n\nReferences\n\n[1] Leo Breiman. Stacked regression. Technical Report TR-367, Department of Statistics, University of California, Berkeley, August 1992.\n\n[2] Harris Drucker, Robert Schapire, and Patrice Simard. Boosting performance in neural networks. International Journal of Pattern Recognition and Artificial Intelligence, [To appear].\n\n[3] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(2), 1991.\n\n[4] U. Levin, T. Leen, and J. Moody. Fast pruning using principal components. In Steven J. Hanson, Jack D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems 6. Morgan Kaufmann, 1994.\n\n[5] M. P. Perrone. Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization. PhD thesis, Brown University, Institute for Brain and Neural Systems; Dr. Leon N. Cooper, Thesis Supervisor, May 1993.\n\n[6] M. P. Perrone. General averaging results for convex optimization. In Proceedings of the 1993 Connectionist Models Summer School, pages 364-371, Hillsdale, NJ, 1994. Erlbaum Associates.\n\n[7] M. P. Perrone and L. N. Cooper. 
When networks disagree: Ensemble methods for neural networks. In Artificial Neural Networks for Speech and Vision. Chapman-Hall, 1993. Chapter 10.", "award": [], "sourceid": 744, "authors": [{"given_name": "Michael", "family_name": "Perrone", "institution": null}]}