Part of Advances in Neural Information Processing Systems 3 (NIPS 1990)
Terrence L. Fine
The three problems that concern us are identifying a natural domain of pattern classification applications of feedforward neural networks, selecting an appropriate feedforward network architecture, and assessing the tradeoff between network complexity, training set size, and statistical reliability as measured by the probability of incorrect classification. We close with some suggestions for improving the bounds that come from Vapnik-Chervonenkis theory, suggestions that can narrow, but not close, the chasm between theory and practice.
1 Speculations on Neural Network Pattern Classifiers
(1) The goal is to provide rapid, reliable classification of new inputs from a pattern source. Neural networks are appropriate as pattern classifiers when the pattern sources are ones of which we have little understanding beyond, perhaps, a nonparametric statistical model, but for which we have been provided with classified samples of features drawn from each of the pattern categories. Neural networks should be able to provide rapid and reliable computation of complex decision functions. The issue in doubt is their statistical response to new inputs.
(2) The pursuit of optimality is misguided in the context of Point (1). Indeed, it is unclear what might be meant by 'optimality' in the absence of a more detailed mathematical framework for the pattern source.
(3) The well-known, oft-cited 'curse of dimensionality' exposed by Richard Bellman may be a 'blessing' to neural networks. Individual network processing nodes (e.g., linear threshold units) become more powerful as the number of their inputs increases. For a large enough number n of points in an input space of d dimensions, the number of dichotomies that can be generated by such a node grows exponentially in d. This suggests that, unlike all previous efforts at pattern classification that required substantial effort directed at the selection of low-dimensional feature vectors so as to make the decision rule calculable, we may now be approaching a