Given some training data how should we choose a particular network clas(cid:173) sifier from a family of networks of different complexities? In this paper we discuss how the application of stochastic complexity theory to classifier design problems can provide some insights into this problem. In particular we introduce the notion of admissible models whereby the complexity of models under consideration is affected by (among other factors) the class entropy, the amount of training data, and our prior belief. In particular we discuss the implications of these results with respect to neural architec(cid:173) tures and demonstrate the approach on real data from a medical diagnosis task.
Introduction and Motivation
In this paper we examine in a general sense the application of Minimum Description Length (MDL) techniques to the problem of selecting a good classifier from a large set of candidate models or hypotheses. Pattern recognition algorithms differ from more conventional statistical modeling techniques in the sense that they typically choose from a very large number of candidate models to describe the available data. Hence, the problem of searching through this set of candidate models is frequently a formidable one, often approached in practice by the use of greedy algorithms. In this context, techniques which allow us to eliminate portions of the hypothesis space are of considerable interest. We will show in this paper that it is possible to use the intrinsic structure of the MDL formalism to eliminate large numbers of candidate models given only minimal information about the data. Our results depend on the