Guy Mayraz, Geoffrey E. Hinton
The product of experts learning procedure  can discover a set of stochastic binary features that constitute a non-linear generative model of handwritten images of digits. The quality of generative models learned in this way can be assessed by learning a separate model for each class of digit and then comparing the unnormalized probabilities of test images under the 10 different class-specific models. To improve discriminative performance, it is helpful to learn a hierarchy of separate models for each digit class. Each model in the hierarchy has one layer of hidden units and the nth level model is trained on data that consists of the activities of the hidden units in the already trained (n - l)th level model. After train(cid:173) ing, each level produces a separate, unnormalized log probabilty score. With a three-level hierarchy for each of the 10 digit classes, a test image produces 30 scores which can be used as inputs to a supervised, logis(cid:173) tic classification network that is trained on separate data. On the MNIST database, our system is comparable with current state-of-the-art discrimi(cid:173) native methods, demonstrating that the product of experts learning proce(cid:173) dure can produce effective generative models of high-dimensional data.
1 Learning products of stochastic binary experts
Hinton  describes a learning algorithm for probabilistic generative models that are com(cid:173) posed of a number of experts. Each expert specifies a probability distribution over the visible variables and the experts are combined by multiplying these distributions together and renormalizing.
where d is a data vector in a discrete space, Om is all the parameters of individual model m, Pm(dIOm) is the probability of d under model m, and c is an index over all possible vectors in the data space.
A Restricted Boltzmann machine [2, 3] is a special case of a product of experts in which each expert is a single, binary stochastic hidden unit that has symmetrical connections to a set of visible units, and connections between the hidden units are forbidden. Inference in an RBM is much easier than in a general Boltzmann machine and it is also much easier
than in a causal belief net because there is no explaining away. There is therefore no need to perform any iteration to determine the activities of the hidden units. The hidden states, Sj , are conditionally independent given the visible states, Si, and the distribution of Sj is given by the standard logistic function :