Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)
Raphaël Feraud, Olivier Bernier
A new learning model based on autoassociative neural networks is developed and applied to face detection. To extend the detection ability in orientation and to decrease the number of false alarms, different combinations of networks are tested: ensemble, conditional ensemble and conditional mixture of networks. The use of a conditional mixture of networks yields state-of-the-art results on different benchmark face databases.
1 A constrained generative model
Our purpose is to classify an extracted window $x$ from an image as a face ($x \in V$) or non-face ($x \in N$). The set of all possible windows is $E = V \cup N$, with $V \cap N = \emptyset$. Since collecting a representative set of non-face examples is impossible, face detection by a statistical model is a difficult task. An autoassociative network, using five layers of neurons, is able to perform a non-linear dimensionality reduction [Kramer, 1991]. However, its use as an estimator, to classify an extracted window as face or non-face, raises two problems:
solution: the principal components analysis.
Our approach is to use counter-examples in order to find a sub-manifold as close as possible to $V$ and to constrain the algorithm to converge to a non-linear solution [Feraud, R. et al., 1997]. Each non-face example is constrained to be reconstructed as its projection on $V$. The projection $P$ of a point $x$ of the input space $E$ on $V$ is defined by:
Email: feraud@lannion.cnet.fr, bernier@lannion.cnet.fr
Ensemble and Modular Approaches for Face Detection: A Comparison
• if $x \in V$, then $P(x) = x$,
• if $x \notin V$, then $P(x) = \arg\min_{y \in V} d(x, y)$, where $d$ is the Euclidean distance.
During the learning process, the projection $P$ of $x$ on $V$ is approximated by $P(x) \approx \frac{1}{n} \sum_{i=1}^{n} v_i$, where $v_1, v_2, \ldots, v_n$ are the $n$ nearest neighbours, in the training set of faces, of $v$, the nearest face example of $x$.
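The neighbour-averaging approximation of the projection can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `euclidean` and `approx_projection`, the toy two-dimensional "faces", and the choice to count the nearest face example among its own neighbours are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def approx_projection(x, faces, n):
    """Approximate the projection of x on the face set as the mean of the
    n nearest neighbours of v, where v is the face example closest to x.

    Assumption: v counts as one of its own neighbours (distance 0).
    """
    # v: the nearest face example of x
    v = min(faces, key=lambda f: euclidean(x, f))
    # n nearest neighbours of v within the training set of faces
    neighbours = sorted(faces, key=lambda f: euclidean(v, f))[:n]
    dim = len(x)
    return [sum(nb[i] for nb in neighbours) / len(neighbours) for i in range(dim)]

# Toy data (hypothetical): four "face" vectors in 2-D and one non-face point.
faces = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
x = [0.2, -3.0]
target = approx_projection(x, faces, n=2)
print(target)
```

In the setting described above, `target` would play the role of the desired network output for the non-face example `x`, since each non-face example is constrained to be reconstructed as its projection on the set of faces.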
The goal of the learning process is to approximate the distance $D$ of an input space element $x$ to the set of faces $V$:
• $D(x, V) = \|x - P(x)\| \approx \frac{1}{M}(x - \hat{x})^2$, where $M$ is the size of the input image $x$ and $\hat{x}$ the image reconstructed by the neural network,
• let $x \in E$, then $x \in V$ if and only if $D(x, V) \le \tau$, with $\tau \in \mathbb{R}$, where $\tau$ is a threshold used to adjust the sensitivity of the model.
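The decision rule above can be illustrated with a short sketch. It is hedged on several points: the squared difference between the input and its reconstruction is read here as a sum of squared pixel differences divided by the image size, the names `distance_to_faces` and `is_face` are hypothetical, and `toy_reconstruct` is only a stand-in for a trained autoassociative network.

```python
def distance_to_faces(x, x_hat):
    """Estimate the distance of x to the set of faces as the mean squared
    difference between x and its reconstruction x_hat (M = len(x))."""
    m = len(x)
    return sum((xi - xhi) ** 2 for xi, xhi in zip(x, x_hat)) / m

def is_face(x, reconstruct, threshold):
    """Classify x as a face when its reconstruction error is at most threshold."""
    return distance_to_faces(x, reconstruct(x)) <= threshold

def toy_reconstruct(x):
    """Hypothetical stand-in for a trained network: clips each value to [0, 1]."""
    return [min(max(xi, 0.0), 1.0) for xi in x]

inside = [0.2, 0.8, 0.5, 1.0]    # reconstructed exactly by the toy model
outside = [2.0, -1.0, 0.5, 1.0]  # two values fall outside [0, 1]

print(is_face(inside, toy_reconstruct, threshold=0.01))
print(is_face(outside, toy_reconstruct, threshold=0.01))
```

Raising the threshold accepts more windows as faces (fewer misses, more false alarms), while lowering it does the opposite, which is the sensitivity adjustment mentioned in the text.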