Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)
Brendan J. Frey
Ever since Pearl's probability propagation algorithm in graphs with cycles was shown to produce excellent results for error-correcting decoding a few years ago, we have been curious about whether local probability propagation could be used successfully for machine learning. One of the simplest adaptive models is the factor analyzer, which is a two-layer network that models bottom layer sensory inputs as a linear combination of top layer factors plus independent Gaussian sensor noise. We show that local probability propagation in the factor analyzer network usually takes just a few iterations to perform accurate inference, even in networks with 320 sensors and 80 factors. We derive an expression for the algorithm's fixed point and show that this fixed point matches the exact solution in a variety of networks, even when the fixed point is unstable. We also show that this method can be used successfully to perform inference for approximate EM and we give results on an online face recognition task.

1 Factor analysis

A simple way to encode input patterns is to suppose that each input can be well approximated by a linear combination of component vectors, where the amplitudes of the vectors are modulated to match the input. For a given training set, the most appropriate set of component vectors will depend on how we expect the modulation levels to behave and how we measure the distance between the input and its approximation. These effects can be captured by a generative probability model that specifies a distribution $p(z)$ over modulation levels $z = (z_1, \ldots, z_K)$ and a distribution $p(x \mid z)$ over sensors $x = (x_1, \ldots, x_N)^T$ given the modulation levels.
Principal component analysis, independent component analysis and factor analysis can be viewed as maximum likelihood learning in a model of this type, where we assume that over the training set, the appropriate modulation levels are independent and the overall distortion is given by the sum of the individual sensor distortions.
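One way to read these two assumptions is as a factorization of the joint distribution. The sketch below is an interpretation rather than notation taken from the text: it assumes that the distortion of sensor $n$, written here with the hypothetical symbol $d_n$, is the negative log of that sensor's conditional density.

$$
\begin{aligned}
p(x, z) &= p(z)\, p(x \mid z) = \prod_{k=1}^{K} p(z_k) \prod_{n=1}^{N} p(x_n \mid z), \\
-\log p(x, z) &= -\sum_{k=1}^{K} \log p(z_k) + \sum_{n=1}^{N} d_n(x_n, z), \qquad d_n(x_n, z) = -\log p(x_n \mid z).
\end{aligned}
$$

Under this reading, independence of the modulation levels gives the first sum and the additive sensor distortion gives the second. The models named above would then differ only in the choice of $p(z_k)$ and $p(x_n \mid z)$.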
In factor analysis, the modulation levels are called factors and the distributions have the following form:
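$$
\begin{aligned}
p(z_k) &= \mathcal{N}(z_k; 0, 1), \\
p(z) &= \prod_{k=1}^{K} p(z_k) = \mathcal{N}(z; 0, I), \\
p(x_n \mid z) &= \mathcal{N}\Big(x_n; \sum_{k=1}^{K} \lambda_{nk} z_k, \psi_n\Big), \\
p(x \mid z) &= \prod_{n=1}^{N} p(x_n \mid z) = \mathcal{N}(x; \Lambda z, \Psi).
\end{aligned}
\tag{1}
$$

The parameters of this model are the factor loading matrix $\Lambda$, with elements $\lambda_{nk}$, and the diagonal sensor noise covariance matrix $\Psi$, with diagonal elements $\psi_n$. A belief network for the factor analyzer is shown in Fig. 1a. The likelihood is

$$
p(x) = \int \mathcal{N}(z; 0, I)\, \mathcal{N}(x; \Lambda z, \Psi)\, dz = \mathcal{N}(x; 0, \Lambda \Lambda^T + \Psi),
$$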