Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)
Alistair Bray, Dominique Martinez
In Slow Feature Analysis (SFA [1]), it has been demonstrated that high-order invariant properties can be extracted by projecting inputs into a nonlinear space and computing the slowest changing features in this space; this has been proposed as a simple general model for learning nonlinear invariances in the visual system. However, this method is highly constrained by the curse of dimensionality, which limits it to simple theoretical simulations. This paper demonstrates that by using a different but closely related objective function for extracting slowly varying features [2, 3], and then exploiting the kernel trick, this curse can be avoided. Using this new method we show that both the complex cell properties of translation invariance and disparity coding can be learnt simultaneously from natural images when complex cells are driven by simple cells also learnt from the image.
http://www.loria.fr/equipes/cortex/
The notion of maximising an objective function based upon the temporal predictability of output has been progressively applied in modelling the development of invariances in the visual system. Földiák used it indirectly via a Hebbian trace rule for modelling the development of translation invariance in complex cells [4] (closely related to many other models [5, 6, 7]); this rule has been used to maximise invariance as one component of a hierarchical system for object and face recognition [8]. On the other hand, similar functions have been maximised directly in networks for extracting linear [2] and nonlinear [9, 1] visual invariances. Direct maximisation of such functions has recently been used to model complex cells [10] and as an alternative to maximising sparseness/independence in modelling simple cells [11]. Slow Feature Analysis [1] combines many of the best properties of these methods to provide a good general nonlinear model. That is, it uses an objective function that minimises the first-order temporal derivative of the outputs; it provides a closed-form solution which optimises this function by projecting inputs into a nonlinear space; it exploits sphering (or PCA-whitening) of the data to ensure that all outputs have unit variance and are uncorrelated. However, the method suffers from the curse of dimensionality in that the nonlinear feature space soon becomes very large as the input dimension grows, and yet this feature space must be represented explicitly in order for the essential sphering to occur.
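To give a feel for how quickly an explicit nonlinear feature space grows, the following minimal sketch counts the monomial features of a polynomial expansion. The polynomial form of the expansion, the function name `monomial_feature_count` and the chosen input sizes are assumptions made for illustration; the text above does not fix a particular expansion.

```python
from math import comb


def monomial_feature_count(n_inputs: int, degree: int) -> int:
    """Count the monomials of degree 1..degree in n_inputs variables."""
    # comb(n + d, d) counts every monomial of degree <= d, including the
    # constant term 1, which is subtracted because it carries no signal.
    return comb(n_inputs + degree, degree) - 1


# Illustrative input sizes (hypothetical, not taken from the paper).
for n in (10, 100, 1000):
    quadratic = monomial_feature_count(n, 2)
    cubic = monomial_feature_count(n, 3)
    print(f"inputs={n:5d}  quadratic={quadratic:8d}  cubic={cubic:12d}")
```

Under these assumptions, sphering would require a covariance matrix whose side length equals the feature count, so its storage grows with the square of the numbers printed above.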
The alternative that we propose here is to use the objective function of Stone [2, 9], which maximises output variance over a long period whilst minimising variance over a shorter period; in the linear case, this can be implemented by a biologically plausible mixture of Hebbian and anti-Hebbian learning on the same synapses [2]. In recent work, Stone has proposed a closed-form solution for maximising this function in the linear domain of blind source separation that does not involve data-sphering. This paper describes how this method can be kernelised. The use of the "kernel trick" allows projection of inputs into a nonlinear kernel-induced feature space of very high (possibly infinite) dimension which is never explicitly represented or accessed. This leads to an efficient method that maps to an architecture that could be biologically implemented either by Sigma-Pi neurons, or fixed RBF networks (as described for SFA [1]). We demonstrate that using this method to extract features that vary slowly in natural images leads to the development of both the complex-cell properties of translation invariance and disparity coding simultaneously.
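As a rough illustration of the linear case, the sketch below maximises the ratio of long-term to short-term output variance for a toy mixture of one slow and one fast source. Several choices here are assumptions rather than details taken from the text: deviations are measured against exponentially weighted running means, the half-lives are arbitrary, the helper names `ema_deviation` and `slow_directions` are hypothetical, and the ratio of two quadratic forms is maximised with a generalised eigendecomposition, which is one standard way to do so.

```python
import numpy as np


def ema_deviation(X, half_life):
    """Deviation of each sample from an exponentially weighted mean of the past."""
    lam = 2.0 ** (-1.0 / half_life)      # per-step decay of the running mean
    mean = X[0].copy()                   # start the running mean at the first sample
    dev = np.empty_like(X)
    for t, x in enumerate(X):
        dev[t] = x - mean                # deviation from the mean of earlier samples
        mean = lam * mean + (1.0 - lam) * x
    return dev


def slow_directions(X, short_half_life=2.0, long_half_life=200.0):
    """Weight vectors ranked by long-term / short-term output variance."""
    X = X - X.mean(axis=0)
    long_dev = ema_deviation(X, long_half_life)
    short_dev = ema_deviation(X, short_half_life)
    C_long = long_dev.T @ long_dev / len(X)     # long-term covariance
    C_short = short_dev.T @ short_dev / len(X)  # short-term covariance
    # Solve C_long w = F C_short w; each eigenvalue F is the variance ratio
    # achieved by its eigenvector w. No sphering of X is performed.
    evals, evecs = np.linalg.eig(np.linalg.solve(C_short, C_long))
    order = np.argsort(evals.real)[::-1]
    return evals.real[order], evecs.real[:, order]


# Toy data (assumed for illustration): a slow sine mixed with white noise.
rng = np.random.default_rng(0)
t = np.arange(5000)
slow = np.sin(2.0 * np.pi * t / 500.0)
fast = rng.standard_normal(t.size)
A = np.array([[1.0, 0.5], [0.7, 1.0]])          # hypothetical mixing matrix
X = np.column_stack([slow, fast]) @ A.T

evals, W = slow_directions(X)
y = (X - X.mean(axis=0)) @ W[:, 0]              # output with the largest ratio
print("variance ratios:", evals)
print("|corr(y, slow source)|:", abs(np.corrcoef(y, slow)[0, 1]))
```

The sign and scale of each recovered weight vector are arbitrary, which is why the comparison with the slow source uses the absolute correlation.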
1 Finding Slow Features with kernels
Given $I$ time-series vectors $\mathbf{x}_i$
$$F = \frac{V}{S} = \frac{\sum_i \bar{y}_i^2}{\sum_i \tilde{y}_i^2}$$