Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)
Michael Lewicki, Terrence J. Sejnowski
We derive a learning algorithm for inferring an overcomplete basis by viewing it as a probabilistic model of the observed data. Overcomplete bases allow for better approximation of the underlying statistical density. Using a Laplacian prior on the basis coefficients removes redundancy and leads to representations that are sparse and are a nonlinear function of the data. This can be viewed as a generalization of the technique of independent component analysis and provides a method for blind source separation of fewer mixtures than sources. We demonstrate the utility of overcomplete representations on natural speech and show that, compared to the traditional Fourier basis, the inferred representations potentially have much greater coding efficiency.
A traditional way to represent real-valued signals is with Fourier or wavelet bases. A disadvantage of these bases, however, is that they are not specialized for any particular dataset. Principal component analysis (PCA) provides one means of finding a basis that is adapted to a dataset, but the basis vectors are restricted to be orthogonal. An extension of PCA called independent component analysis (Jutten and Herault, 1991; Comon et al., 1991; Bell and Sejnowski, 1995) allows the learning of non-orthogonal bases. All of these bases are complete in the sense that they span the input space, but they are limited in terms of how well they can approximate the dataset's statistical density.
Representations that are overcomplete, i.e., with more basis vectors than input variables, can provide a better representation, because the basis vectors can be specialized for
a larger variety of features present in the entire ensemble of data. A criticism of overcomplete representations is that they are redundant, i.e., a given data point may have many possible representations, but this redundancy is removed by the prior probability of the basis coefficients, which specifies the probability of the alternative representations.
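To make the role of the prior concrete, the following minimal numpy sketch (an illustration of ours, not the paper's algorithm) constructs two exact representations of the same data point in an overcomplete basis and ranks them by the L1 norm that a Laplacian prior penalizes:

```python
import numpy as np

# Toy illustration: a 2-D data point has many exact representations in an
# overcomplete basis; a Laplacian prior, with -log p(s) proportional to
# ||s||_1, ranks the alternatives and favors the sparse one.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])        # 3 basis vectors in 2 dimensions
x = np.array([1.0, 1.0])

s_sparse = np.array([0.0, 0.0, 1.0])   # uses a single basis vector
s_dense  = np.array([1.0, 1.0, 0.0])   # reconstructs the same x with two

assert np.allclose(A @ s_sparse, x) and np.allclose(A @ s_dense, x)

for name, s in [("sparse", s_sparse), ("dense", s_dense)]:
    # Up to constants, the Laplacian log-prior is -||s||_1 / theta.
    print(name, "||s||_1 =", np.abs(s).sum())
```

The sparse representation has the smaller L1 norm and hence the higher prior probability, which is how the redundancy of the overcomplete basis is resolved.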
Most of the overcomplete bases used in the literature are fixed, in the sense that they are not adapted to the structure in the data. Recently, Olshausen and Field (1996) presented an algorithm that allows an overcomplete basis to be learned. This algorithm relied on an approximation to the desired probabilistic objective that had several drawbacks, including a tendency to break down at low noise levels and when learning bases with higher degrees of overcompleteness. In this paper, we present an improved approximation to the desired probabilistic objective and show that it leads to a simple and robust algorithm for learning optimal overcomplete bases.
1 Inferring the representation
The data, $x_{1:L}$, are modeled with an overcomplete linear basis plus additive noise:
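As a sketch, under the standard assumptions for such a model (the symbols below, $A$ for the basis matrix, $s$ for the coefficient vector, and $\epsilon$ for additive Gaussian noise, are our notation rather than necessarily the paper's):

```latex
x = A s + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I)
```

where $A$ has more columns (basis vectors) than $x$ has dimensions, so the representation $s$ is not uniquely determined by $x$ and must be inferred.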