An information-theoretic optimization principle ('infomax') has previously been used for unsupervised learning of statistical regularities in an input ensemble. The principle states that the input-output mapping implemented by a processing stage should be chosen so as to maximize the average mutual information between input and output patterns, subject to constraints and in the presence of processing noise. In the present work I show how infomax, when applied to a class of nonlinear input-output mappings, can under certain conditions generate optimal filters that have additional useful properties: (1) Output activity (for each input pattern) tends to be concentrated among a relatively small number of nodes. (2) The filters are sensitive to higher-order statistical structure (beyond pairwise correlations). If the input features are localized, the filters' receptive fields tend to be localized as well. (3) Multiresolution sets of filters with subsampling at low spatial frequencies - related to pyramid coding and wavelet representations - emerge as favored solutions for certain types of input ensembles.