{"title": "Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication", "book": "Advances in Neural Information Processing Systems", "page_first": 910, "page_last": 918, "abstract": "A new algorithm is proposed for a) unsupervised learning of sparse representations from subsampled measurements and b) estimating the parameters required for linearly reconstructing signals from the sparse codes. We verify that the new algorithm performs efficient data compression on par with the recent method of compressive sampling. Further, we demonstrate that the algorithm performs robustly when stacked in several stages or when applied in undercomplete or overcomplete situations. The new algorithm can explain how neural populations in the brain that receive subsampled input through fiber bottlenecks are able to form coherent response properties.", "full_text": "Deciphering subsampled data: adaptive compressive\n\nsampling as a principle of brain communication\n\nRedwood Center for Theoretical Neuroscience\n\nMathematical Sciences Research Institute\n\nChristopher J. Hillar\n\nchillar@msri.org\n\nGuy Isely\n\nUniversity of California, Berkeley\n\nguyi@berkeley.edu\n\nFriedrich T. Sommer\n\nUniversity of California, Berkeley\n\nfsommer@berkeley.edu\n\nAbstract\n\nA new algorithm is proposed for a) unsupervised learning of sparse representa-\ntions from subsampled measurements and b) estimating the parameters required\nfor linearly reconstructing signals from the sparse codes. We verify that the new\nalgorithm performs ef\ufb01cient data compression on par with the recent method of\ncompressive sampling. Further, we demonstrate that the algorithm performs ro-\nbustly when stacked in several stages or when applied in undercomplete or over-\ncomplete situations. 
The new algorithm can explain how neural populations in\nthe brain that receive subsampled input through \ufb01ber bottlenecks are able to form\ncoherent response properties.\n\n1\n\nIntroduction\n\nIn the nervous system, sensory and motor information, as well as internal brain states, are repre-\nsented by action potentials in populations of neurons. Most localized structures, such as sensory\norgans, subcortical nuclei and cortical regions, are functionally specialized and need to communi-\ncate through \ufb01ber projections to produce coherent brain function [14]. Computational studies of the\nbrain usually investigate particular functionally and spatially de\ufb01ned brain structures. Our scope\nhere is different as we are not concerned with any particular brain region or function. Rather, we\nstudy the following fundamental communication problem: How can a localized neural population\ninterpret a signal sent to its synaptic inputs without knowledge of how the signal was sampled or\nwhat it represents? We consider the generic case that information is encoded in the activity of a local\npopulation (e.g. neurons of a sensory organ or a peripheral sensory area) and then communicated to\nthe target region through an axonal \ufb01ber projection. Any solution of this communication problem is\nconstrained by the following known properties of axonal \ufb01ber projections:\nExact point-to-point connectivity genetically unde\ufb01ned: During development, genetically in-\nformed chemical gradients coarsely guide the growth of \ufb01ber projections but are unlikely to specify\nthe precise synaptic patterns to target neurons [17]. 
Thus, learning mechanisms and synaptic plasticity seem necessary to form the precise wiring patterns from projection fibers to target neurons.
Fiber projections constitute wiring bottlenecks: The number of axons connecting a pair of regions is often significantly smaller than the number of neurons encoding the representation within each region [10]. Thus, communication across fiber projections seems to rely on a form of compression.
Sizes of origin and target regions may differ: In general, the sizes of the region sending the fibers and the region targeted by them will be different. Thus, communication across fiber projections will often involve a form of recoding.
We present a new algorithm for establishing and maintaining communication that satisfies all three constraints above. To model imprecise wiring, we assume that connections between regions are configured randomly and that the wiring scheme is unknown to the target region. To account for the bottleneck, we assume these connections contain only subsampled portions of the information emanating from the sender region; i.e., learning in the target region is based on subsampled data and not the original.
Our work suggests that axon fiber projections can establish interfaces with other regions according to the following simple strategy: connect to distant regions randomly, roughly guided by chemical gradients, then use local unsupervised learning at the target location to form meaningful representations of the input data. Our results can explain experiments in which retinal projections were redirected neonatally to the auditory thalamus and the rerouting produced visually responsive cells in auditory thalamus and cortex, with properties that are typical of cells in visual cortex [12]. Further, our model makes predictions about the sparsity of neural representations.
Speci\ufb01cally, we predict\nthat neuronal \ufb01ring is sparser in locally projecting neurons (upper cortical layers) and less sparse in\nneurons with nonlocal axonal \ufb01ber projections. In addition to the neurobiological impact, we also\naddress potential technical applications of the new algorithm and relations to other methods in the\nliterature.\n\n2 Background\n\nSparse signals: It has been shown that many natural signals falling onto sensor organs have a higher-\norder structure that can be well-captured by sparse representations in an adequate basis; see [9, 6]\nfor visual input and [1, 11] for auditory. The following de\ufb01nitions are pivotal to this work.\nDe\ufb01nition 1: An ensemble of signals X within Rn has sparse underlying structure if there is a\ndictionary \u2126 \u2208 Rn\u00d7p so that any point x \u2208 Rn drawn from X can be expressed as x = \u2126v for a\nsparse vector v \u2208 Rp.\nDe\ufb01nition 2: An ensemble of sparse vectors V within Rp is a sparse representation of a signal\nensemble X in Rn if there exists a dictionary \u2126 \u2208 Rn\u00d7p such that the random variable X satis\ufb01es\nX = \u2126V .\nFor theoretical reasons, we consider ensembles of random vectors (i.e. random variables) which\narise from an underlying probability distribution on some measure space, although for real data sets\n(e.g. natural image patches) we cannot guarantee this to be the case. Nonetheless, the theoretical\nconsequences of this assumption (e.g. Theorem 4.2) appear to match what happens in practice for\nreal data (\ufb01gures 2-4).\nCompressive sampling with a \ufb01xed basis: Compressive sampling (CS) [2] is a recent method for\nrepresenting data with sparse structure using fewer samples than required by the Nyquist-Shannon\ntheorem. In one formulation [15], a signal x \u2208 Rn is assumed to be k-sparse in an n \u00d7 p dictionary\nmatrix \u03a8; that is, x = \u03a8a for some vector a \u2208 Rp with at most k nonzero entries. 
Next, x is subsampled using an m × n incoherent matrix Φ to give noisy measurements y = Φx + w, with m ≪ n and independent noise w ∼ N(0, σ²I_{m×m}). To recover the original signal, the following convex optimization problem (called the Lasso in the literature) is solved:

    b̂(y) := arg min_b { (1/2n) ‖y − ΦΨb‖²₂ + λ|b|₁ },     (1)

and then x̂ := Ψb̂ is set to be the approximate recovery of x. Remarkably, as can be shown using [15, Theorem 1], the preceding algorithm determines a unique b̂ and is guaranteed to be exact within the noise range:

    ‖x − x̂‖₂ = O(σ)     (2)

with high probability (exponential in m/k) as long as the matrix ΦΨ satisfies mild incoherence hypotheses, λ = Θ(σ√((log p)/m)), and the sparsity is on the order k = O(m / log p).
Typically, the matrix Ψ is p × p orthogonal, and the incoherence conditions reduce to deterministic constraints on Φ only. Although in general it is very difficult to decide whether a given Φ satisfies these conditions, it is known that many random ensembles, such as i.i.d. Φ_ij ∼ N(0, 1/m), satisfy them with high probability. In particular, compression ratios on the order (k log p)/p are achievable for k-sparse signals using a random Φ chosen this way.
Dictionary learning by sparse coding: For some natural signals there are well-known bases (e.g. Gabor wavelets, the DCT) in which those signals are sparse or nearly sparse. However, an arbitrary class of signals can be sparse in unknown bases, some of which give better encodings than others. It is compelling to learn a sparse dictionary for a class of signals instead of specifying one in advance.
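Returning to the recovery step (1): it is an ℓ1-regularized least-squares problem, which can be illustrated with a few lines of iterative soft-thresholding (ISTA). This is only a sketch with toy dimensions, not the solver used in this paper (which uses the feature-sign algorithm of [5]); here Ψ is taken to be the identity so the signal is sparse in the pixel basis.

```python
import numpy as np

def ista_lasso(A, y, lam, n_iter=2000):
    """Minimize 0.5*||y - A @ b||^2 + lam*||b||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    b = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = b - A.T @ (A @ b - y) / L        # gradient step on the quadratic term
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return b

rng = np.random.default_rng(0)
n, m, k = 64, 32, 4                          # signal dim, measurements, sparsity
a = np.zeros(n)
support = rng.choice(n, k, replace=False)
a[support] = 1.0                             # k-sparse signal (Psi = identity)
Phi = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))  # random incoherent measurements
y = Phi @ a                                  # compressed, noiseless measurements
b_hat = ista_lasso(Phi, y, lam=0.01)
print(np.linalg.norm(a - b_hat))             # small residual; support is recovered
```

With noiseless measurements and a small λ, the estimate is close to the true sparse coefficients and the active set matches the true support.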
Sparse coding methods [6] learn dictionaries by minimizing the empirical mean of an energy function that combines the ℓ2 reconstruction error with a sparseness penalty on the encoding:

    E(x, a, Ψ) = ‖x − Ψa‖²₂ + λS(a).     (3)

A common choice for the sparsity penalty S(a) that works well in practice is the ℓ1 penalty S(a) = |a|₁. Fixing Ψ and x and minimizing (3) with respect to a produces a vector â(x) that approximates a sparse encoding for x.¹ For a fixed set of signals x and encodings a, minimizing the mean value of (3) with respect to Ψ and renormalizing columns produces an improved sparse dictionary. By alternating optimization steps of this form, one can learn a dictionary that is tuned to the statistics of the class of signals studied. Sparse coding on natural stimuli has been shown to learn basis vectors that resemble the receptive fields of neurons in early sensory areas [6, 7, 8]. Notice that once an (incoherent) sparsity-inducing dictionary Ψ is learned, inferring sparse vectors â(x) from signals x is an instance of the Lasso convex optimization problem.
Blind Compressed Sensing: With access to an uncompressed class of sparse signals, dictionary learning can find a sparsity-inducing basis which can then be used for compressive sampling. But what if the uncompressed signal is unavailable? Recently, this question was investigated in [4] using the following problem statement.
Blind compressed sensing (BCS): Given a measurement matrix Φ and measurements {y1, . . . , yN} of signals {x1, . . . , xN} drawn from an ensemble X, find a dictionary Ψ and k-sparse vectors {b1, . . . , bN} such that xi = Ψbi for each i = 1, . . . , N.
It turns out that the BCS problem is ill-posed in the general case [4].
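The alternation behind (3) — infer sparse codes with Ψ fixed, then take a gradient step on Ψ and renormalize its columns — can be sketched as follows. This is a minimal illustration with arbitrary toy sizes and step sizes, using ISTA for the inference step rather than the feature-sign algorithm used in the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)

def infer(Psi, X, lam, n_iter=100):
    """Sparse inference: minimize (3) over codes A with the dictionary Psi fixed."""
    L = np.linalg.norm(Psi, 2) ** 2
    A = np.zeros((Psi.shape[1], X.shape[1]))
    for _ in range(n_iter):
        Z = A - Psi.T @ (Psi @ A - X) / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
    return A

def learn_dictionary(X, p, lam=0.1, epochs=30, eta=0.2):
    """Alternate sparse inference with gradient steps on Psi; renormalize columns."""
    Psi = rng.normal(size=(X.shape[0], p))
    Psi /= np.linalg.norm(Psi, axis=0)
    for _ in range(epochs):
        A = infer(Psi, X, lam)
        Psi += eta * (X - Psi @ A) @ A.T / X.shape[1]  # descend reconstruction error
        Psi /= np.linalg.norm(Psi, axis=0)             # unit-length basis vectors
    return Psi

# synthetic signals that are 2-sparse in a hidden random dictionary Omega
n, p, N = 16, 16, 1000
Omega = rng.normal(size=(n, p)); Omega /= np.linalg.norm(Omega, axis=0)
V = np.zeros((p, N))
for j in range(N):
    V[rng.choice(p, 2, replace=False), j] = rng.normal(size=2)
X = Omega @ V

Psi0 = rng.normal(size=(n, p)); Psi0 /= np.linalg.norm(Psi0, axis=0)
Psi = learn_dictionary(X, p)
err0 = np.linalg.norm(X - Psi0 @ infer(Psi0, X, 0.1))  # random dictionary
err1 = np.linalg.norm(X - Psi @ infer(Psi, X, 0.1))    # learned dictionary
print(err1 < err0)  # learning reduces the sparse-coding reconstruction error
```

The learned dictionary reconstructs the sparse ensemble substantially better than a random one of the same size.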
The difficulty is that though it is possible to learn a sparsity-inducing dictionary Θ for the measurements Y, there are many decompositions of this dictionary into Φ and a matrix Ψ, since Φ has a nullspace. Thus, without additional assumptions, one cannot uniquely recover a dictionary Ψ that can reconstruct x as Ψb.

3 Adaptive Compressive Sampling

It is tantalizing to hypothesize that a neural population in the brain could combine the principles of compressive sampling and dictionary learning to form sparse representations of inputs arriving through long-range fiber projections. Note that information processing in the brain should rely on faithful representations of the original signals but does not require a solution of the ill-posed BCS problem, which involves the full reconstruction of the original signals. Thus, the generic challenge a neural population embedded in the brain might have to solve can be captured by the following problem.
Adaptive compressive sampling (ACS): Given measurements Y = ΦX generated from an unknown Φ and unknown signal ensemble X with sparse underlying structure, find signals B(Y) which are sparse representations of X.
Note the two key differences between the ACS and the BCS problem. First, the ACS problem asks only for sparse representations b of the data, not full reconstruction. Second, the compression matrix Φ is unknown in the ACS problem but is known in the BCS problem. Since it is unrealistic to assume that a brain region could have knowledge of how an efferent fiber bundle subsamples the brain region it originates from, the second difference is also crucial. We propose a relatively simple algorithm for potentially solving the ACS problem: use sparse coding for dictionary learning in the compressed space.

¹As a convention in this paper, a vs. b denotes a sparse representation inferred from full vs. compressed signals.

Figure 1: ACS schematic. A signal x with sparse structure in dictionary Ψ is sampled by a compressing measurement matrix Φ, constituting a transmission bottleneck. The ACS coding circuit learns a dictionary Θ for y in the compressed space, but can be seen to form sparse representations b of the original data x, as witnessed by the matrix RM in (6).

The proposed ACS objective function is defined as:

    E(y, b, Θ) = ‖y − Θb‖²₂ + λS(b).     (4)

Iterated minimization of the empirical mean of this function, first with respect to b and then with respect to Θ, will produce a sparsity dictionary Θ for the compressed space and sparse representations b̂(y) of the y. Our results verify theoretically and experimentally that once the dictionary matrix Θ has converged, the objective (4) can be used to infer sparse representations of the original signals x from the compressed data y. As has been shown in the BCS work, one cannot uniquely determine Ψ with access only to the compressed signals y. But this does not imply that no such matrix exists. In fact, given a separate set of uncompressed signals x′, we calculate a reconstruction matrix RM demonstrating that the b̂ are indeed sparse representations of the original x. Importantly, the x′ are not used to solve the ACS problem, but rather to demonstrate that a solution was found.
The process for computing RM using the x′ is analogous to the process used by electrophysiologists to measure the receptive fields of neurons. Electrophysiologists are interested in characterizing how neurons in a region respond to different stimuli. They use a simple approach to determine these stimulus-response properties: probe the neurons with an ensemble of stimuli and compute stimulus-response correlations.
Typically it is assumed that a neural response b is a linear function of the stimulus x; that is, b = RF x for some receptive field matrix RF. One may then calculate an RF by minimizing the empirical mean of the prediction error: E(RF) = ‖b − RF x‖²₂. As shown in [13], the closed-form solution to this minimization is RF = C_ss⁻¹ C_sr, in which C_ss is the stimulus autocorrelation matrix ⟨xx⊤⟩_X and C_sr is the stimulus-response cross-correlation matrix ⟨xb⊤⟩_X.
In contrast to the assumption of a linear response typically made in electrophysiology, here we assume a linear generative model: x = Ψa. Thus, instead of minimizing the prediction error, we ask for the reconstruction matrix RM that minimizes the empirical mean of the reconstruction error:

    E(RM) = ‖x − RM b‖²₂.     (5)

In this case, the closed-form solution of this minimization is given by

    RM = C_sr C_rr⁻¹,     (6)

in which C_sr is the stimulus-response cross-correlation matrix as before and C_rr is the response autocorrelation matrix ⟨b̂(y(x)) b̂(y(x))⊤⟩_X. As we show below, calculating (6) from a set of uncompressed signals x′ yields an RM that reconstructs the original signal x from b̂ as x = RM b̂. Thus, we can conclude that encodings b̂ computed by ACS are sparse representations of the original signals.

4 Theoretical Results

The following hold for ACS under mild hypotheses (we postpone details for a future work).
Theorem 4.1 Suppose that an ensemble of signals is compressed with a random projection Φ.
If ACS converges on a sparsity-inducing dictionary Θ and C_rr is invertible, then Θ = Φ · RM.
Theorem 4.2 Suppose that an ensemble of signals has a sparse representation with dictionary Ψ. If ACS converges on a sparsity-inducing dictionary, then the outputs of ACS are a sparse representation for the original signals in the dictionary of the reconstruction matrix RM given by (6). Moreover, there exists a diagonal matrix D and a partial permutation matrix P such that Ψ = RM · DP.

Figure 2: Subsets of the reconstruction matrices RM for the ACS networks trained on synthetic sparse data generated using bases (a) standard 2D, (b) 2D DCT, (c) learned by sparse coding on natural images. The components of RM in (a) and (b) are arranged by spatial location and spatial frequency, respectively, to help with visual interpretation.

5 Experimental results

To demonstrate that the ACS algorithm solves the ACS problem in practice, we train ACS networks on synthetic and natural image patches. We use 16 × 16 image patches which are compressed by an i.i.d. Gaussian measurement matrix before ACS sees them. Unless otherwise stated, we use a compression factor of 2; that is, the 256-dimensional patches were captured by 128 measurements sent to the ACS circuit (current experiments are successful with a compression factor of 10). The feature-sign algorithm developed in [5] is used for inference of b in (4). After the inference step, Θ is updated using gradient descent in (4). The matrix Θ is initialized randomly and renormalized to have unit length columns after each learning step.
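The closed-form estimator (6) is simply the least-squares regression of signals onto their codes. In the idealized case where x = Ψb exactly and C_rr is invertible, it returns Ψ itself; a quick check with arbitrary (hypothetical) toy matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, N = 12, 10, 400
Psi = rng.normal(size=(n, p))    # hypothetical generating dictionary
B = rng.normal(size=(p, N))      # codes (dense here, so that Crr is invertible)
X = Psi @ B                      # signals generated exactly linearly from codes

Csr = X @ B.T / N                # stimulus-response cross-correlation
Crr = B @ B.T / N                # response autocorrelation
RM = Csr @ np.linalg.inv(Crr)    # eq. (6)
print(np.allclose(RM, Psi))      # True: least squares recovers Psi exactly
```

With ACS codes the relationship x = RM b holds only approximately, but the same two correlation estimates yield the reconstruction matrix.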
Learning is performed until the ACS circuit converges on a sparsity basis for the compressed space.
To assess whether the sparse representations formed by the ACS circuit are representations of the original data, we estimate a reconstruction matrix RM as in (6) by correlating a set of 10,000 uncompressed image patches with their encodings b in the ACS circuit. Using RM and the ACS circuit, we reconstruct original data from compressed data. Reconstruction performance is evaluated on a test set of 1000 image patches by computing the signal-to-noise ratio of the reconstructed signals x̂:

    SNR = 10 log₁₀ ( ⟨‖x‖²₂⟩_X / ⟨‖x − x̂‖²₂⟩_X ).

For comparison, we also performed CS using the feature-sign algorithm to solve (1) with a fixed sparsity basis Ψ and reconstruction given by x̂ = Ψb̂.
Synthetic Data: To assess ACS performance on data of known sparsity, we first generate synthetic image patches with sparse underlying structure in known bases. We test with three different bases: the standard 2D basis (i.e. single-pixel images), the 2D DCT basis, and a Gabor-like basis learned by sparse coding on natural images. We generate random sparse binary vectors with k = 8, multiply these vectors by the chosen basis to get images, and then compress these images to half their original lengths to get training data. For each type of synthetic data, a separate ACS network is trained with λ = .1 and a reconstruction matrix RM is computed. The RM corresponding to each generating basis type is shown in Figure 2(a)-(c). We can see that RM closely resembles a permutation of the generating basis, as predicted by Theorem 4.2. The mean SNR of the reconstructed signals in each case is 34.05 dB, 47.05 dB, and 36.38 dB, respectively. Further, most ACS encodings are exact in the sense that they exactly recovered the components used to synthesize the original image.
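A toy version of this experimental pipeline can be sketched end to end. It is only an illustration: all sizes are arbitrary, inference uses ISTA rather than feature-sign, and the hidden dictionary and measurement matrix are synthetic. The key point is that Φ is used only to generate the data and is never shown to the learner.

```python
import numpy as np

rng = np.random.default_rng(2)

def infer(D, Y, lam, n_iter=100):
    """Sparse codes for columns of Y under dictionary D (ISTA on the lasso)."""
    L = np.linalg.norm(D, 2) ** 2
    B = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        Z = B - D.T @ (D @ B - Y) / L
        B = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
    return B

# hidden generative model: x = Omega v with sparse v; sender compresses with Phi
n, m, p, N = 16, 8, 16, 1500
Omega = rng.normal(size=(n, p)); Omega /= np.linalg.norm(Omega, axis=0)
V = np.zeros((p, N))
for j in range(N):
    V[rng.choice(p, 2, replace=False), j] = rng.normal(size=2)
X = Omega @ V
Phi = rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))
Y = Phi @ X                                  # all the ACS circuit ever sees

# ACS: sparse coding entirely in the compressed space (objective (4))
Theta = rng.normal(size=(m, p)); Theta /= np.linalg.norm(Theta, axis=0)
for _ in range(30):
    B = infer(Theta, Y, lam=0.1)
    Theta += 0.2 * (Y - Theta @ B) @ B.T / N
    Theta /= np.linalg.norm(Theta, axis=0)

# post-hoc check with uncompressed probes: RM = Csr Crr^{-1}, eq. (6)
B = infer(Theta, Y, lam=0.1)
RM = (X @ B.T) @ np.linalg.pinv(B @ B.T)
X_hat = RM @ B
snr = 10 * np.log10(np.sum(X**2) / np.sum((X - X_hat)**2))
print(snr)   # positive SNR: the codes b carry the original signals
```

The SNR computed at the end is exactly the figure of merit quoted in the experiments above (ratio of mean signal power to mean residual power, in decibels).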
Specifically, for the DCT basis, 95.4% of ACS codes have the same eight active basis vectors as were used to generate the original image patch. Thresholding to remove small coefficients (coring) makes it 100%.
To explore how ACS performs in cases where the signals cannot be modeled exactly with sparse representations, we generate sparse synthetic data (k = 8) with the 2D DCT basis and add Gaussian noise. Figure 3(a) compares the reconstruction fidelity of ACS and CS for increasing levels of noise.

Figure 3: Mean SNR of reconstructions. (a) compares ACS performance to CS performance with the true generating basis (DCT) for synthetic images with increasing amounts of Gaussian noise. (b) and (c) compare the performances of ACS, CS with a basis learned by sparse coding on natural images, and CS with the DCT basis, plotted against the compression factor (b) and the value of λ used for encoding (c). (d) shows ACS performance on natural images vs. the completeness factor.

Figure 4: (a) RM for an ACS network trained on natural images with a compression factor of 2; (b) ACS reconstruction of a 128 × 128 image using increasing compression factors. Clockwise from the top left: the original image, ACS with compression factors of 2, 4, and 8.

For pure sparse data (noise σ² = 0), CS outperforms ACS significantly. Without noise, CS is limited by machine precision and reaches a mean SNR which is off the chart at 308.22 dB, whereas ACS is limited by inaccuracies in the learning process as well as inaccuracies in computing RM. For a large range of noise levels, CS and ACS performance become nearly identical.
For very high levels of noise, CS and ACS performances begin to diverge as the advantage of knowing the true sparsity basis becomes apparent again.
Natural Images: Natural image patches have sparse underlying structure in the sense that they can be well approximated by sparse linear combinations of fixed bases, but they cannot be exactly reconstructed at the level of sparsity required by the theorems of CS and ACS. Thus, CS and ACS cannot be expected to produce exact reconstructions of natural image patches. To explore the performance of ACS on natural images, we train ACS models on compressed image patches from whitened natural images. The RM matrix for an ACS network using the default compression factor of 2 is shown in Figure 4(a).
Next we explore how the fidelity of ACS reconstructions varies with the compression factor. Figure 4(b) shows an entire image portion reconstructed patch-wise by ACS for increasing compression factors. Figure 3(b) compares the SNR of these reconstructions to CS reconstructions. Since there is no true sparsity basis for natural images, we perform CS either with a dictionary learned from uncompressed natural images using sparse coding or with the 2D DCT. Both the ACS sparsity basis and the sparse coding basis used with CS are learned with λ fixed at .1 in eq. (3). Figure 3(b) demonstrates that CS performs much better with the learned dictionary than with the standard 2D DCT. Further, the plot shows that ACS produces slightly higher-fidelity reconstructions than CS. However, the comparison between CS and ACS might be confounded by the sensitivity of these algorithms to the value of λ used during encoding.
In the context of CS, there is a sweet spot for the sparsity of representations. Sparser encodings have a better chance of being accurately recovered from the measurements because they obey the conditions of the CS theorems better.
At the same time, these are less likely to be accurate encodings\nof the original signal since they are limited to fewer of the basis vectors for their reconstructions.\nAs a result, reconstruction \ufb01delity as a function of \u03bb has a maximum at the sweet spot of sparsity\nfor CS (decreasing the value of \u03bb leads to sparser representations). Values of \u03bb below this point\nproduce representations that are not sparse enough to be accurately recovered from the compressed\nmeasurements, while values of \u03bb above it produce representations that are too sparse to accurately\nmodel the original signal even if they could be accurately recovered.\nTo explore how the performance of CS and ACS depends on the sparseness of their representations,\nwe vary the value of \u03bb used while encoding. Figure 3(c) compares ACS, CS with a sparse coding\nbasis, and CS with the 2D DCT basis. Once again we see that ACS performs slightly better than\nCS with a learned dictionary, and much better than CS with the DCT basis. However, the shape\nof the curves with respect to the choice of \u03bb while encoding suggests that our choice of value for\n\u03bb while learning (.1 for both ACS and the sparse coding basis used with CS) may be suboptimal.\nAdditionally, the optimal value of \u03bb for CS may differ from the optimal value of \u03bb for ACS. For\nthese reasons, it is unclear if ACS exceeds the SNR performance of CS with dictionary learning\nwhen in the optimal regime for both approaches. Most likely, as 3(b) suggests, their performances\nare not signi\ufb01cantly different. However, one reason ACS might perform better is that learning a\nsparsity basis in compressed space tunes the sparsity basis with respect to the measurement matrix\nwhereas performing dictionary learning for CS estimates the sparsity basis independently of the\nmeasurement matrix. 
Additionally, having its sparsity basis in the compressed space means that ACS is more efficient in terms of runtime than dictionary learning for CS, because the lengths of the basis vectors are reduced by the compression factor.
ACS in brain communication: When considering ACS as a model of communication in the brain, one important question is whether it works when the representational dimensions vary from region to region. Typically in CS, the number of basis functions is chosen to equal the dimension of the original space. To demonstrate how ACS could model the communication between regions with different representation dimensions, we train ACS networks whose encoding dimensions are larger or smaller than the dimension of the original space (overcomplete or undercomplete). As shown in Figure 3(d), the reconstruction fidelity decreases in the undercomplete case because representations in that space either have fewer total active coding vectors or are significantly less sparse. Interestingly, the reconstruction fidelity increases in the overcomplete case. We suspect that this gain from overcompleteness also applies in standard CS with an overcomplete dictionary, but this has not been tested so far.

Figure 5: A subset of RM from each stage of our multistage ACS model.

Another issue to consider for ACS as a model of communication in the brain is whether signal fidelity is preserved through repeated communications. To investigate this question, we simulated multiple stages of communication using ACS. In our model, the input of compressed natural image patches is encoded as a sparse representation in the first region, transmitted as a compressed signal to a second region where it is encoded sparsely, and compressively transmitted once again to a third region that performs the final encoding.
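The multistage setup just described can be sketched by chaining two ACS stages, each consisting of a random compression followed by sparse coding on whatever arrives. This is a toy illustration with synthetic sparse data and ISTA inference; all sizes are arbitrary. The check at the end decodes the original signals linearly from the second-stage codes via eq. (6).

```python
import numpy as np

rng = np.random.default_rng(5)

def infer(D, Y, lam=0.1, n_iter=100):
    """ISTA sparse inference under dictionary D."""
    L = np.linalg.norm(D, 2) ** 2
    B = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        Z = B - D.T @ (D @ B - Y) / L
        B = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
    return B

def acs_stage(Y, p, epochs=30, eta=0.2):
    """One ACS stage: learn a dictionary for its (compressed) input, return codes."""
    D = rng.normal(size=(Y.shape[0], p))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(epochs):
        B = infer(D, Y)
        D += eta * (Y - D @ B) @ B.T / Y.shape[1]
        D /= np.linalg.norm(D, axis=0)
    return infer(D, Y)

# hidden sparse signals entering the first fiber bottleneck
n, p, N = 16, 16, 1000
Omega = rng.normal(size=(n, p)); Omega /= np.linalg.norm(Omega, axis=0)
V = np.zeros((p, N))
for j in range(N):
    V[rng.choice(p, 2, replace=False), j] = rng.normal(size=2)
X = Omega @ V

Phi1 = rng.normal(0, 1 / np.sqrt(8), (8, n))   # region 1 sees 8 of 16 dimensions
B1 = acs_stage(Phi1 @ X, p)                    # sparse code formed in region 1
Phi2 = rng.normal(0, 1 / np.sqrt(8), (8, p))   # second bottleneck
B2 = acs_stage(Phi2 @ B1, p)                   # sparse code formed in region 2

# decode the ORIGINAL signals from the second-stage codes (eq. (6))
RM = (X @ B2.T) @ np.linalg.pinv(B2 @ B2.T)
snr = 10 * np.log10(np.sum(X**2) / np.sum((X - RM @ B2)**2))
print(snr)   # signal content preserved across two communication stages
```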
Obviously, this is a vacuous model of neural computation since there is\nlittle use in simply retransmitting the same signal. A meaningful model of cortical processing would\ninvolve additional local computations on the sparse representations before retransmission. However,\nthis basic model can help us explore the effects of repeated communication by ACS. Using samples\nfrom the uncompressed space, we compute RM for each stage just as for a single stage model.\nFigure 5 shows subsets of the components of RM for each stage. Notice that meaningful gabor-like\nstructure is preserved between stages.\n\n6 Discussion\n\nIn this paper, we propose ACS, a new algorithm for learning meaningful sparse representations\nof compressively sampled signals without access to the full signals. Two crucial differences set\nACS apart from traditional CS. First, the ACS coding circuit is formed by unsupervised learning\non subsampled signals and does not require knowledge of the sparsity basis of the signals nor of\nthe measurement matrix used for subsampling. Second, the information in the fully trained ACS\ncoding circuit is insuf\ufb01cient to reconstruct the original signals. To assess the usefulness of the rep-\nresentations formed by ACS, we developed a second estimation procedure that probes the trained\nACS coding circuit with the full signals and correlates signal with encoding. Similarly to the elec-\ntrophysiological approach of computing receptive \ufb01elds, we computed a reconstruction matrix RM.\nTheorem 4.2 proves that after convergence, ACS produces representations of the full data and that\nthe estimation procedure \ufb01nds a reconstruction matrix which can reproduce the full data. 
Further,\nour simulation experiments revealed that the RM matrix contained smooth receptive \ufb01elds resem-\nbling oriented simple cells (Figures 2 and 4), suggesting that the ACS learning scheme can explain\nthe formation of receptive \ufb01elds even when the input to the cell population is undersampled (and\nthus conventional sparse coding would falter). In addition, the combination of ACS circuit and RM\nmatrix can be used in practice for data compression and be directly compared with traditional CS.\nInterestingly, ACS is fully on par with CS in terms of reconstruction quality (Figure 3). At the same\ntime it is both \ufb02exible and stackable, and it works in overcomplete and undercomplete cases.\nThe recent work on BCS [4] addressed a similar problem where the sparsity basis of compressed\nsamples is unknown. A main difference between BCS and ACS is that BCS aims for full reconstruc-\ntion of the original signals from compressed signals whereas ACS does not. As a consequence, BCS\nis generally ill-posed [4], whereas ACS permits a solution, as we have shown. We have argued that\nfull data reconstruction is not a prerequisite for communication between brain regions. However,\nnote that ACS can be made a full reconstruction algorithm if there is limited access to uncompressed\nsignal. Thus, neither ACS nor practical applications of BCS are fully blind learning algorithms, as\nboth rely on further constraints [4] inferred from the original data. An alternative to ACS / BCS for\nintroducing learning in CS was to adapt the measurement matrix to data [3, 16].\nThe engineering implications of ACS merit further exploration. In particular, our compression re-\nsults with overcomplete ACS indicate that the reconstruction quality was signi\ufb01cantly higher than\nwith standard CS. Additionally, the unsupervised learning with ACS may have advantages in situa-\ntions where access to uncompressed signals is limited or very expensive to acquire. 
With ACS it is possible to do the heavy work of learning a good sparsity basis entirely in the compressed space; only a small number of samples from the uncompressed space are required to reconstruct with RM.
Perhaps the most intriguing implications of our work concern neurobiology. Our results clearly demonstrate that meaningful sparse representations can be learned on the far end of wiring bottlenecks, fully unsupervised, and without any knowledge of the subsampling scheme. In addition, ACS with overcomplete or undercomplete codes suggests how sparse representations can be communicated between neural populations of different sizes. From our study, we predict that firing patterns of neurons sending long-range axons might be less sparse than those involved in local connectivity, a hypothesis that could be experimentally verified. It is intriguing to think that the elegance and simplicity of compressive sampling and sparse coding could be exploited by the brain.

References

[1] A. Bell and T. Sejnowski. Learning the higher-order structure of a natural sound. Network: Computation in Neural Systems, 7(2):261–266, 1996.
[2] E.J. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematicians, volume 3, pages 1433–1452, 2006.
[3] M. Elad. Optimized projections for compressed sensing. IEEE Transactions on Signal Processing, 55(12):5695–5702, 2007.
[4] S. Gleichman and Y.C. Eldar. Blind compressed sensing. Preprint, 2010.
[5] H. Lee, A. Battle, R. Raina, and A.Y. Ng. Efficient sparse coding algorithms. Advances in Neural Information Processing Systems, 19:801, 2007.
[6] B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609, 1996.
[7] M. Rehn and F.T. Sommer.
A network that uses few active neurones to code visual input predicts the diverse shapes of cortical receptive fields. Journal of Computational Neuroscience, 22(2):135–146, 2007.
[8] C.J. Rozell, D.H. Johnson, R.G. Baraniuk, and B.A. Olshausen. Sparse coding via thresholding and local competition in neural circuits. Neural Computation, 20(10):2526–2563, 2008.
[9] D.L. Ruderman and W. Bialek. Statistics of natural images: Scaling in the woods. Physical Review Letters, 73(6):814–817, 1994.
[10] A. Schüz, D. Chaimow, D. Liewald, and M. Dortenman. Quantitative aspects of corticocortical connections: a tracer study in the mouse. Cerebral Cortex, 16(10):1474, 2006.
[11] E.C. Smith and M.S. Lewicki. Efficient auditory coding. Nature, 439(7079):978–982, 2006.
[12] M. Sur, P.E. Garraghty, and A.W. Roe. Experimentally induced visual projections into auditory thalamus and cortex. Science, 242(4884):1437–1441, 1988.
[13] F.E. Theunissen, S.V. David, N.C. Singh, A. Hsu, W.E. Vinje, and J.L. Gallant. Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network: Computation in Neural Systems, 12(3):289–316, 2001.
[14] D.C. Van Essen, C.H. Anderson, and D.J. Felleman. Information processing in the primate visual system: an integrated systems perspective. Science, 255(5043):419–423, 1992.
[15] M.J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Information Theory, pages 2183–2202, 2009.
[16] Y. Weiss, H. Chang, and W. Freeman. Learning compressed sensing. In Snowbird Learning Workshop, Allerton, CA, 2007.
[17] R.J. Wyman and J.B. Thomas. What genes are necessary to make an identified synapse? In Cold Spring Harbor Symposia on Quantitative Biology, volume 48, page 641.
Cold Spring Harbor Laboratory Press, 1983.", "award": [], "sourceid": 1099, "authors": [{"given_name": "Guy", "family_name": "Isely", "institution": null}, {"given_name": "Christopher", "family_name": "Hillar", "institution": null}, {"given_name": "Fritz", "family_name": "Sommer", "institution": null}]}