{"title": "Estimating image bases for visual image reconstruction from human brain activity", "book": "Advances in Neural Information Processing Systems", "page_first": 576, "page_last": 584, "abstract": "Image representation based on image bases provides a framework for understanding neural representation of visual perception. A recent fMRI study has shown that arbitrary contrast-defined visual images can be reconstructed from fMRI activity patterns using a combination of multi-scale local image bases. In the reconstruction model, the mapping from an fMRI activity pattern to the contrasts of the image bases was learned from measured fMRI responses to visual images. But the shapes of the images bases were fixed, and thus may not be optimal for reconstruction. Here, we propose a method to build a reconstruction model in which image bases are automatically extracted from the measured data. We constructed a probabilistic model that relates the fMRI activity space to the visual image space via a set of latent variables. The mapping from the latent variables to the visual image space can be regarded as a set of image bases. We found that spatially localized, multi-scale image bases were estimated near the fovea, and that the model using the estimated image bases was able to accurately reconstruct novel visual images. The proposed method provides a means to discover a novel functional mapping between stimuli and brain activity patterns.", "full_text": "Estimating image bases for visual image\nreconstruction from human brain activity\n\nYusuke Fujiwara1 Yoichi Miyawaki2;1 Yukiyasu Kamitani1\n\n1ATR Computational Neuroscience Laboratories\n\n2National Institute of Information and Communications Technology\n\n2-2-2 Hikaridai, Seika-cho, Kyoto, Japan\n\nyureisoul@gmail.com yoichi m@atr.jp kmtn@atr.jp\n\nAbstract\n\nImage representation based on image bases provides a framework for understand-\ning neural representation of visual perception. 
A recent fMRI study has shown that arbitrary contrast-defined visual images can be reconstructed from fMRI activity patterns using a combination of multi-scale local image bases. In the reconstruction model, the mapping from an fMRI activity pattern to the contrasts of the image bases was learned from measured fMRI responses to visual images. But the shapes of the image bases were fixed, and thus may not be optimal for reconstruction. Here, we propose a method to build a reconstruction model in which image bases are automatically extracted from the measured data. We constructed a probabilistic model that relates the fMRI activity space to the visual image space via a set of latent variables. The mapping from the latent variables to the visual image space can be regarded as a set of image bases. We found that spatially localized, multi-scale image bases were estimated near the fovea, and that the model using the estimated image bases was able to accurately reconstruct novel visual images. The proposed method provides a means to discover a novel functional mapping between stimuli and brain activity patterns.\n\n1 Introduction\n\nThe image basis is a key concept for understanding neural representation of visual images. Using image bases, we can consider natural scenes as a combination of simple elements corresponding to neural units. Previous works have shown that image bases similar to receptive fields of simple cells are learned from natural scenes by the sparse coding algorithm [4, 9]. A recent fMRI study has shown that visual images can be reconstructed using a linear combination of multi-scale image bases (1x1, 1x2, 2x1, and 2x2 pixels covering an entire image), whose contrasts were predicted from the fMRI activity pattern [6]. 
The multi-scale bases produced more accurate reconstruction than the pixel-by-pixel prediction, and each scale contributed to reconstruction in a way consistent with known visual cortical representation. However, the predefined shapes of image bases may not be optimal for image reconstruction.\nHere, we developed a method to automatically extract image bases from measured fMRI responses to visual stimuli, and used them for image reconstruction. We employed the framework of canonical correlation analysis (CCA), in which two multi-dimensional observations are related via a common coordinate system. CCA finds multiple correspondences between a weighted sum of voxels and a weighted sum of pixels. These correspondences provide an efficient mapping between the two observations. The pixel weights for each correspondence can be thought to define an image basis. As the early visual cortex is known to be organized in a retinotopic manner, one can assume that a small set of pixels corresponds to a small set of voxels. To facilitate the mapping between small sets of pixels and voxels, we extended CCA to Bayesian CCA [10] with sparseness priors.\n\nFigure 1: Model for estimating image bases. (a) Illustration of the model framework. The visual image I (pixels) and an fMRI activity pattern r (voxels) are linked by latent variables z. The links from each latent variable to image pixels define an image basis W_I, and the links from each latent variable to fMRI voxels are called a weight vector W_r. (b) Graphical representation of the model. Circles indicate model parameters to be estimated and squares indicate observations. The matrices W_I and W_r, the common latent variable z, and the inverse variances α_I and α_r are simultaneously estimated using the variational Bayesian method. Using the estimated parameters, the predictive distribution for a visual image given a new brain activity pattern is constructed (dashed line).\n\n
Bayesian CCA treats the multiple correspondences as latent variables with two transformation matrices to the two sets of observations. The transformation matrix to the visual image can be regarded as a set of image bases. The matrices are assumed to be random variables with hyper-parameters. We introduced a sparseness prior into each element of the matrices, such that only small subsets of voxels and pixels are related with non-zero matrix elements.\nThe Bayesian CCA model was applied to the data set of Miyawaki et al. [6]. We show that spatially localized image bases were extracted, especially around the foveal region, whose shapes were similar to those used in the previous work. We also demonstrate that the model using the estimated image bases produced accurate visual image reconstruction.\n\n2 Method\n\nWe constructed a model in which a visual image is related to an fMRI activity pattern via latent variables (Figure 1). Each latent variable has links to a set of pixels, which can be regarded as an image basis because the links from a single latent variable construct an element of a visual image. The latent variable also has multiple links to a set of fMRI voxels, which we call a weight vector. This model is equivalent to CCA: each latent variable corresponds to a canonical coefficient [3] that bundles a subset of fMRI voxels responding to a specific visual stimulus. We then extended the CCA model to the Bayesian CCA model, which can conduct a sparse selection of these links automatically.\n\n2.1 Canonical Correlation Analysis\n\nWe first consider the standard CCA for estimating image bases given visual images I and fMRI activity patterns r. Let I be an N × 1 vector and r be a K × 1 vector, where N is the number of image pixels, K is the number of fMRI voxels, and t is a sample index. Both data sets are independent identically distributed (i.i.d.) samples. 
CCA finds linear combinations u_1(t) = a_1' · I(t) and v_1(t) = b_1' · r(t) such that the correlation between u_1 and v_1 is maximized. The variables u_1 and v_1 are called the first canonical variables and the vectors a_1 and b_1 are called the canonical coefficients. Then, the second canonical variables u_2(t) = a_2' · I(t) and v_2(t) = b_2' · r(t) are sought by maximizing the correlation of u_2 and v_2 while the second canonical variables are orthogonalized to the first canonical variables. This procedure is continued up to a pre-defined number of times M. The number M is conventionally set to the smaller dimension of the two sets of observations: in our case, M = N because the number of visual-image pixels is much smaller than that of the fMRI voxels (N < K). The M sets of canonical variables are summarized as\n\nu(t) = A · I(t),   (1)\n\nv(t) = B · r(t),   (2)\n\nwhere u(t) and v(t) are M × 1 vectors, A is an M × N matrix, and B is an M × K matrix. The matrices A and B are obtained by solving the eigen problem of the covariance matrix between I and r [1]. The visual image can be reconstructed by\n\nI(t) = A^{-1} · B · r(t),   (3)\n\nwhere each column vector of the inverse matrix A^{-1} is an image basis.\n\n2.2 Bayesian CCA\n\nBayesian CCA introduces common latent variables that relate a visual image I and the fMRI activity pattern r with image basis set W_I and weight vector set W_r (Figure 1 (b)). These variables are treated as random variables and prior distributions are assumed for each variable. Hyper-prior distributions are also assumed for an inverse variance of each element of the image bases and the weight vectors. The image bases and the weight vectors are estimated as a posterior distribution by the variational Bayesian method [2]. 
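The classical CCA pipeline of Section 2.1 (Eqs. (1)-(3)) can be sketched in NumPy. This is a minimal illustration, not the authors' code; the regularization constant and the whitened-cross-covariance formulation are assumptions of the sketch:

```python
import numpy as np

def fit_cca(I, R, reg=1e-6):
    """Classical CCA via the whitened cross-covariance (cf. Eqs. (1)-(2)).
    I: (T, N) images, R: (T, K) voxel patterns. Returns A (M x N), B (M x K), M = N."""
    I = I - I.mean(axis=0)
    R = R - R.mean(axis=0)
    T = I.shape[0]
    Cii = I.T @ I / T + reg * np.eye(I.shape[1])
    Crr = R.T @ R / T + reg * np.eye(R.shape[1])
    Cir = I.T @ R / T

    def inv_sqrt(C):
        # inverse matrix square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wi, Wr = inv_sqrt(Cii), inv_sqrt(Crr)
    U, s, Vt = np.linalg.svd(Wi @ Cir @ Wr)
    M = I.shape[1]               # M = N: fewer pixels than voxels
    A = U[:, :M].T @ Wi          # canonical coefficients for pixels
    B = Vt[:M] @ Wr              # canonical coefficients for voxels
    return A, B

def reconstruct_image(A, B, r):
    """Eq. (3): I = A^{-1} B r (pseudo-inverse for numerical safety)."""
    return np.linalg.pinv(A) @ (B @ r)
```

On synthetic data generated from a shared latent source, the leading canonical variables of the two views are strongly correlated, which is what makes the reconstruction of Eq. (3) possible.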
After the parameters are determined, a predictive distribution for the visual image can be calculated.\nWe assume two likelihood functions. One is for visual images that are generated from latent variables. The other is for fMRI activity patterns that are generated from the same latent variables. When observation noises for visual images and fMRI voxels are assumed to follow a Gaussian distribution with zero mean and spherical covariance, the likelihood functions of the visual image I and the fMRI activity pattern r are\n\nP(I | W_I, z) ∝ exp[ -(β_I/2) Σ_{t=1}^T ||I(t) - W_I · z(t)||² ],   (4)\n\nP(r | W_r, z) ∝ exp[ -(β_r/2) Σ_{t=1}^T ||r(t) - W_r · z(t)||² ],   (5)\n\nwhere W_I is an N × M matrix representing M image bases, each of which consists of N pixels, W_r is a K × M matrix representing M weight vectors, each of which consists of K voxels, z(t) is an M × 1 vector representing latent variables, β_I^{-1} and β_r^{-1} are scalar variables representing unknown noise variances of the visual image and fMRI activity pattern, and T is the number of observations.\nThe latent variables are given the following Gaussian prior distribution,\n\nP_0(z) ∝ exp[ -(1/2) Σ_{t=1}^T ||z(t)||² ].   (6)\n\nThe image bases and weight vectors are regarded as random variables, and their prior distributions are assumed as\n\nP_0(W_I | α_I) ∝ exp[ -(1/2) Σ_{n=1}^N Σ_{m=1}^M α_I(n,m) (W_I(n,m))² ],   (7)\n\nP_0(W_r | α_r) ∝ exp[ -(1/2) Σ_{k=1}^K Σ_{m=1}^M α_r(k,m) (W_r(k,m))² ],   (8)\n\nwhere α_I(n,m) and α_r(k,m) are the inverse variances of the elements in W_I and W_r, respectively, which are assumed to be mutually independent.\nWe also assume hyper-prior distributions for the inverse variances α_I(n,m) and α_r(k,m),\n\nP_0(α_I) = Π_n Π_m G(α_I(n,m) | ᾱ_I(n,m), γ_I(n,m)),   (9)\n\nP_0(α_r) = Π_k Π_m G(α_r(k,m) | ᾱ_r(k,m), γ_r(k,m)),   (10)\n\nwhere G(α | ᾱ, γ) represents the Gamma distribution with mean ᾱ and confidence parameter γ. For our analysis, all the means ᾱ_I(n,m) and ᾱ_r(k,m) were set to 1 and all the confidence parameters γ_I(n,m) and γ_r(k,m) were set to 0.\nThis configuration of the prior and hyper-prior settings is known as automatic relevance determination (ARD), where non-effective parameters are automatically driven to zero [7]. In the current case, these priors and hyper-priors lead to a sparse selection of links from each latent variable to pixels and voxels.\nPrior distributions of the observation noise precisions are assumed to be non-informative,\n\nP_0(β_I) = 1/β_I,   (11)\n\nP_0(β_r) = 1/β_r.   (12)\n\n2.3 Parameter estimation by the variational Bayesian method\n\nThe image bases and weight vectors are estimated as a posterior distribution P(W_I, W_r | I, r), given the likelihood functions (Eqs. (4) and (5)), the prior distributions (Eqs. (6) - (8), (11) and (12)), and the hyper-prior distributions (Eqs. (9) and (10)). 
This posterior distribution is obtained by marginalizing the joint posterior distribution P(W_I, W_r, z, α_I, α_r, β_I, β_r | I, r) with respect to the latent variables and variance parameters,\n\nP(W_I, W_r | I, r) = ∫ dz dα_I dα_r dβ_I dβ_r P(W_I, W_r, z, α_I, α_r, β_I, β_r | I, r).   (13)\n\nSince the joint posterior distribution cannot be calculated analytically, we approximate it using a trial distribution based on the variational Bayesian (VB) method [2]. In the VB method, a trial distribution Q(W_I, W_r, z, α_I, α_r, β_I, β_r) with the following factorization is assumed,\n\nQ(W_I, W_r, z, α_I, α_r, β_I, β_r) = Q_w(W_I) Q_w(W_r) Q_z(z) Q_α(α_I, α_r, β_I, β_r).   (14)\n\nThe joint posterior distribution P(W_I, W_r, z, α_I, α_r, β_I, β_r | I, r) is approximated by the factorized distribution (Eq. (14)). According to the standard calculation of the VB method, the trial distribution of the image bases Q_w(W_I) is derived as\n\nQ_w(W_I) = Π_{n=1}^N Π_{m=1}^M N(W_I(n,m) | W̄_I(n,m), σ_I(n,m)^{-1}),   (15)\n\nwhere\n\nW̄_I(n,m) = β̄_I σ_I(n,m)^{-1} Σ_{t=1}^T I_n(t) z̄_m(t),   (16)\n\nσ_I(n,m) = β̄_I ( Σ_{t=1}^T z̄_m(t)² + T Σ_z(m,m)^{-1} ) + α_I(n,m),   (17)\n\nand N(x | x̄, σ^{-1}) represents a Gaussian distribution with mean x̄ and variance σ^{-1}. The trial distribution of the weight vectors Q_w(W_r) is obtained in a similar way, by replacing I with r, n with k, and N with K in Eqs. (15)-(17). The trial distribution of the latent variables Q_z(z) is obtained by\n\nQ_z(z) = Π_{t=1}^T N(z(t) | z̄(t), Σ_z^{-1}),   (18)\n\nwhere\n\nz̄(t) = Σ_z^{-1} ( β̄_I W̄_I' I(t) + β̄_r W̄_r' r(t) ),   (19)\n\nΣ_z = β̄_I ( W̄_I' W̄_I + Σ_wI^{-1} ) + β̄_r ( W̄_r' W̄_r + Σ_wr^{-1} ) + E.   (20)\n\nIn Eq. (20), E is an identity matrix, and Σ_wI and Σ_wr are defined as\n\nΣ_wI = diag([ Σ_{n=1}^N σ_I(n,1), ..., Σ_{n=1}^N σ_I(n,M) ]),   (21)\n\nΣ_wr = diag([ Σ_{k=1}^K σ_r(k,1), ..., Σ_{k=1}^K σ_r(k,M) ]).   (22)\n\nFinally, the distribution of the inverse variances Q_α(α_I, α_r, β_I, β_r) is further factorized into Q_α(α_I) Q_α(α_r) Q_α(β_I) Q_α(β_r), each having a function form equivalent to a gamma distribution. The expectation of α_I(n,m) is given by\n\nᾱ_I(n,m) = ( 1/2 + γ_I0(n,m) ) ( (1/2)(W̄_I(n,m))² + (1/2) σ_I(n,m)^{-1} + γ_I0(n,m) α_I0(n,m)^{-1} )^{-1},   (23)\n\nand that of β_I is given by\n\nβ̄_I = N T { Σ_{t=1}^T ||I(t) - W̄_I z̄(t)||² + Tr[ Σ_wI^{-1} ( Σ_{t=1}^T z̄(t) z̄(t)' + T Σ_z^{-1} ) + T Σ_z^{-1} W̄_I' W̄_I ] }^{-1}.   (24)\n\nThe expectations of Q_α(α_r) and Q_α(β_r) are obtained in a similar way, by replacing I with r, n with k, and N with K in Eq. (23) and Eq. (24), respectively. The expectations of these distributions are used in the calculation of Q_w(W_I), Q_w(W_r) and Q_z(z) (Eqs. (15) - (20)). 
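The latent-variable update (Eqs. (18)-(20)) is a standard Gaussian posterior computation. A minimal sketch, assuming the current posterior means of W_I and W_r and the diagonal correction terms of Eqs. (21)-(22) are already available (all function and variable names are illustrative, not the authors'):

```python
import numpy as np

def update_qz(I, R, WI, Wr, Sw_I, Sw_r, beta_I, beta_r):
    """Trial distribution of the latent variables (Eqs. (18)-(20)).
    I: (T, N) images, R: (T, K) voxel patterns;
    WI: (N, M), Wr: (K, M) posterior-mean bases and weight vectors;
    Sw_I, Sw_r: (M, M) diagonal terms from Eqs. (21)-(22);
    beta_I, beta_r: current noise-precision estimates."""
    M = WI.shape[1]
    # Eq. (20): precision of the latent posterior (E is the M x M identity)
    Sigma_z = beta_I * (WI.T @ WI + Sw_I) + beta_r * (Wr.T @ Wr + Sw_r) + np.eye(M)
    Sigma_z_inv = np.linalg.inv(Sigma_z)
    # Eq. (19), vectorized over all T samples: row t is z_bar(t)
    Z_bar = (beta_I * I @ WI + beta_r * R @ Wr) @ Sigma_z_inv
    return Z_bar, Sigma_z_inv
```

Because Σ_z is symmetric, the vectorized right-multiplication by its inverse is equivalent to applying Eq. (19) sample by sample.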
The algorithm estimates the joint posterior by successive calculations of 1) Q_w(W_I) and Q_w(W_r), 2) Q_z(z), and 3) Q_α(α_I, α_r, β_I, β_r). After the algorithm converges, image bases W̄_I are calculated by taking the expectation of Q(W_I).\n\n2.4 Predictive distribution for visual image reconstruction\n\nUsing the estimated parameters, we can derive the predictive distribution for a visual image I_new given a new brain activity r_new (Figure 1 (b), dashed line). Note that I_new and r_new were taken from the data set reserved for testing the model, independent of the data set used to estimate the model parameters. The predictive distribution P(I_new | r_new) is constructed from the likelihood of the visual image (Eq. (4)), the estimated distribution of image bases Q(W_I) (Eqs. (15) - (17)), and a posterior distribution of latent variables P(z_new | r_new) as follows,\n\nP(I_new | r_new) = ∫ dW_I dz_new P(I_new | W_I, z_new) Q(W_I) P(z_new | r_new).   (25)\n\nBecause the multiple integral over the random variables W_I and z_new is intractable, we replace the random variable W_I with the estimated image bases W̄_I to eliminate the integral over W_I. Then the predictive distribution becomes\n\nP(I_new | r_new) ≈ ∫ dz_new P(I_new | z_new) P(z_new | r_new),   (26)\n\nwhere\n\nP(I_new | z_new) ∝ exp[ -(β̄_I/2) ||I_new - W̄_I z_new||² ].   (27)\n\nSince P(z_new | r_new) is an unknown distribution, we approximate it based on the trial distribution Q_z(z) (Eqs. (18) - (20)). We construct an approximate distribution Q̃_z(z_new) by omitting the terms related to the visual image in Eqs. (18) - (20),\n\nQ̃_z(z_new) = N(z_new | z̄_new, Σ_znew^{-1}),   (28)\n\nwhere\n\nz̄_new = β̄_r Σ_znew^{-1} W̄_r' r_new,   (29)\n\nΣ_znew = β̄_r ( W̄_r' W̄_r + Σ_wr^{-1} ) + E.   (30)\n\nFinally, the predictive distribution is obtained by\n\nP(I_new | r_new) ≈ ∫ dz_new P(I_new | z_new) Q̃_z(z_new) = N(I_new | Ī_new, Σ_Inew^{-1}),   (31)\n\nwhere\n\nĪ_new = β̄_r W̄_I Σ_znew^{-1} W̄_r' r_new,   (32)\n\nΣ_Inew = W̄_I Σ_znew^{-1} W̄_I' + β̄_I^{-1} E.   (33)\n\nThe reconstructed visual image is calculated by taking the expectation of the predictive distribution.\n\n2.5 fMRI data\n\nWe used the data set from Miyawaki et al. [6], in which fMRI signals were measured while subjects viewed visual images consisting of contrast-defined 10 × 10 patches. The data set contained two independent sessions. One is a “random image session”, in which spatially random patterns were sequentially presented for 6 s followed by a 6 s rest period. A total of 440 different random patterns were presented for each subject. The other is a “figure image session”, in which alphabetical letters and simple geometric shapes were sequentially presented for 12 s followed by a 12 s rest period. Five alphabetical letters and five geometric shapes were presented six or eight times per subject. We used fMRI data from V1 for the analyses. See Miyawaki et al. [6] for details.\n\n3 Results\n\nWe estimated image bases and weight vectors using the data from the “random image session”. Then, reconstruction performance was evaluated with the data from the “figure image session”.\n\n3.1 Estimated image bases\n\nFigure 2 (a) shows representative image bases estimated by Bayesian CCA (weight values are indicated by a gray scale). 
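One way to flag which estimated columns of W_I form spatially localized bases is to apply the cluster criterion described in this section (pixels whose magnitude exceeds 3 SD of all pixel values) directly to the posterior-mean basis matrix. A hypothetical sketch, not the authors' analysis code:

```python
import numpy as np

def localized_bases(WI, n_sd=3.0):
    """Return (basis_index, pixel_indices) pairs for columns of WI that
    contain pixels whose magnitude exceeds n_sd standard deviations of
    all pixel values (the cluster criterion used to tabulate basis sizes).
    WI: (N, M) estimated basis matrix, one image basis per column."""
    thresh = n_sd * WI.std()
    found = []
    for m in range(WI.shape[1]):
        pixels = np.flatnonzero(np.abs(WI[:, m]) > thresh)
        if pixels.size > 0:
            found.append((m, pixels))
    return found
```

The number of supra-threshold pixels per basis then gives the 1-, 2-, and 3-pixel size categories tallied against eccentricity in Figure 2 (a).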
The estimation algorithm extracted spatially localized image bases whose shapes were consistent with those used in the previous study [6] (1 × 1, 1 × 2, and 2 × 1, shown in the 1st and 2nd rows of Figure 2 (a)). We also found image bases with other shapes (e.g., L-shape, 3 × 1 and 1 × 3; 3rd row of Figure 2 (a)) that were not assumed in the previous study. We repeated the estimation using data resampled from the random image session, and calculated the distribution of the image bases (defined by a pixel cluster with magnitudes over 3 SD of all pixel values) over eccentricity for different sizes (Figure 2 (a), right). The image bases of the smallest size (1 × 1) were distributed over the visual field, and most of them were within three degrees of eccentricity. The size of the image basis tended to increase with eccentricity. For comparison, we also performed the image basis estimation using CCA, but it did not produce spatially localized image bases (Figure 2 (b)). Estimated weight vectors for fMRI voxels had high values around the retinotopic region corresponding to the location of the estimated basis (data not shown).\n\n3.2 Visual image reconstruction using estimated image bases\n\nThe reconstruction model with the estimated image bases was tested on five alphabet letters and five geometric shapes (Figure 3 (a), 1st row). The images reconstructed by Bayesian CCA captured the essential features of the presented images (Figure 3 (a), 2nd row). In particular, they showed fine reconstruction for figures consisting of thin lines such as small frames and alphabet letters. However, the peripheral reconstruction was poor and often lacked shapes of the presented images. This may be due to the lack of estimated image bases in the peripheral regions (Figure 2 (a), right). 
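Reconstruction quality in this section is summarized by the spatial correlation between presented and reconstructed images. A minimal implementation of that metric (a plain Pearson correlation over pixels; the function name is illustrative) could look like:

```python
import numpy as np

def spatial_correlation(presented, reconstructed):
    """Pearson correlation across pixels between a presented and a
    reconstructed image (the summary statistic reported in Figure 3 (b))."""
    p = np.ravel(presented).astype(float)
    q = np.ravel(reconstructed).astype(float)
    p -= p.mean()
    q -= q.mean()
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))
```

Because the means are subtracted, the metric is insensitive to a uniform contrast offset and ranges from -1 to 1.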
The standard CCA produced poorer reconstruction with noise scattered over the entire image (Figure 3 (a), 3rd row), as expected from the non-local image bases estimated by CCA (Figure 2 (b)). Reconstruction using fixed image bases [6] showed moderate accuracy for all image types (Figure 3 (a), 4th row). To evaluate the reconstruction performance quantitatively, we calculated the spatial correlation between the presented and reconstructed images (Figure 3 (b)). The correlation values were not significantly different between Bayesian CCA and the fixed basis method when the alphabet letters and the geometric shapes were analyzed together. However, Bayesian CCA outperformed the fixed basis method for the alphabet letters, while the fixed basis method outperformed Bayesian CCA for the geometric shapes (p < .05). This is presumably because the alphabet letters consist of more foveal pixels, which overlap the region covered by the image bases estimated by Bayesian CCA. The reconstruction performance of CCA was lowest in all cases.\n\nFigure 2: Image basis estimation: (a) Representative bases estimated by Bayesian CCA (left, sorted by the number of pixels), and their frequency as a function of eccentricity (right). 3-pixel bases (L-shape, 3x1 and 1x3) were not assumed in Miyawaki et al. [6]. Negative (dark) bases were often associated with negative voxel weights, thus equivalent to positive bases with positive voxel weights. (b) Examples of image bases estimated by the standard CCA.\n\n4 Discussion\n\nWe have proposed a new method to estimate image bases from fMRI data and presented visual stimuli. Our model consists of the latent variables and two matrices relating the two sets of observations. The previous work used fixed image bases and estimated the weights between the image bases and fMRI voxels. 
This estimation was conducted by sparse logistic regression, which assumed sparseness in the weight values and effectively removed irrelevant voxels [8]. The proposed method introduced sparseness priors not only for fMRI voxels but also for image pixels. These priors lead to automatic extraction of image bases, and to mappings between a small number of fMRI voxels and a small number of image pixels. Using this model, we successfully extracted spatially localized image bases, including those not used in the previous work [6]. Using the set of image bases, we were able to accurately reconstruct arbitrary contrast-defined visual images from fMRI activity patterns. The sparseness priors played an important role in estimating spatially localized image bases and in improving reconstruction performance, as demonstrated by the comparison with the results from standard CCA (Figures 2 and 3).\n\nFigure 3: Visual image reconstruction: (a) Presented images (1st row, alphabet letters and geometric shapes) and the reconstructed images obtained from Bayesian CCA, the standard CCA, and the fixed basis model (2nd - 4th rows). (b) Spatial correlation between presented and reconstructed images.\n\nOur method has several limitations. First, as the latent variables were assumed to have an orthogonal Gaussian distribution, it may be difficult to obtain non-orthogonal image bases, which have been shown to provide an effective image representation in the framework of sparse coding [4, 9]. Different types of image bases could be generated by introducing non-orthogonality and/or non-linearity in the model. The shape of estimated image bases may also depend on the visual stimuli used for the training of the reconstruction model. 
Although we used random images as visual stimuli, other types of images including natural scenes may lead to more effective image bases that allow for accurate reconstruction. Finally, our method failed to estimate peripheral image bases, and as a result, only poor reconstruction was achieved for peripheral pixels. The cortical magnification factor of the visual cortex [5] suggests that a small number of voxels represents a large number of image pixels in the periphery. Elaborate assumptions about the degree of sparseness depending on eccentricity may help to improve basis estimation and image reconstruction in the periphery.\n\nAcknowledgments\n\nThis study was supported by the Nissan Science Foundation, SCOPE (SOUMU) and SRPBS (MEXT).\n\nReferences\n\n[1] Anderson, T.W. (2003). An Introduction to Multivariate Statistical Analysis. 3rd ed. Wiley Interscience.\n\n[2] Attias, H. (1999). Inferring parameters and structure of latent variable models by variational Bayes. Proc. 15th Conference on Uncertainty in Artificial Intelligence, 21-30.\n\n[3] Bach, F.R. and Jordan, M.I. (2005). A probabilistic interpretation of canonical correlation analysis. Dept. Statist., Univ. California, Berkeley, CA, Tech. Rep. 688.\n\n[4] Bell, A.J. and Sejnowski, T.J. (1997). The independent components of natural scenes are edge filters. Vision Res. 37(23), 3327-3338.\n\n[5] Engel, S.A., Glover, G.H. and Wandell, B.A. (1997). Retinotopic organization in human visual cortex and the spatial precision of functional MRI. Cereb. Cortex 7, 181-192.\n\n[6] Miyawaki, Y., Uchida, H., Yamashita, O., Sato, M.A., Morito, Y., Tanabe, H.C., Sadato, N. and Kamitani, Y. (2008). 
Visual image reconstruction from human brain activity using a combination of multiscale local image decoders. Neuron 60(5), 915-929.\n\n[7] Neal, R.M. (1996). Bayesian Learning for Neural Networks. Springer-Verlag.\n\n[8] Yamashita, O., Sato, M.A., Yoshioka, T., Tong, F. and Kamitani, Y. (2008). Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns. Neuroimage 42(4), 1414-1429.\n\n[9] Olshausen, B.A. and Field, D.J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607-609.\n\n[10] Wang, C. (2007). Variational Bayesian approach to canonical correlation analysis. IEEE Trans. Neural Netw. 18(3), 905-910.\n", "award": [], "sourceid": 804, "authors": [{"given_name": "Yusuke", "family_name": "Fujiwara", "institution": null}, {"given_name": "Yoichi", "family_name": "Miyawaki", "institution": null}, {"given_name": "Yukiyasu", "family_name": "Kamitani", "institution": null}]}