{"title": "Deconvolution of High Dimensional Mixtures via Boosting, with Application to Diffusion-Weighted MRI of Human Brain", "book": "Advances in Neural Information Processing Systems", "page_first": 2699, "page_last": 2707, "abstract": "Diffusion-weighted magnetic resonance imaging (DWI) and fiber tractography are the only methods to measure the structure of the white matter in the living human brain. The diffusion signal has been modelled as the combined contribution from many individual fascicles of nerve fibers passing through each location in the white matter. Typically, this is done via basis pursuit, but estimation of the exact directions is limited due to discretization. The difficulties inherent in modeling DWI data are shared by many other problems involving fitting non-parametric mixture models. Ekanadaham et al. proposed an approach, continuous basis pursuit, to overcome discretization error in the 1-dimensional case (e.g., spike-sorting). Here, we propose a more general algorithm that fits mixture models of any dimensionality without discretization. Our algorithm uses the principles of L2-boost, together with refitting of the weights and pruning of the parameters. The addition of these steps to L2-boost both accelerates the algorithm and assures its accuracy. We refer to the resulting algorithm as elastic basis pursuit, or EBP, since it expands and contracts the active set of kernels as needed. We show that in contrast to existing approaches to fitting mixtures, our boosting framework (1) enables the selection of the optimal bias-variance tradeoff along the solution path, and (2) scales with high-dimensional problems. In simulations of DWI, we find that EBP yields better parameter estimates than a non-negative least squares (NNLS) approach, or the standard model used in DWI, the tensor model, which serves as the basis for diffusion tensor imaging (DTI). 
We demonstrate the utility of the method in DWI data acquired in parts of the brain containing crossings of multiple fascicles of nerve fibers.", "full_text": "Deconvolution of High Dimensional Mixtures via Boosting, with Application to Diffusion-Weighted MRI of Human Brain

Charles Y. Zheng
Department of Statistics, Stanford University, Stanford, CA 94305
snarles@stanford.edu

Franco Pestilli
Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405
franpest@indiana.edu

Ariel Rokem
Department of Psychology, Stanford University, Stanford, CA 94305
arokem@stanford.edu

Abstract

Diffusion-weighted magnetic resonance imaging (DWI) and fiber tractography are the only methods to measure the structure of the white matter in the living human brain. The diffusion signal has been modelled as the combined contribution from many individual fascicles of nerve fibers passing through each location in the white matter. Typically, this is done via basis pursuit, but estimation of the exact directions is limited due to discretization [1, 2]. The difficulties inherent in modeling DWI data are shared by many other problems involving fitting non-parametric mixture models. Ekanadham et al. [3] proposed an approach, continuous basis pursuit, to overcome discretization error in the 1-dimensional case (e.g., spike-sorting). Here, we propose a more general algorithm that fits mixture models of any dimensionality without discretization. Our algorithm uses the principles of L2-boost [4], together with refitting of the weights and pruning of the parameters. The addition of these steps to L2-boost both accelerates the algorithm and assures its accuracy. We refer to the resulting algorithm as elastic basis pursuit, or EBP, since it expands and contracts the active set of kernels as needed. 
We show that in contrast to existing approaches to fitting mixtures, our boosting framework (1) enables the selection of the optimal bias-variance tradeoff along the solution path, and (2) scales with high-dimensional problems. In simulations of DWI, we find that EBP yields better parameter estimates than a non-negative least squares (NNLS) approach, or the standard model used in DWI, the tensor model, which serves as the basis for diffusion tensor imaging (DTI) [5]. We demonstrate the utility of the method in DWI data acquired in parts of the brain containing crossings of multiple fascicles of nerve fibers.

1 Introduction

In many applications, one obtains measurements (x_i, y_i) for which the response y is related to x via some mixture of known kernel functions f_\theta(x), and the goal is to recover the mixture parameters \theta_k and their associated weights:

y_i = \sum_{k=1}^{K} w_k f_{\theta_k}(x_i) + \epsilon_i    (1)

where f_\theta(x) is a known kernel function parameterized by \theta, \theta = (\theta_1, ..., \theta_K) are model parameters to be estimated, w = (w_1, ..., w_K) are unknown nonnegative weights to be estimated, and \epsilon_i is additive noise. The number of components K is also unknown; hence, this is a nonparametric model. One example of a domain in which mixture models are useful is the analysis of data from diffusion-weighted magnetic resonance imaging (DWI). This biomedical imaging technique is sensitive to the direction of water diffusion within millimeter-scale voxels in the human brain in vivo. Water molecules freely diffuse along the length of nerve cell axons, but are restricted by cell membranes and myelin along directions orthogonal to the axon's trajectory. 
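As a concrete, purely illustrative instance of model (1), the sketch below generates data from a mixture of 1-D bell-shaped kernels; the Gaussian kernel form, the centers, the weights, and the noise level are all assumptions for this example, since the kernel family is application-specific:

```python
import numpy as np

def f_theta(theta, x):
    # Hypothetical 1-D bell-shaped kernel f_theta(x); a stand-in for the
    # application-specific kernel family in model (1).
    return np.exp(-0.5 * (x - theta) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)        # measurement locations x_i
theta_true = np.array([-1.0, 1.2])    # unknown mixture parameters theta_k
w_true = np.array([0.7, 0.4])         # unknown nonnegative weights w_k

# y_i = sum_k w_k f_{theta_k}(x_i) + eps_i, as in (1)
y = sum(wk * f_theta(tk, x) for wk, tk in zip(w_true, theta_true))
y = y + 0.01 * rng.standard_normal(x.size)
```

Recovering (K, w, theta) from (x, y) alone is the deconvolution problem addressed in this paper.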
Thus, DWI provides information about the microstructural properties of brain tissue in different locations, about the trajectories of organized bundles of axons, or fascicles, within each voxel, and about the connectivity structure of the brain. Mixture models are employed in DWI to deconvolve the signal within each voxel with a kernel function, f_\theta, assumed to represent the signal from every individual fascicle [1, 2] (Figure 1B); the weights w_i provide an estimate of the fiber orientation distribution function (fODF) in each voxel: the direction and volume fraction of the different fascicles in each voxel. In other applications of mixture modeling these parameters represent other physical quantities. For example, in chemometrics, \theta represents a chemical compound and f_\theta its spectra. In this paper, we focus on the application of mixture models to the data from DWI experiments and simulations of these experiments.

1.1 Model fitting - existing approaches

Hereafter, we restrict our attention to the use of squared-error loss, resulting in the penalized least-squares problem

minimize_{\hat{K}, \hat{w}, \hat{\theta}}  \sum_i ( y_i - \sum_{k=1}^{\hat{K}} \hat{w}_k f_{\hat{\theta}_k}(x_i) )^2 + \lambda P_\theta(w)    (2)

Minimization problems of the form (2) can be found in the signal deconvolution literature and elsewhere: some examples include super-resolution in imaging [6], entropy estimation for discrete distributions [7], X-ray diffraction [8], and neural spike sorting [3]. Here, P_\theta(w) is a convex penalty function of (\theta, w). Examples of such penalty functions are given in Section 2.1; a formal definition of convexity in the nonparametric setting can be found in the supplementary material, but will not be required for the results in the paper. 
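For concreteness, the objective in (2) can be evaluated for a candidate model as below; the Gaussian kernel and the plain L1 penalty P_\theta(w) = ||w||_1 are illustrative assumptions (Section 2.1 of the paper works with a squared-L1 variant):

```python
import numpy as np

def f_theta(theta, x):
    # hypothetical Gaussian-bump kernel, standing in for the application's f_theta
    return np.exp(-0.5 * (x - theta) ** 2)

def objective(y, x, w, thetas, lam):
    # Penalized least squares, as in (2), with the illustrative choice
    # P_theta(w) = ||w||_1
    resid = y - sum(wk * f_theta(tk, x) for wk, tk in zip(w, thetas))
    return float(np.sum(resid ** 2) + lam * np.sum(np.abs(np.asarray(w))))
```

A zero-component model scores sum_i y_i^2; well-placed kernels lower the residual term while the penalty discourages superfluous weight.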
Technically speaking, the objective function (2) is convex in (w, \theta), but since its domain is of infinite dimensionality, for all practical purposes (2) is a nonconvex optimization problem. One can consider fixing the number of components in advance, and using a descent method (with random restarts) to find the best model of that size. Alternatively, one could use a stochastic search method, such as simulated annealing or MCMC [9], to estimate the size of the model and the model parameters simultaneously. However, as one begins to consider fitting models with an increasing number of components \hat{K} and of high dimensionality, it becomes increasingly difficult to apply these approaches [3]. Hence a common approach to obtaining an approximate solution to (2) is to limit the search to a discrete grid of candidate parameters \theta_1, ..., \theta_p. The estimated weights and parameters are then obtained by solving an optimization problem of the form

\hat{\beta} = argmin_{\beta \geq 0} ||y - F\beta||^2 + \lambda P_\theta(\beta)

where F has jth column f_{\theta_j}, and f_\theta is defined by (f_\theta)_i = f_\theta(x_i). Example applications of this non-negative least-squares-based approach (NNLS) include [10] and [1, 2, 7]. In contrast to descent-based methods, which can get trapped in local minima, NNLS is guaranteed to converge to a solution which is within \epsilon of the global optimum, where \epsilon depends on the scale of discretization. In some cases, NNLS will predict the signal accurately (with small error), but the resulting parameters will still be erroneous. Figure 1 illustrates the worst-case scenario, in which the discretization is misaligned relative to the true parameters/kernels that generated the signal.

Figure 1: The signal deconvolution problem. Fitting a mixture model with an NNLS algorithm is prone to errors due to discretization. 
For example, in 1D (A), if the true signal (top; dashed line) arises from a mixture of signals from bell-shaped kernel functions (bottom; dashed lines), but only a single kernel function between them is present in the basis set (bottom; solid line), this may result in inaccurate signal predictions (top; solid line), due to erroneous estimates of the parameters w_i. This problem arises in deconvolving multi-dimensional signals, such as the 3D DWI signal (B), as well. Here, the DWI signal in an individual voxel is presented as a 3D surface (top). This surface results from a mixture of signals arising from the fascicles presented on the bottom, passing through this single (simulated) voxel. Due to the signal generation process, the kernel of the diffusion signal from each one of the fascicles has a minimum at its center, resulting in 'dimples' in the diffusion signal in the direction of the peaks in the fascicle orientation distribution function.

In an effort to reduce the discretization error of NNLS, Ekanadham et al. [3] introduced continuous basis pursuit (CBP). CBP is an extension of nonnegative least squares in which the points on the discretization grid \theta_1, ..., \theta_p can be continuously moved within a small distance; in this way, one can reach any point in the parameter space. But instead of computing the actual kernel functions for the perturbed parameters, CBP uses linear approximations, e.g. obtained by Taylor expansions. Depending on the type of approximation employed, CBP may incur large error. The developers of CBP suggest solutions for this problem in the one-dimensional case, but these solutions cannot be used for many applications of mixture models (e.g., DWI). The computational cost of both NNLS and CBP scales exponentially in the dimensionality of the parameter space. 
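A minimal sketch of the grid-based NNLS baseline, using SciPy's Lawson-Hanson solver; the Gaussian kernel, the grid resolution, and the deliberately off-grid true centers are assumptions chosen to mimic the misalignment of Figure 1:

```python
import numpy as np
from scipy.optimize import nnls  # Lawson-Hanson active-set solver [11]

def f_theta(theta, x):
    # hypothetical bell-shaped kernel
    return np.exp(-0.5 * (x - theta) ** 2)

x = np.linspace(-3.0, 3.0, 60)
# true centers deliberately OFF the candidate grid, as in Figure 1
y = 0.7 * f_theta(-0.95, x) + 0.4 * f_theta(1.15, x)

grid = np.linspace(-3.0, 3.0, 13)                    # candidates theta_1, ..., theta_p
F = np.column_stack([f_theta(t, x) for t in grid])   # column j is f_{theta_j}(x_i)

beta, rnorm = nnls(F, y)       # argmin_{beta >= 0} ||y - F beta||
support = grid[beta > 1e-8]    # recovered centers, limited to grid resolution
```

Because the true centers fall between grid points, the recovered support spreads weight over neighboring candidates even when the signal itself is predicted accurately.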
In contrast, using stochastic search methods or descent methods to find the global minimum will generally incur a computational cost which scales exponentially in the sample size times the dimensionality of the parameter space. Thus, when fitting high-dimensional mixture models, practitioners are forced to choose between the discretization errors inherent to NNLS and the computational difficulties of the descent methods. We will show that our boosting approach to mixture models combines the best of both worlds: while it does not suffer from discretization error, it features computational tractability comparable to NNLS and CBP. We note that for the specific problem of super-resolution, Candès derived a deconvolution algorithm which finds the global minimum of (2) without discretization error, and proved that the algorithm can recover the true parameters under a minimal separation condition on the parameters [6]. However, we are unaware of an extension of this approach to more general applications of mixture models.

1.2 Boosting

The model (1) appears in an entirely separate context, as the model for learning a regression function as an ensemble of weak learners f_\theta, or boosting [4]. However, the problem of fitting a mixture model and the problem of fitting an ensemble of weak learners have several important differences. In the case of learning an ensemble, the family {f_\theta} can be freely chosen from a universe of possible weak learners, and the only concern is minimizing the prediction risk on a new observation. In contrast, in the case of fitting a mixture model, the family {f_\theta} is specified by the application. As a result, boosting algorithms, which were derived under the assumption that {f_\theta} is a suitably flexible class of weak learners, generally perform poorly in the signal deconvolution setting, where the family {f_\theta} is inflexible. 
In the context of regression, L2Boost, proposed by Bühlmann and Yu [4], produces a path of ensemble models which progressively minimize the sum of squares of the residual. L2Boost fits a series of models of increasing complexity. The first model consists of the single weak learner f_\theta which best fits y. The second model is formed by finding the weak learner with the greatest correlation to the residual of the first model, and adding the new weak learner to the model, without changing any of the previously fitted weights. In this way the size of the model grows with the number of iterations: each new learner is fully fit to the residual and added to the model. But because the previous weights are never adjusted, L2Boost fails to converge to the global minimum of (2) in the mixture model setting, producing suboptimal solutions. In the following section, we modify L2Boost for fitting mixture models. We refer to the resulting algorithm as elastic basis pursuit.

2 Elastic Basis Pursuit

Our proposed procedure for fitting mixture models consists of two stages. In the first stage, we transform an L1-penalized problem to an equivalent non-regularized least squares problem. In the second stage, we employ a modified version of L2Boost, elastic basis pursuit, to solve the transformed problem. 
We will present the two stages of the procedure, then discuss our fast convergence results.

2.1 Regularization

For most mixture problems it is beneficial to apply an L1-norm based penalty, by using a modified input \tilde{y} and kernel function family \tilde{f}_\theta, so that

argmin_{K,w,\theta} || y - \sum_{i=1}^{K} w_i f_{\theta_i} ||^2 + \lambda P_\theta(w)  =  argmin_{K,w,\theta} || \tilde{y} - \sum_{i=1}^{K} w_i \tilde{f}_{\theta_i} ||^2    (3)

We will use our modified L2Boost algorithm to produce a path of solutions for the objective function on the right side, which results in a solution path for the penalized objective function (2). For example, it is possible to embed the penalty P_\theta(w) = ||w||_1^2 in the optimization problem (2). One can show that solutions obtained by using the penalty function P_\theta(w) = ||w||_1^2 have a one-to-one correspondence with solutions obtained using the usual L1 penalty ||w||_1. The penalty ||w||_1^2 is implemented by using the transformed input \tilde{y} = (y, 0)^T and the modified kernel vectors \tilde{f}_\theta = (f_\theta, \sqrt{\lambda})^T. Other kinds of regularization are also possible, and are presented in the supplemental material.

2.2 From L2Boost to Elastic Basis Pursuit

Motivated by the connection between boosting and mixture modelling, we consider application of L2Boost to solve the transformed problem (the right side of (3)). Again, we reiterate the nonparametric nature of the model space: by minimizing (3), we seek to find the model with any number of components which minimizes the residual sum of squares. In fact, given appropriate regularization, this results in a well-posed problem. 
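The ||w||_1^2 transform described above is just a data augmentation, and its effect can be checked numerically; the sketch below, on arbitrary random data, verifies the identity ||y_t - F_t w||^2 = ||y - F w||^2 + lambda ||w||_1^2 for nonnegative w:

```python
import numpy as np

def augment(y, F, lam):
    # Append a zero entry to y and a row of sqrt(lam) to F, so that plain
    # nonnegative least squares on (y_t, F_t) realizes the penalty ||w||_1^2.
    y_t = np.concatenate([y, [0.0]])
    F_t = np.vstack([F, np.sqrt(lam) * np.ones(F.shape[1])])
    return y_t, F_t

rng = np.random.default_rng(1)
y = rng.standard_normal(20)
F = rng.standard_normal((20, 4))
w = rng.random(4)                      # nonnegative weights
y_t, F_t = augment(y, F, lam=2.0)
lhs = np.sum((y_t - F_t @ w) ** 2)
rhs = np.sum((y - F @ w) ** 2) + 2.0 * np.sum(w) ** 2
assert np.isclose(lhs, rhs)            # augmented RSS = RSS + lam * ||w||_1^2
```

The appended residual entry is -sqrt(lam) * sum(w), whose square is lam * ||w||_1^2 whenever w >= 0.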
In each iteration of our algorithm, a subset of the parameters \theta is considered for adjustment. Following Lawson and Hanson [11], we refer to these as the active set. As stated before, L2Boost can only grow the active set at each iteration, converging to inaccurate models. Our solution to this problem is to modify L2Boost so that it grows and contracts the active set as needed; hence we refer to this modification of the L2Boost algorithm as elastic basis pursuit. The key ingredient for any boosting algorithm is an oracle for fitting a weak learner: that is, a function \tau which takes a residual as input and returns the parameter \theta corresponding to the kernel \tilde{f}_\theta most correlated with the residual. EBP takes as inputs the oracle \tau, the input vector \tilde{y}, and the function \tilde{f}_\theta, and produces a path of solutions which progressively minimize (3). To initialize the algorithm, we use NNLS to find an initial estimate of (w, \theta). In the kth iteration of the boosting algorithm, let \tilde{r}^{(k-1)} be the residual from the previous iteration (or from the NNLS fit, if k = 1). The algorithm proceeds as follows:

1. Call the oracle to find \theta_new = \tau(\tilde{r}^{(k-1)}), and add \theta_new to the active set \theta.

2. Refit the weights w, using NNLS, to solve minimize_{w \geq 0} ||\tilde{y} - \tilde{F}w||^2, where \tilde{F} is the matrix formed from the regressors in the active set, \tilde{f}_\theta for \theta in the active set. This yields the residual \tilde{r}^{(k)} = \tilde{y} - \tilde{F}w.

3. Prune the active set \theta by removing any parameter \theta whose weight is zero, and update the weight vector w in the same way. This ensures that the active set \theta remains sparse in each iteration. Let (w^{(k)}, \theta^{(k)}) denote the values of (w, \theta) at the end of this step of the iteration.

4. 
Stopping may be assessed by computing an estimated prediction error at each iteration, via an independent validation set, and stopping the algorithm early when the prediction error begins to climb (indicating overfitting).

Pseudocode and Matlab code implementing this algorithm can be found in the supplement.

In the boosting context, the property of refitting the ensemble weights in every iteration is known as the totally corrective property; LPBoost [12] is a well-known example of a totally corrective boosting algorithm. While we derived EBP as a totally corrective variant of L2Boost, one could also view EBP as a generalization of the classical Lawson-Hanson (LH) algorithm [11] for solving nonnegative least-squares problems. Given mild regularity conditions and appropriate regularization, Elastic Basis Pursuit can be shown to deterministically converge to the global optimum: we can bound the objective function gap in the mth iteration by C/\sqrt{m}, where C is an explicit constant (see Section 2.3). To our knowledge, fixed-iteration guarantees are unavailable for all other methods of comparable generality for fitting a mixture with an unknown number of components.

2.3 Convergence Results

(Detailed proofs can be found in the supplementary material.)

For our convergence results to hold, we require an oracle function \tau : R^{\tilde{n}} \to \Theta which satisfies

\langle \tilde{r}, \tilde{f}_{\tau(\tilde{r})} / ||\tilde{f}_{\tau(\tilde{r})}|| \rangle \geq \alpha \rho(\tilde{r}),  where  \rho(\tilde{r}) = sup_{\theta \in \Theta} \langle \tilde{r}, \tilde{f}_\theta / ||\tilde{f}_\theta|| \rangle    (4)

for some fixed 0 < \alpha \leq 1. Our proofs can also be modified to apply given a stochastic oracle that satisfies (4) with fixed probability p > 0 for every input \tilde{r}. 
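Steps 1-3 above, driven by such an oracle, can be sketched as follows. This is a toy rendition, not the authors' implementation: it assumes a 1-D Gaussian kernel, uses a brute-force grid-argmax oracle (a valid \tau restricted to the grid), and omits both the NNLS initialization and the validation-based stopping of step 4:

```python
import numpy as np
from scipy.optimize import nnls  # Lawson-Hanson NNLS solver [11]

def ebp(y_t, f_t, oracle, n_iter=12):
    # Elastic basis pursuit, steps 1-3 (sketch): grow, refit, prune.
    active, w, r = [], np.array([]), y_t.copy()
    for _ in range(n_iter):
        active.append(oracle(r))                           # 1. oracle grows the active set
        F = np.column_stack([f_t(t) for t in active])
        w, _ = nnls(F, y_t)                                # 2. totally corrective NNLS refit
        r = y_t - F @ w                                    #    residual r^(k)
        keep = w > 1e-10
        active = [t for t, k in zip(active, keep) if k]    # 3. prune zero-weight kernels
        w = w[keep]
    return active, w, r

# toy problem: mixture of two Gaussian bumps (assumed kernel family)
x = np.linspace(-3.0, 3.0, 40)
f_t = lambda t: np.exp(-0.5 * (x - t) ** 2)
y = 0.7 * f_t(-0.95) + 0.4 * f_t(1.15)

# brute-force oracle: argmax correlation over a fine grid
grid = np.linspace(-3.0, 3.0, 241)
G = np.column_stack([f_t(t) for t in grid])
G = G / np.linalg.norm(G, axis=0)
oracle = lambda r: float(grid[int(np.argmax(r @ G))])

active, w, r = ebp(y, f_t, oracle)
```

Because the whole active set is refit in every iteration and zero-weight kernels are pruned, the residual norm is nonincreasing and the returned model stays sparse.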
Recall that \tilde{y} denotes the transformed input, \tilde{f}_\theta the transformed kernel, and \tilde{n} the dimensionality of \tilde{y}. We assume that the parameter space \Theta is compact and that \tilde{f}_\theta, the transformed kernel function, is continuous in \theta. Furthermore, we assume that either L1 regularization is imposed, or the kernels satisfy a positivity condition, i.e. inf_{\theta \in \Theta} f_\theta(x_i) \geq 0 for i = 1, ..., n. Proposition 1 states that these conditions imply the existence of a maximally saturated model (w^*, \theta^*) of size K^* \leq \tilde{n} with residual \tilde{r}^*. The existence of such a saturated model, in conjunction with the existence of the oracle \tau, enables us to state fixed-iteration guarantees on the precision of EBP, which imply asymptotic convergence to the global optimum. To do so, we first define the quantity \rho(m) = \rho(\tilde{r}^{(m)}); see (4) above. Proposition 2 uses the fact that the residuals \tilde{r}^{(m)} are orthogonal to \tilde{F}^{(m)}, thanks to the NNLS fitting procedure in step 2. This allows us to bound the objective function gap in terms of \rho(m). Proposition 3 uses properties of the oracle \tau to lower bound the progress per iteration in terms of \rho(m).

Proposition 2. Assume the conditions of Proposition 1, and take a saturated model (w^*, \theta^*). Then, defining

B^* = 2 \sum_{i=1}^{K^*} w^*_i ||\tilde{f}_{\theta^*_i}||    (5)

the mth residual of the EBP algorithm \tilde{r}^{(m)} can be bounded in size by

||\tilde{r}^{(m)}||^2 \leq ||\tilde{r}^*||^2 + B^* \rho(m)

In particular, whenever \rho(m) converges to 0, the algorithm converges to the global minimum.

Proposition 3. Assume the conditions of Proposition 1. Then

||\tilde{r}^{(m)}||^2 - ||\tilde{r}^{(m+1)}||^2 \geq (\alpha \rho(m))^2

for \alpha defined above in (4). This implies that the sequence ||\tilde{r}^{(0)}||^2, ||\tilde{r}^{(1)}||^2, . . . 
is decreasing.

Combining Propositions 2 and 3 yields our main result on the non-asymptotic convergence rate.

Proposition 4. Assume the conditions of Proposition 1. Then for all m > 0,

||\tilde{r}^{(m)}||^2 - ||\tilde{r}^*||^2 \leq (B_min / \alpha) (1/\sqrt{m}) \sqrt{||\tilde{r}^{(0)}||^2 - ||\tilde{r}^*||^2}

where B_min = inf_{w^*, \theta^*} B^*, for B^* defined in (5).

Hence we have characterized the non-asymptotic convergence of EBP at rate 1/\sqrt{m} with an explicit constant, which in turn implies asymptotic convergence to the global minimum.

3 DWI Results and Discussion

To demonstrate the utility of EBP in a real-world application, we used this algorithm to fit mixture models of DWI. Different approaches are taken to modeling the DWI signal. The classical Diffusion Tensor Imaging (DTI) model [5], which is widely used in applications of DWI to neuroscience questions, is not a mixture model. Instead, it assumes that diffusion in the voxel is well approximated by a 3-dimensional Gaussian distribution. This distribution can be parameterized as a rank-2 tensor, which is expressed as a 3-by-3 matrix. Because the DWI measurement has antipodal symmetry, the tensor matrix is symmetric, and only 6 independent parameters need to be estimated to specify it. DTI is accurate in many places in the white matter, but its accuracy is lower in locations in which there are multiple crossing fascicles of nerve fibers. In addition, it should not be used to generate estimates of connectivity through these locations. This is because the peak of the fiber orientation distribution function (fODF) estimated in this location using DTI is not oriented towards the direction of any of the crossing fibers. Instead, it is usually oriented towards an intermediate direction (Figure 4B). 
To address these challenges, mixture models have been developed that fit the signal as a combination of contributions from fascicles crossing through these locations. These models are more accurate in fitting the signal. Moreover, their estimate of the fODF is useful for tracking the fascicles through the white matter for estimates of connectivity. However, these estimation techniques either use different variants of NNLS, with a discrete set of candidate directions [2] or with a spherical harmonic basis set [1], or use stochastic algorithms [9]. To overcome the problems inherent in these techniques, we demonstrate here the benefits of applying EBP to the estimation of mixture models of fascicles in DWI. We start by demonstrating the utility of EBP in a simulation of a known configuration of crossing fascicles. Then, we demonstrate the performance of the algorithm in DWI data.

The DWI measurements for a single voxel in the brain are y_1, ..., y_n for directions x_1, ..., x_n on the three-dimensional unit sphere, given by

y_i = \sum_{k=1}^{K} w_k f_{D_k}(x_i) + \epsilon_i,  where  f_D(x) = exp[-b x^T D x],    (6)

where b is a scalar quantity determined by the experimenter, related to the parameters of the measurement (the magnitude of diffusion sensitization applied in the MRI instrument). The kernel functions f_D(x) each describe the effect of a single fascicle traversing the measurement voxel on the diffusion signal, well described by the Stejskal-Tanner equation [13]. Because of the non-negative nature of the MRI signal, \epsilon_i > 0 is generated from a Rician distribution [14]. D is a positive definite quadratic form, which is specified by the direction along which the fascicle represented by f_D traverses the voxel, and by additional parameters \lambda_1 and \lambda_2, corresponding to the axial and radial diffusivity of the fascicle represented by f_D. 
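As an illustration, the fascicle kernel f_D(x) = exp[-b x^T D x] of (6) can be assembled from a direction v and diffusivities (\lambda_1, \lambda_2); the particular direction and diffusivity values below are illustrative assumptions, not values from the paper's experiments:

```python
import numpy as np

def f_D(b, D, dirs):
    # Stejskal-Tanner-type kernel of (6): exp(-b x^T D x) for unit vectors x
    return np.exp(-b * np.einsum('ij,jk,ik->i', dirs, D, dirs))

# Build D from its eigen-structure: principal direction v, axial diffusivity
# lam1 along v, radial diffusivity lam2 orthogonal to it (illustrative values).
v = np.array([1.0, 0.0, 0.0])
lam1, lam2 = 1.5e-3, 0.3e-3            # mm^2/s
D = lam2 * np.eye(3) + (lam1 - lam2) * np.outer(v, v)

dirs = np.eye(3)                       # three example measurement directions x_i
s = f_D(b=1000.0, D=D, dirs=dirs)
# attenuation is strongest along the fascicle direction v
assert s[0] < s[1] and np.isclose(s[1], s[2])
```

D is symmetric positive definite by construction, so the 6 free parameters of the general tensor reduce here to a direction plus two diffusivities.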
The oracle function \tau is implemented by Newton-Raphson with random restarts. In each iteration of the algorithm, the parameters of D (direction and diffusivity) are found using the oracle function \tau(\tilde{r}), using gradient descent on \tilde{r}, the current residuals. In each iteration, the set of f_D is shrunk or expanded to best match the signal.

Figure 2: To demonstrate the steps of EBP, we examine data from 100 iterations of the DWI simulation. (A) A cross-section through the data. (B) In the first iteration, the algorithm finds the best single kernel to represent the data (solid line: average kernel). (C) The residuals from this fit (positive in dark gray, negative in light gray) are fed to the next step of the algorithm, which then finds a second kernel (solid line: average kernel). (D) The signal is fit using both of these kernels (which constitute the active set at this point). The combination of the two kernels fits the data better than either of them separately, so both are kept (solid line: average fit), but redundant kernels can also be discarded at this point.

Figure 3: The progress of EBP. In each plot, the abscissa denotes the number of iterations of the algorithm (in log scale). (A) The number of kernel functions in the active set grows as the algorithm progresses, and then plateaus. (B) Meanwhile, the mean square error (MSE) decreases to a minimum and then stabilizes. The algorithm would normally be terminated at this minimum. (C) This point also coincides with the optimal bias-variance trade-off, as evidenced by the decrease in EMD towards this point.

In a simulation with a complex configuration of fascicles, we demonstrate that accurate recovery of the true fODF can be achieved. In our simulation model, we take b = 1000 s/mm^2, and generate v1, v2, v3 as uniformly distributed vectors on the unit sphere and weights w1, w2, w3 as i.i.d. 
uniformly distributed on the interval [0, 1]. Each v_i is associated with a \lambda_{1,i} between 0.5 and 2, and \lambda_{2,i} is set to 0. We consider the signal in 150 measurement vectors distributed on the unit sphere according to an electrostatic repulsion algorithm. We partition the vectors into a training partition and a test partition, chosen to minimize the maximum angular separation in each partition. With noise level \sigma^2 = 0.005, we generate a signal. We use cross-validation on the training set to fit NNLS with varying L1 regularization parameter c, using the regularization penalty function \lambda P(w) = \lambda(c - ||w||_1)^2. We choose this form of penalty function because we interpret the weights w as comprising partial volumes in the voxel; hence c represents the total volume of the voxel weighted by the isotropic component of the diffusion. We fix the regularization penalty parameter \lambda = 1. The estimated fODFs and predicted signals are obtained by three algorithms: DTI, NNLS, and EBP. Each algorithm is applied to the training set (75 directions), and error is estimated relative to a prediction on the test set (75 directions). The latter two methods (NNLS, EBP) use the regularization parameter \lambda = 1 and the c chosen by cross-validated NNLS. Figure 2 illustrates the first two iterations of EBP applied to these simulated data. The estimated fODFs are compared to the true fODF by the antipodally symmetrized Earth Mover's distance (EMD) [15] in each iteration. Figure 3 demonstrates the progress of the internal state of the EBP algorithm over many repetitions of the simulation. In the simulation results (Figure 4), EBP clearly reaches a more accurate solution than DTI, and a sparser solution than NNLS.

Figure 4: DWI Simulation results. 
Ground truth entered into the simulation is a configuration of 3 crossing fascicles (A). DTI estimates a single primary diffusion direction that coincides with none of these directions (B). NNLS estimates an fODF with many peaks (C), demonstrating the discretization error (see also Figure 1). EBP estimates a much sparser solution, with weights concentrated around the true peaks (D).

The same procedure is used to fit the three models to DWI data, obtained at 2x2x2 mm^3, at a b-value of 4000 s/mm^2. In these data, the true fODF is not known. Hence, only test prediction error can be obtained. We compare the RMSE of prediction error between the models in a region of interest (ROI) in the brain containing parts of the corpus callosum, a large fiber bundle that contains many fibers connecting the two hemispheres, as well as the centrum semiovale, containing multiple crossing fibers (Figure 5). NNLS and EBP both have substantially reduced error, relative to DTI.

Figure 5: DWI data from a region of interest (A, indicated by red frame) is analyzed and RMSE is displayed for DTI (B), NNLS (C) and EBP (D).

4 Conclusions

We developed an algorithm to model multi-dimensional mixtures. This algorithm, Elastic Basis Pursuit (EBP), is a combination of principles from boosting and principles from the Lawson-Hanson active set algorithm. It fits the data by iteratively generating and testing the match of a set of candidate kernels to the data. Kernels are added to and removed from the set of candidates as needed, using a totally corrective backfitting step, based on the match of the entire set of kernels to the data at each step. We show that the algorithm reaches the global optimum, with fixed-iteration guarantees. 
Thus, it can be practically applied to separate a multi-dimensional signal into a sum of component signals. For example, we demonstrate how this algorithm can be used to decompose diffusion-weighted MRI signals into nerve fiber fascicle components.

Acknowledgments

The authors thank Brian Wandell and Eero Simoncelli for useful discussions. CZ was supported through NIH grant 1T32GM096982 to Robert Tibshirani and Chiara Sabatti, AR was supported through NIH fellowship F32-EY022294, and FP was supported through NSF grant BCS1228397 to Brian Wandell.

References

[1] Tournier J-D, Calamante F, Connelly A (2007). Robust determination of the fibre orientation distribution in diffusion MRI: non-negativity constrained super-resolved spherical deconvolution. NeuroImage 35:1459-72.
[2] Dell'Acqua F, Rizzo G, Scifo P, Clarke RA, Scotti G, Fazio F (2007). A model-based deconvolution approach to solve fiber crossing in diffusion-weighted MR imaging. IEEE Trans Biomed Eng 54:462-72.
[3] Ekanadham C, Tranchina D, and Simoncelli E (2011). Recovery of sparse translation-invariant signals with continuous basis pursuit. IEEE Transactions on Signal Processing (59):4735-4744.
[4] Bühlmann P, Yu B (2003). Boosting with the L2 loss: regression and classification. JASA, 98(462), 324-339.
[5] Basser, P. J., Mattiello, J., and Le Bihan, D. (1994). MR diffusion tensor spectroscopy and imaging. Biophysical Journal, 66:259-267.
[6] Candès, E. J., and Fernandez-Granda, C. (2013). Towards a mathematical theory of super-resolution. Communications on Pure and Applied Mathematics.
[7] Valiant, G., and Valiant, P. (2011). Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (pp. 685-694). ACM.
[8] Sánchez-Bajo, F., and Cumbrera, F. L. 
(2000). Deconvolution of X-ray diffraction profiles by using series expansion. Journal of Applied Crystallography, 33(2), 259-266.
[9] Behrens TEJ, Berg HJ, Jbabdi S, Rushworth MFS, and Woolrich MW (2007). Probabilistic diffusion tractography with multiple fiber orientations: What can we gain? NeuroImage (34):144-45.
[10] Bro, R., and De Jong, S. (1997). A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11(5), 393-401.
[11] Lawson CL, and Hanson RJ (1995). Solving Least Squares Problems. SIAM.
[12] Demiriz, A., Bennett, K. P., and Shawe-Taylor, J. (2002). Linear programming boosting via column generation. Machine Learning, 46(1-3), 225-254.
[13] Stejskal EO, and Tanner JE (1965). Spin diffusion measurements: Spin echoes in the presence of a time-dependent gradient field. J Chem Phys (42):288-92.
[14] Gudbjartsson, H., and Patz, S. (1995). The Rician distribution of noisy MR data. Magn Reson Med, 34:910-914.
[15] Rubner, Y., Tomasi, C., and Guibas, L. J. (2000). The earth mover's distance as a metric for image retrieval. Intl J. Computer Vision, 40(2), 99-121.", "award": [], "sourceid": 1397, "authors": [{"given_name": "Charles", "family_name": "Zheng", "institution": "Stanford University"}, {"given_name": "Franco", "family_name": "Pestilli", "institution": "Stanford"}, {"given_name": "Ariel", "family_name": "Rokem", "institution": "Stanford"}]}