{"title": "Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing", "book": "Advances in Neural Information Processing Systems", "page_first": 1633, "page_last": 1641, "abstract": "We show how to sequentially optimize magnetic resonance imaging measurement designs over stacks of neighbouring image slices, by performing convex variational inference on a large scale non-Gaussian linear dynamical system, tracking dominating directions of posterior covariance without imposing any factorization constraints. Our approach can be scaled up to high-resolution images by reductions to numerical mathematics primitives and parallelization on several levels. In a first study, designs are found that improve significantly on others chosen independently for each slice or drawn at random.", "full_text": "Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing

Matthias W. Seeger
Saarland University and Max Planck Institute for Informatics
Campus E1.4, 66123 Saarbrücken, Germany
mseeger@mmci.uni-saarland.de

Abstract

We show how to sequentially optimize magnetic resonance imaging measurement designs over stacks of neighbouring image slices, by performing convex variational inference on a large scale non-Gaussian linear dynamical system, tracking dominating directions of posterior covariance without imposing any factorization constraints. Our approach can be scaled up to high-resolution images by reductions to numerical mathematics primitives and parallelization on several levels. In a first study, designs are found that improve significantly on others chosen independently for each slice or drawn at random.

1 Introduction

Magnetic resonance imaging (MRI) [10, 6] is a very flexible imaging modality. Inflicting no harm on patients, it is used for an ever-growing number of diagnoses in health-care. 
Its most serious\nlimitation is acquisition speed, being based on a serial idea (gradient encoding) with limited scope\nfor parallelization. Fourier (aka. k-space) coef\ufb01cients are sampled along smooth trajectories (phase\nencodes), many of which are needed for reconstructions of suf\ufb01cient quality [17, 1]. Long scan\ntimes lead to patient annoyance, grave errors due to movement, and high running costs. The Nyquist\nsampling theorem [2] fundamentally limits traditional linear image reconstruction, but with modern\n3D MRI scenarios, dense sampling is not practical anymore. Acquisition is accelerated to some\nextent in parallel MRI1, by using receive coil arrays [19, 9]:\nthe sensitivity pro\ufb01les of different\ncoils provide part of the localization normally done by more phase steps. A different idea is to use\n(nonlinear) sparse image reconstruction, with which the Nyquist limit can be undercut robustly for\nimages, emphasized recently as compressed sensing [5, 3]. While sparse reconstruction has been\nused for MRI [28, 12], we address the more fundamental question of how to optimize the sampling\ndesign for sparse reconstruction over a speci\ufb01c real-world signal class (MR images) in an adaptive\nmanner, avoiding strong assumptions such as exact, randomly distributed sparsity that do not hold\nfor real images [23]. Our approach is in line with recent endeavours to extend MRI capabilities\nand reduce its cost, by complementing expensive, serial hardware with easily parallelizable digital\ncomputations.\nWe extend the framework of [24], the \ufb01rst approximate Bayesian method for MRI sampling opti-\nmization applicable at resolutions of clinical interest. Their approach falls short of real MRI practice\non a number of points. They considered single image slices only, while stacks2 of neighbouring\n\n1While parallel MRI is becoming the standard, its use is not straightforward. 
The sensitivity maps are unknown up front, depend partly on what is scanned, and their reliable estimation can be difficult.

2“Stack-of-slices” acquisition along the z axis works by transmitting a narrow-band excitation pulse while applying a magnetic field gradient linear in z. If the echo time (between excitation and readout) is shorter than

slices are typically acquired. Reconstruction can be improved significantly by taking the strong statistical dependence between pixels of nearby slices into account [14, 26, 18]. Design optimization is a joint problem as well: using the same acquisition pattern for neighbouring slices is clearly redundant. Second, the latent image was modelled as real-valued in [24], while in reality it is a complex-valued signal. To our knowledge, the few directly comparable approaches rely on “trial-and-error” exploration [12, 16, 27], requiring substantially more human expert interventions and real MRI measurements, whose high costs our goal-directed method aims to minimize.
Our extension to stacks of slices requires new technology. Global Gaussian covariances have to be approximated, a straightforward extension of which to many slices is out of the question. We show how to use approximate Kalman smoothing, implementing message passing by the Lanczos algorithm, which has not been done in machine learning before (see [20, 25] for similar proposals for oceanography problems). Our technique is complementary to mean field variational inference approximations (“variational Bayes”), where most correlations are ruled out a priori. We track the dominating posterior covariance directions inside our method, allowing them to change during optimization. 
While our double loop approach may be technically more demanding to implement, both the relaxation and the algorithm are characterized much better (a convex problem; an algorithm reducing to standard computational primitives), and it runs orders of magnitude faster. Beyond MRI, applications could be to Bayesian inference over video streams, or to computational photography [11]. Our approach is parallelizable on several levels. This property is essential to even start projecting such applications: on the scale demanded by modern MRI applications, with practitioners being used to view images directly after acquisition, little else but a highly parallelizable approach is viable.
Large scale variational inference is reviewed and extended to complex-valued data in Section 2, lifted to non-Gaussian linear dynamical systems in Section 3, and the experimental design extension is given in Section 4. Results of a preliminary study on data from a Siemens 3T scanner are provided in Section 5, using a serial implementation.

2 Large Scale Sparse Inference

Our motivation is to improve MR image reconstruction, not by finding a better estimation technique, but by sampling data more economically. A latent MR image slice u ∈ Cn (n pixels) is measured by a design matrix X ∈ Cm×n: y = Xu + ε (ε ∼ N(0, σ2I) models noise). For Cartesian MRI, X = IS,·Fn, Fn the 2D fast Fourier transform, S ⊂ {1, . . . , n} the sampling pattern (which partitions into complete columns or rows: phase encodes, the atomic units of the design). Sparse reconstruction works by encoding super-Gaussian image statistics in a non-Gaussian prior, then finding the posterior mode (MAP estimation): a convex quadratic program for the model employed here. 
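As a concrete toy illustration of the Cartesian measurement model above, X = IS,·Fn amounts to a unitary 2D FFT followed by keeping a subset of k-space columns. The function and test image below are purely illustrative, and the unitary FFT normalization is an assumption:

```python
import numpy as np

def measure(u, cols, sigma=0.0, rng=None):
    """Apply X = I_S F_n: unitary 2D FFT of the slice u, then keep only the
    k-space columns (phase encodes) listed in `cols`; optionally add noise."""
    k = np.fft.fft2(u, norm="ortho")   # F_n u (unitary convention)
    y = k[:, cols].ravel()             # I_S: subsample whole columns
    if sigma > 0.0:
        rng = rng or np.random.default_rng(0)
        noise = rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)
        y = y + sigma * noise / np.sqrt(2.0)
    return y

# Toy 64x64 image, measured with 16 of 64 phase encodes (4x undersampling).
u = np.outer(np.hanning(64), np.hanning(64)).astype(complex)
cols = np.arange(0, 64, 4)
y = measure(u, cols)
```

Since whole columns are kept or dropped, the design search space is over sets of phase encodes, not individual k-space samples.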
To improve the measurement design X itself, posterior information beyond (and independent of) its mode is required, chiefly posterior covariances.
We briefly review [24], extending it to complex-valued u. The super-Gaussian image prior P(u) is adapted by placing potentials on absolute values |sj|; the posterior has the form

P(u|y) ∝ N(y|Xu, σ2I) ∏j e−τj|sj/σ|,  s = Bu ∈ Cq.

Here, B is a sparsity transform [24]. We use the C → R2 embedding, s = (sj), sj ∈ R2, and norm potentials e−τj‖sj/σ‖. Two main ideas lead to [24]. First, inference is relaxed to an optimization problem by lower-bounding the log partition function [7] (intuitively, each Laplace potential e−τj‖sj/σ‖ is lower-bounded by a Gaussian-form potential of variance γj > 0), leading to

φ(γ) = log |A| + h(γ) + minu R(u, γ),  R := σ−2(‖y − Xu‖2 + sT Γ−1 s),  γ = (γj),  (1)

h(γ) = (τ2)T γ. This procedure implies a Gaussian approximation Q(u|y) = N(u|u∗, σ2A−1) to P(u|y), with A = XH X + BT Γ−1B and u∗ = u∗(γ). The complex extension is formally similar to [24] (π there is γ−1 here): Γ := (diag γ) ⊗ I2 = diag(γ1, γ1, γ2, . . . ), B := Borig ⊗ I2, Borig the real-valued sparsity transform. Q(u|y) is fitted to P(u|y) by minγ≻0 φ: a convex problem [24]. 
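The Gaussian fit Q(u|y) = N(u∗, σ2A−1) above can be sketched with small dense matrices; a real-valued toy for simplicity (the paper's X is complex), with all sizes, values and names illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q, sigma = 8, 5, 8, 0.1
X = rng.standard_normal((m, n))        # toy real-valued design (the paper's X is complex)
B = np.eye(q, n)                       # trivial sparsity transform
y = rng.standard_normal(m)
gamma = np.full(q, 0.5)                # variational variances gamma_j > 0

# A = X^H X + B^T Gamma^{-1} B defines Q(u|y) = N(u_*, sigma^2 A^{-1}).
A = X.T @ X + B.T @ np.diag(1.0 / gamma) @ B
u_star = np.linalg.solve(A, X.T @ y)             # posterior mean u_* of Q
marg_var = sigma**2 * np.diag(np.linalg.inv(A))  # Gaussian variances needed for OL updates
```

At realistic resolutions A can of course never be formed or inverted densely; this is what the Lanczos machinery below is for.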
Used within an automatic decision architecture, convexity and robustness of inference become assets that are more important than the smaller bias attainable after a lot of human expert attention.

the repeat time (between phase encodes), several slices are acquired in an interleaved fashion, separated by slice gaps to avoid crosstalk [17].

Second, φ(γ) can be minimized very efficiently by a double loop algorithm [24]. The computationally intensive log |A| term is concave in γ−1. Upper-bounding it tangentially by the affine zT(γ−1) − g∗(z) at outer loop (OL) update points, the resulting φz ≥ φ decouples and is minimized much more efficiently in inner loops (ILs). minγ≻0 φz leaves us with

minu φz(u) = σ−2‖y − Xu‖2 + 2 Σj h∗j(|sj|),  h∗j(|sj|) := τj(zj + (|sj|/σ)2)1/2,  (2)

a penalized least squares problem. At convergence, u∗ = EQ[u|y], γj ← (zj + |s∗,j/σ|2)1/2/τj. We can use iteratively reweighted least squares (IRLS), each step of which needs a linear system of the structure of A to be solved. Refitting z (OL updates) is much harder: z ← (I ⊗ 1T) diag−1(BA−1BT) = (I ⊗ 1T)(σ−2VarQ[sj|y]). In terms of Gaussian (Markov) random fields, the inner optimization needs posterior mean computations only, while OL updates require bulk Gaussian variances [21, 15]. The reason why the double loop algorithm is much faster than previous approaches is that only a few variance computations are required. The extension to complex-valued u is non-trivial only when it comes to IRLS search direction computations (see Appendix).
Given multi-slice data (Xt, yt), t = 1, . . .
, T, we can use an undirected hidden Markov model over image slices u = (ut) ∈ CnT. By the stack-of-slices methodology, the likelihood potentials P(yt|ut) are independent, and P(ut) from above serves as single-node potential, based on st = But. If st→ := ut − ut+1, the dependence between neighbouring slices is captured by additional Laplace coupling potentials ∏i e−τc,i|(st→)i/σ|. The variational parameters γt at each node are complemented by coupling parameters γt→ ∈ Rn+. The Gaussian Q(u|y), y = (yt), has the same form as above with a huge A ∈ CnT×nT. Inheriting the Markov structure, it is a Gaussian linear dynamical system (LDS) with very high-dimensional states. What does an efficient extension of the double loop algorithm look like? The IL criterion φz should be coupled between neighbouring slices, by way of potentials on st→. OL updates are more difficult to lift: we have to approximate marginal variances in a Gaussian LDS. We will do this by Kalman smoothing, approximating inversion in message computations (conversion from natural to moment parameters) by the Lanczos algorithm.
The central role of Gaussian covariance for approximating non-Gaussian posteriors has not been emphasized much in machine learning, where, if Bayesian computations are intractable, simpler “variational Bayesian” concepts are routinely used, imposing factorization constraints on the posterior up front. While such constraints can be adjusted in light of the data, this is difficult and typically not done. Factorization assumptions are a double-edged sword: they radically simplify implementations, but result in non-convex algorithms, and half of the problem is left undone. Our approach offers an alternative: by using Lanczos on Q(u|y), we retain precisely the maximum-covariance directions of intermediate fits to the posterior, without running into combinatorial or non-convex problems. 
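The Lanczos variance approximation at the heart of the OL updates can be sketched as follows; a self-contained toy (the `lanczos` routine with full reorthogonalization is illustrative, not the paper's parallel implementation). With Q orthonormal and T ≈ QTAQ tridiagonal, diag(QT−1QT) approximates the Gaussian variances diag(A−1), and never overestimates them:

```python
import numpy as np

def lanczos(Amul, n, k, rng):
    """k Lanczos steps with full reorthogonalization for a symmetric positive
    definite operator v -> Amul(v); returns orthonormal Q (n x k) and tridiagonal T."""
    Q = np.zeros((n, k)); alpha = np.zeros(k); beta = np.zeros(k - 1)
    q = rng.standard_normal(n); q /= np.linalg.norm(q)
    for j in range(k):
        Q[:, j] = q
        w = Amul(q)
        alpha[j] = q @ w
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)  # full reorthogonalization
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    return Q, np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

rng = np.random.default_rng(2)
n = 40
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                      # SPD precision matrix
z_exact = np.diag(np.linalg.inv(A))          # exact Gaussian variances diag(A^{-1})
Q, T = lanczos(lambda v: A @ v, n, k=20, rng=rng)
z_k = np.einsum('ij,jk,ik->i', Q, np.linalg.inv(T), Q)  # diag(Q T^{-1} Q^T)
# z_k underestimates z_exact: Q (Q^T A Q)^{-1} Q^T is dominated by A^{-1}.
```

Only matrix-vector multiplications with A are needed, which is what makes the method applicable at MRI scale.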
Finally, we place more varied sparsity penalties on the in-plane dimensions [24] than on the third one. This is justified by voxels typically being larger and spaced with a gap in the third dimension, with partial volume effects reducing sparsity. Moreover, a non-local sparsity transform along the third dimension would destroy the Markovian structure essential for efficient computation.

3 Approximate Inference over Multiple Slices

We aim to extend the single slice method of [24] to the hidden Markov extension, thereby reusing code whenever possible. The variational criterion is (1), with h(γ) collecting node terms ht(γt) and coupling terms for the γt→; log |A| is computed, up to additive constants, from the message matrices Ãt→, Ã←t and the Γ−1t→ at any split slice t̃ (in practice, we average over t̃). The algorithm is sketched in Algorithm 1.

Algorithm 1 Double loop variational inference algorithm

repeat
  if first iteration then
    Default-initialize z ∝ 1, u = 0.
  else
    Run Kalman smoothing to determine Mt→, and (in parallel) M←t.
    Determine node variances zt, pair variances zt→, and log |A| from messages. Refit upper bound φz to φ (tangent at γ). Initialize u = u∗ (previous solution).
  end if
  repeat
    Distributed IRLS to minimize minγ φz w.r.t. u.
    Each local update of ut entails solving a linear system (conjugate gradients).
  until u∗ = argminu φz converged
  Update γj = (zj + |s∗,j/σ|2)1/2/τj.
until outer loop converged

For reconstruction, we run parallel MAP estimation. Following [12], we smooth out the nondifferentiable l1 penalty by |sj/σ| ≈ (ε + |sj/σ|2)1/2 for very small ε > 0, then use nonlinear conjugate gradients with Armijo line search. Nodes return with ∇ut φz at the line minimum ut, the next search direction is centrally determined and distributed (just a scalar has to be transferred). 
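The per-node smoothed-l1 MAP problem can be sketched as a centralized toy (B = I for simplicity; scipy's CG minimizer stands in for hand-rolled nonlinear CG with Armijo line search; all sizes and constants are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, m, sigma, tau, eps = 32, 20, 0.05, 1.0, 1e-6
X = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n); u_true[[3, 10, 25]] = [1.0, -0.7, 0.4]   # sparse toy signal
y = X @ u_true + sigma * rng.standard_normal(m)

def objective(u):
    # sigma^{-2} ||y - Xu||^2 + 2 tau sum_j sqrt(eps + (u_j/sigma)^2)
    r = y - X @ u
    return (r @ r) / sigma**2 + 2.0 * tau * np.sum(np.sqrt(eps + (u / sigma)**2))

def grad(u):
    return (-2.0 * (X.T @ (y - X @ u)) / sigma**2
            + 2.0 * tau * (u / sigma**2) / np.sqrt(eps + (u / sigma)**2))

res = minimize(objective, np.zeros(n), jac=grad, method="CG")
```

The smoothing parameter eps trades off fidelity to the l1 penalty against conditioning of the gradient near zero.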
This is not the same as centralized CG: line searches are distributed and not done on the global criterion.
We briefly comment on how to approximate Kalman message passing by way of the Lanczos algorithm [8]; full details are given in [22]. Gaussian (Markov) random field practitioners will appreciate the difficulties: there is no locally connected MRF structure, and the Q(u|y) are highly non-stationary, being fitted to a posterior with non-Gaussian statistics (edges in the image, etc.). Message passing requires the inversion of a precision matrix A. The idea behind Lanczos approximations is PCA: if A ≈ UΛUT, Λ the l ≪ n smallest eigenvalues, UΛ−1UT is the PCA approximation of A−1. With matrices A of certain spectral decay, this representation can be approximated by Lanczos (see [24, 22] for details). For a low rank PCA approximation of Ãt→, Mt→ has the same rank (see Appendix), which allows us to run Gaussian message passing tractably. In a parallel implementation, the forward and backward filter passes run in parallel, passing low rank messages (the rank km of these should be smaller than the rank kc for subsequent marginal covariance computations). On a lower level, both matrix-vector multiplications with Xt (FFT) and reorthogonalizations required during the Lanczos algorithm can easily be parallelized on commodity graphics hardware.

4 Sampling Optimization by Bayesian Experimental Design

With our multi-slice variational inference algorithm in place, we address sampling optimization by Bayesian sequential experimental design, following [24]. At slice t, the information gain score Δ(X∗) := log |I + X∗CovQ[ut|y]X∗T| is computed for a fixed number of phase encode candidates X∗ ∈ Cd×n not yet in Xt, the score maximizer is appended, and a novel measurement is acquired (for the maximizer only). 
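If the posterior covariance is available in low-rank form CovQ[ut|y] ≈ V VH (as produced by the Lanczos approximation), the score is cheap to evaluate per candidate; a sketch with illustrative shapes and names (`design_score`, `candidates` are assumptions of this example):

```python
import numpy as np

def design_score(Xc, V):
    """Delta(X_*) = log|I + X_* Cov X_*^H| with Cov approximated low-rank
    as V V^H (e.g. the dominating directions retained by Lanczos)."""
    W = Xc @ V                                         # d x k
    _, logdet = np.linalg.slogdet(np.eye(W.shape[0]) + W @ W.conj().T)
    return logdet

rng = np.random.default_rng(4)
n, k, d = 64, 10, 4
V = rng.standard_normal((n, k)) / np.sqrt(n)           # assumed-given covariance factor
candidates = [rng.standard_normal((d, n)) for _ in range(5)]
scores = [design_score(Xc, V) for Xc in candidates]
best = int(np.argmax(scores))                          # greedy: append the maximizer
```

Each score needs only a d × d determinant once the d × k product Xc V is formed, so many candidates can be ranked from one covariance approximation.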
Δ(X∗) depends primarily on the marginal posterior covariance matrix CovQ[ut|y], computed by Gaussian message passing just as the variances in OL updates above (while a single value Δ(X∗) can be estimated more efficiently, the dominating eigendirections of the global covariance matrix seem necessary to approximate many score values for different candidates X∗). Once messages have been passed, scores can be computed in parallel at different nodes. A purely sequential approach, extending one design Xt by one encode in each round, is not tractable. In practice, we extend several node designs Xt in each round (a fixed subset Cit ⊂ {1, . . . , T}; “it” the round number). Typically, Cit repeats cyclically. This is approximate, since candidates are scored independently at each node. Certainly, Cit should not contain neighbouring nodes. In the interleaved stack-of-slices methodology, scan time is determined by the largest factor Xt (number of rows), so we strive for balanced designs here.
To sum up, our adaptive design optimization algorithm starts with an initial variational inference phase for a start-up design (low frequencies only), then runs through a fixed number of design rounds. Each round starts with Gaussian message passing, based on which scores are computed at nodes t ∈ Cit, new measurements are acquired, and designs Xt are extended. Finally, variational inference is run for the extended model, using a small number of OL iterations (only one in our experiments). Time can be saved by basing the first OL update on the same messages and node marginal covariances as the design score computations (neglecting their change through new phase encodes).

5 Experiments

We present experimental results, comparing designs found by our Bayesian joint design optimization method against alternative choices on real MRI data. 
We use the model of Section 2, with the prior previously used in [24] (potentials of strength τa on wavelet coefficients, of strength τr on Cartesian finite differences). While the MRI signal u is complex-valued, phase contributions are mostly erroneous, and reconstruction as well as design optimization are improved by multiplying a further term ∏i e−(τi/σ)|ℑ(ui)| into each single node prior potential, easily incorporated into the generic setup by appending I ⊗ δT2 to B. We focus on Cartesian MRI (phase encodes are complete columns3 in k-space): a more clinically relevant setting than the spiral sampling treated in [24].
We use data of resolution 64×64 (in-plane) to test our approach with a serial implementation. While this is not a resolution of clinical relevance, a truly parallel implementation is required in order to run our method at resolutions 256 × 256 or beyond: an important point for future work.

5.1 Quality of Lanczos Variance Approximations

We begin with experiments to analyze the errors in Lanczos variance approximations. Recall from [24] that variances are underestimated. We work with a single slice of resolution 64 × 64, using a design X of 30 phase encodes, running a single common OL iteration (default-initialized z), comparing different ways of continuing from there: exact z computations (Cholesky decomposition of A) versus Lanczos approximations with different numbers of steps k. Results are in Figure 1. While the relative approximation errors are rather large uniformly, there is a clear structure to them: the largest (and also the very smallest) true values zj are approximated significantly more accurately than smaller true values. 
This structure can be used to motivate why, in the presence of large errors over all coefficients, our inference still works well for sparse linear models, indeed in some cases better than if exact computations are used (Figure 1, upper right). The spectrum of A shows a roughly linear decay, so that the largest and smallest eigenvalues (and eigenvectors) are well-approximated by Lanczos, while the middle part of the spectrum is not penetrated. Contributions to the largest values zj come predominantly from small eigenvalues (large eigenvalues of A−1), explaining their smaller relative error. On the other hand, smaller values zj are strongly underestimated (zk,j ≪ zj), which means that the selective shrinkage effect underlying sparse linear models (shrink most coefficients strongly, but some not at all) is strengthened by these systematic errors. Finally, the IL penalties are τj(zj + |sj/σ|2)1/2, enforcing sparsity more strongly for smaller zj. Therefore, Lanczos approximation errors lead to strengthened sparsity in subsequent ILs, but least so for sites with largest true zj.

3Our data are sagittal head scans, where the frequency encode direction (along which oversampling is possible at no extra cost) is typically chosen vertically (the longer anatomic axis).

Figure 1: Lanczos approximations of Gaussian variances, at beginning of second OL iteration, 64 × 64 data (upper left). Spectral decay of inverse covariance matrix A roughly linear (upper middle). l2 reconstruction error of posterior mean estimate after subsequent OL iterations, for exact variance computation vs. k = 250, 500, 750, 1500 Lanczos steps (upper right). Lower panel: relative accuracy zj ↦ zk,j/zj at beginning of second OL iteration, separately for “a” sites (on wavelet coefficients; red), “r” sites (on derivatives; blue), and “i” sites (on ℑ(u); green).

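The selective shrinkage argument can be checked numerically: the IL penalty τ(z + (s/σ)2)1/2 approaches the sharp l1 penalty as z → 0 (strong shrinkage) and is locally quadratic for large z (weak shrinkage), so underestimating z strengthens sparsity. A small sketch with illustrative constants:

```python
import numpy as np

tau, sigma = 1.0, 1.0

def penalty(s, z):
    # IL penalty h*(s) = tau * sqrt(z + (s/sigma)^2)
    return tau * np.sqrt(z + (s / sigma)**2)

s = np.linspace(-3.0, 3.0, 601)
l1 = tau * np.abs(s) / sigma
gap_small_z = np.max(np.abs(penalty(s, 1e-8) - l1))     # z -> 0: the sharp l1 kink
gap_large_z = penalty(0.1, 4.0) - penalty(0.0, 4.0)     # large z: nearly flat near 0
```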
As an educated guess, this effect might even compensate for the fact that Laplace potentials may not be sparse enough for natural images.

5.2 Joint Design Optimization

We use sagittal head scan data of resolution 64 × 64 in-plane, 32 slices, acquired on a Siemens 3T scanner (phase direction anterior-posterior); see [22] for further details. We consider joint and independent MAP reconstruction (for the latter, we run nonlinear CG separately for each slice), for a number of different design choices: {Xt} optimized jointly by our method here [op-jt]; each Xt optimized separately, by running the complex variant of [24] on slice ut [op-sp]; Xt = X for all t, with X optimized on the most detailed slice (number 16, Figure 2, row 2 middle) [op-eq]; and encodes of each Xt drawn at random, from the density proposed in [12] [rd], respecting the typical spectral decay of images [4] (all designs contain the 8 lowest-frequency encodes). Results for rd are averaged over ten repetitions. For all setups but op-eq, the Xt are different across t.
Hyperparameters are adjusted based on MAP reconstruction results for a fixed design picked ad hoc (τa = τr = 0.01, τi = 0.1 in-plane; τc = 0.08 between slices), then used for all design optimization and MAP reconstruction runs. We run the op-jt optimization with an odd-even schedule {Cit} (all odd (even) t ∈ {0, . . . , T − 1} for odd (even) “it”); results for two other schedules of period four come out very similar, but require more running time. For variational inference, we run 6 OL iterations in the initial phase, 1 OL iteration in each design round, with up to 30 IL steps (ILs in design rounds typically converged in 2–3 steps). The rank parameters (number of Lanczos steps)4 were km = 100, kc = 250 (here, ut has ñ = 8192 real coefficients). 
Results are given in Figure 2.

4We repeated op-jt partly with km = 250, with very similar MAP reconstruction errors for the final designs, but significantly longer run time.

Figure 2: Top row: l2 reconstruction errors ‖|ûMAP| − |utrue|‖ of MAP reconstruction for different measurement designs. Left: joint MAP reconstruction; right: independent MAP reconstruction of each slice. op-jt: {Xt} optimized jointly; op-sp: Xt optimized separately for each slice; op-eq: Xt = X, optimized on slice 16; rd: Xt variable density drawn at random (averaged over 10 repetitions). Rows 2–4: images for op-jt (25 encodes), slices 15–17. Row 2: true images (range 0–0.35). Row 3: errors joint MAP. Row 4: errors indep. MAP (range 0–0.08).

First, across all designs, joint MAP reconstruction improves significantly upon independent MAP reconstruction. This improvement is strongest by far for op-jt (see Figure 2, rows 3, 4), which for joint reconstruction improves on all other variants significantly, especially with 16–30 phase encodes, where scan time is reduced by a factor 2–4 (Nyquist sampling requires 64 phase encodes). op-eq does worst in this domain: with a model of dependencies between slices in place, it pays off to choose different Xt for each slice. rd does best from about 35 phase encodes on. 
While\nthis suboptimal behaviour of our optimization will be analyzed more closely in future work, it is our\nexperience so far that the gain in using greedy sequential Bayesian design optimization over simpler\nchoices is generally largest below 1/2 Nyquist.\n\n6 Conclusions\n\nWe showed how to implement MRI sampling optimization by Bayesian sequential experimental\ndesign, jointly over a stack of neighbouring slices, extending the single slice technique of [24].\nRestricting ourselves to undersampling of Cartesian encodes, our method can be applied in prac-\ntice whenever dense Cartesian sampling is well under control (sequence modi\ufb01cation is limited to\nskipping encodes). We exploit the hidden Markov structure of the model by way of a Lanczos\napproximation of Kalman smoothing. While the latter has been proposed for spatial statistics ap-\nplications [20, 25], it has not been used for non-Gaussian approximate inference before, nor in the\ncontext of sparsity-favouring image models or non-linear experimental design. Our method is a gen-\neral alternative to structured variational mean \ufb01eld approximations typically used for non-Gaussian\ndynamical systems, in that dominating covariances are tracked a posteriori, rather than eliminating\nmost of them a priori through factorization assumptions. In a \ufb01rst study, we obtain encouraging\nresults in the range below 1/2 Nyquist. In future work, we will develop a truly parallel implementa-\ntion, with which higher resolutions can be processed. 
We are considering extensions of our design optimization technology to 3D MRI5 and to parallel MRI with receiver coil arrays [19, 9], whose combination with k-space undersampling can be substantially more powerful than each acceleration technique on its own [13].

Appendix

For norm potentials, h∗j(sj) = h∗j(‖sj‖), and the Hessians to solve for IRLS Newton directions do not have the form of A anymore. In order to understand this, note that we do not use complex calculus here: s ↦ |s| is not complex differentiable at any s ∈ C. Rather, we use the C → R2 embedding, then standard real-valued optimization for variables twice the size. If θj := (h∗j)′, ρj := (h∗j)′′ at ‖sj‖ ≠ 0, then using ∇sj‖sj‖ = sj/‖sj‖, we have ∇∇sj h∗j = ρjI2 + κ2j(‖sj‖2I2 − sjsjT), κj := (θj/‖sj‖ − ρj)1/2/‖sj‖. Since ‖sj‖2I2 − sjsjT = νsj(νsj)T, ν := δ2δ1T − δ1δ2T, the Hessian is XHX + BTH(s)B. If ŝ := ((diag κ) ⊗ ν)s, then for any v ∈ R2q: H(s)v = ((diag ρ) ⊗ I2)v + ((diag w) ⊗ I2)ŝ, where wj := vjT ŝj, j = 1, . . . , q, which shows how to compute Hessian matrix-vector multiplications, thus to implement IRLS steps in the complex-valued case.
Recall that messages are passed, alternating between Ãt→ and Mt→ matrices. 
For a PCA approximation Ãt→ ≈ Qt→Tt→Qt→T, Qt→ ∈ Rñ×km orthonormal, Tt→ tridiagonal (obtained by running km Lanczos steps for Ãt→), low rank algebra gives

Mt→ = M(Ãt→, Γ−1t→) = Qt→(Tt→−1 + Qt→T Γt→ Qt→)−1 Qt→T = Vt→Vt→T,  Vt→ ∈ Rñ×km,

computed in O(ñ km2) by way of a Cholesky decomposition. Now, Ã(t+1)→ = At+1 + Vt→Vt→T becomes the precision matrix for the next Lanczos run: MVMs have additional complexity of O(ñ km). Given all messages, node covariances are PCA-approximated by running Lanczos on At + V(t−1)→V(t−1)→T + V←(t+1)V←(t+1)T for kc iterations. Pair variances VarQ[st→|y] are estimated by running Lanczos on vectors of size 2ñ (say for kc/2 iterations; the precision matrix is given in Section 3). More details are given in [22].

Acknowledgments

This work is partly funded by the Excellence Initiative of the German research foundation (DFG). It is part of an ongoing collaboration with Rolf Pohmann, Hannes Nickisch and Bernhard Schölkopf, MPI for Biological Cybernetics, Tübingen, where data for this study has been acquired.

5In 3D MRI, image volumes are acquired without slice selection, using phase encoding along two dimensions. There are no unmeasured slice gaps and voxels are isotropic, but scan time is much longer.

References
[1] M.A. Bernstein, K.F. King, and X.J. Zhou. Handbook of MRI Pulse Sequences. Elsevier Academic Press, 1st edition, 2004.
[2] R. Bracewell. The Fourier Transform and Its Applications. McGraw-Hill, 3rd edition, 1999.
[3] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. 
Theo., 52(2):489–509, 2006.
[4] H. Chang, Y. Weiss, and W. Freeman. Informative sensing. Technical Report 0901.4275v1 [cs.IT], ArXiv, 2009.
[5] D. Donoho. Compressed sensing. IEEE Trans. Inf. Theo., 52(4):1289–1306, 2006.
[6] A. Garroway, P. Grannell, and P. Mansfield. Image formation in NMR by a selective irradiative pulse. J. Phys. C: Solid State Phys., 7:L457–L462, 1974.
[7] M. Girolami. A variational method for learning sparse and overcomplete representations. N. Comp., 13:2517–2532, 2001.
[8] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.
[9] M. A. Griswold, P. M. Jakob, R. M. Heidemann, M. Nittka, V. Jellus, J. Wang, B. Kiefer, and A. Haase. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn. Reson. Med., 47(6):1202–1210, 2002.
[10] P. Lauterbur. Image formation by induced local interactions: Examples employing nuclear magnetic resonance. Nature, 242:190–191, 1973.
[11] A. Levin, W. Freeman, and F. Durand. Understanding camera trade-offs through a Bayesian analysis of light field projections. In European Conference on Computer Vision, LNCS 5305, pages 88–101. Springer, 2008.
[12] M. Lustig, D. Donoho, and J. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn. Reson. Med., 58(6):1182–1195, 2007.
[13] M. Lustig and J. Pauly. SPIR-iT: Iterative self consistent parallel imaging reconstruction from arbitrary k-space. Magn. Reson. Med., 2009. In print.
[14] B. Madore, G. Glover, and N. Pelc. Unaliasing by Fourier-encoding the overlaps using the temporal dimension (UNFOLD), applied to cardiac imaging and fMRI. Magn. Reson. Med., 42:813–828, 1999.
[15] D. Malioutov, J. Johnson, and A. Willsky. Low-rank variance estimation in large-scale GMRF models. In ICASSP, 2006.
[16] G. Marseille, R. de Beer, M. Fuderer, A. Mehlkopf, and D. van Ormondt. 
Nonuniform phase-encode distributions for MRI scan time reduction. J. Magn. Reson. B, 111(1):70–75, 1996.
[17] D. McRobbie, E. Moore, M. Graves, and M. Prince. MRI: From Picture to Proton. Cambridge University Press, 2nd edition, 2007.
[18] C. Mistretta, O. Wieben, J. Velikina, W. Block, J. Perry, Y. Wu, K. Johnson, and Y. Wu. Highly constrained backprojection for time-resolved MRI. Magn. Reson. Med., 55:30–40, 2006.
[19] K. Pruessmann, M. Weiger, M. Scheidegger, and P. Boesiger. SENSE: Sensitivity encoding for fast MRI. Magn. Reson. Med., 42:952–962, 1999.
[20] M. Schneider and A. Willsky. Krylov subspace algorithms for space-time oceanography data assimilation. In IEEE International Geoscience and Remote Sensing Symposium, 2000.
[21] M. Schneider and A. Willsky. Krylov subspace estimation. SIAM J. Comp., 22(5):1840–1864, 2001.
[22] M. Seeger. Speeding up magnetic resonance image acquisition by Bayesian multi-slice adaptive compressed sensing. Supplemental Appendix, 2010.
[23] M. Seeger and H. Nickisch. Compressed sensing and Bayesian experimental design. In ICML 25, 2008.
[24] M. Seeger, H. Nickisch, R. Pohmann, and B. Schölkopf. Bayesian experimental design of magnetic resonance imaging sequences. In NIPS 21, pages 1441–1448, 2009.
[25] D. Treebushny and H. Madsen. On the construction of a reduced rank square-root Kalman filter for efficient uncertainty propagation. Future Gener. Comput. Syst., 21(7):1047–1055, 2005.
[26] J. Tsao, P. Boesinger, and K. Pruessmann. k-t BLAST and k-t SENSE: Dynamic MRI with high frame rate exploiting spatiotemporal correlations. Magn. Reson. Med., 50:1031–1042, 2003.
[27] F. Wajer. Non-Cartesian MRI Scan Time Reduction through Sparse Sampling. PhD thesis, Delft University of Technology, 2001.
[28] J. Weaver, Y. Xu, D. Healy, and L. Cromwell. Filtering noise from images with wavelet transforms. Magn. Reson. 
Med., 21(2):288–295, 1991.
", "award": [], "sourceid": 1038, "authors": [{"given_name": "Matthias", "family_name": "Seeger", "institution": null}]}