{"title": "Learning Conditional Deformable Templates with Convolutional Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 806, "page_last": 818, "abstract": "We develop a learning framework for building deformable templates, which play a fundamental role in many image analysis and computational anatomy tasks. Conventional methods for template creation and image alignment to the template have undergone decades of rich technical development. In these frameworks, templates are constructed using an iterative process of template estimation and alignment, which is often computationally very expensive. Due in part to this shortcoming, most methods compute a single template for the entire population of images, or a few templates for specific sub-groups of the data. In this work, we present a probabilistic model and efficient learning strategy that yields either universal or \\textit{conditional} templates, jointly with a neural network that provides efficient alignment of the images to these templates. We demonstrate the usefulness of this method on a variety of domains, with a special focus on neuroimaging. This is particularly useful for clinical applications where a pre-existing template does not exist, or creating a new one with traditional methods can be prohibitively expensive. Our code and atlases are available online as part of the VoxelMorph library at http://voxelmorph.csail.mit.edu.", "full_text": "Learning Conditional Deformable Templates\n\nwith Convolutional Networks\n\nAdrian V. Dalca\n\nCSAIL, MIT\nMGH, HMS\n\nMarianne Rakic\n\nD-ITET, ETH\nCSAIL, MIT\n\nJohn Guttag\nCSAIL, MIT\n\nMert R. 
Sabuncu\n\nECE and BME, Cornell\n\nadalca@mit.edu\n\nmrakic@mit.edu\n\nguttag@mit.edu\n\nmsabuncu@cornell.edu\n\nAbstract\n\nWe develop a learning framework for building deformable templates, which play\na fundamental role in many image analysis and computational anatomy tasks.\nConventional methods for template creation and image alignment to the template\nhave undergone decades of rich technical development. In these frameworks,\ntemplates are constructed using an iterative process of template estimation and\nalignment, which is often computationally very expensive. Due in part to this\nshortcoming, most methods compute a single template for the entire population\nof images, or a few templates for speci\ufb01c sub-groups of the data. In this work,\nwe present a probabilistic model and ef\ufb01cient learning strategy that yields either\nuniversal or conditional templates, jointly with a neural network that provides\nef\ufb01cient alignment of the images to these templates. We demonstrate the usefulness\nof this method on a variety of domains, with a special focus on neuroimaging.\nThis is particularly useful for clinical applications where a pre-existing template\ndoes not exist, or creating a new one with traditional methods can be prohibitively\nexpensive. Our code and atlases are available online as part of the VoxelMorph\nlibrary at http://voxelmorph.csail.mit.edu.\n\n1 Introduction\n\nA deformable template is an image that can be geometrically deformed to match images in a dataset,\nproviding a common reference frame. Templates are a powerful tool that enables the analysis\nof geometric variability. They have been used in computer vision [26, 37, 42], medical image\nanalysis [3, 21, 40, 50], graphics [44, 66], and time series signals [1, 73]. We are motivated by the\nstudy of anatomical variability in neuroimaging, where collections of scans are mapped to a common\ntemplate with anatomical and/or functional landmarks. 
However, the methods developed here are\napplicable to other domains.\nAnalysis with a deformable template is often done by computing a smooth deformation \ufb01eld that\naligns the template to another image. The deformation \ufb01eld can be used to derive a measure of the\ndifferences between the two images. Rapidly obtaining this \ufb01eld to a given template is by itself a\nchallenging task and the focus of extensive research.\nA template can be chosen as one of the images in a given dataset, but often these do not represent the\nstructural variability and complexity in the image collection, and can lead to biased and misleading\nanalyses [40]. If the template does not adequately represent dataset variability, such as the possible\nanatomy, it becomes challenging to accurately deform the template to some images. A good template\ntherefore minimizes the geometric distance to all images in a dataset. There has been extensive\nmethodological development for \ufb01nding such a central template [3, 21, 40, 50], but these methods\ninvolve costly optimization procedures and domain-speci\ufb01c heuristics, requiring extensive runtimes.\nFor complex 3D images such as MRI, this process can consume days to weeks. In practice, this leads\nto few templates being constructed, and researchers often use templates that are not optimal for their\ndataset. Our work makes it easy and computationally ef\ufb01cient to generate deformable templates.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fFigure 1: Conditional deformable templates generated by our method. Left: slices from 3D brain\ntemplates conditioned on age; Right: MNIST templates conditioned on class label.\n\nWhile deformable templates are powerful, a single template may be inadequate at capturing the\nvariability in a large dataset. 
Existing methods alleviate this problem by grouping subpopulations,\nusually along a single attribute, and computing separate templates for each group. This approach\nrelies on arbitrary decisions about the attributes and thresholds used for subdividing the dataset.\nFurthermore, each template is only constructed based on a subset of the data, thus exploiting fewer\nimages, leading to sub-optimal templates. Instead, we propose a learning-based approach that can\ncompute on-demand conditional deformable templates by leveraging the entire collection. Our\nframework enables the use of multiple attributes, continuous (e.g., age) or discrete (e.g., sex), to\ncondition the template on, without needing to apply arbitrary thresholding or subdividing a dataset.\nWe formulate template estimation as a learning problem and describe a novel method to tackle it.\n\n(1) We describe a probabilistic spatial deformation model based on diffeomorphisms. We then\ndevelop a general, end-to-end framework using convolutional neural networks that jointly synthe-\nsizes templates and rapidly provides the deformation \ufb01eld to any new image.\n\n(2) This framework also enables learning a conditional template function given instance attributes,\nsuch as the age and sex of the subject in an MRI. Once learned, this function enables rapid\nsynthesis of on-demand conditional templates. For example, it could construct a 3D brain MRI\ntemplate for 35 year old women.\n\n(3) We demonstrate the template construction method and its utility on a variety of datasets, in-\ncluding a large neuroimaging study. In addition, we show preliminary experiments indicating\ncharacteristics and interesting results of the model. For example, this formulation can be extended\nto learn image representations up to a deformation.\n\nConditional templates capture important trends related to attributes, and are useful for dealing with\nconfounders. 
For example, in studying disease impact, for some tasks it may be helpful to register\nscans to age-speci\ufb01c templates rather than one covering a wide age range.\n\n2 Related Works\n\n2.1 Spatial Alignment (Image Registration)\nSpatial alignment, or registration, between two images is a building block for estimation of deformable\ntemplates. Alignment usually involves two steps: a global af\ufb01ne transformation, and a deformable\ntransformation (as in many optical \ufb02ow applications). In this work we focus on, and make use of,\ndeformable transformations.\nThere is extensive work in deformable image registration methods [5, 6, 7, 10, 19, 28, 68, 72, 74].\nConventional frameworks optimize a regularized dense deformation \ufb01eld that matches one image\nwith the other [7, 68]. Diffeomorphic transforms are topology preserving and invertible, and have been\nwidely used in computational neuroanatomy analysis [6, 5, 10, 13, 14, 32, 41, 55, 59, 70, 74]. While\nextensively studied, conventional registration algorithms require an optimization for every pair of\nimages, leading to long runtimes in practice.\nRecently proposed learning-based registration methods offer a signi\ufb01cant speed-up at test time [8,\n9, 12, 17, 18, 23, 47, 46, 61, 65, 71]. These methods learn a network that computes the deformation\n\ufb01eld, either in a supervised (using ground truth deformations), unsupervised (using classical energy\nfunctions), or semi-supervised setting. These algorithms have been used for registering an image\nto an existing template. However, in many realistic scenarios, a template is not readily available,\nfor example in a clinical study that uses a speci\ufb01c scan protocol. We build on these ideas in our\nlearning strategy, but jointly estimate a registration network and a conditional deformable template in\nan unsupervised setting. In parallel, independent work, Weber et al. 
[64] propose a learning-based\nframework for diffeomorphic joint temporal alignment of time-series data called DTAN. DTAN\ngeneralizes to test data, outperforming other joint alignment tools for time-series tasks.\nOptical \ufb02ow methods are closely related to image registration, \ufb01nding a dense displacement \ufb01eld\nfor a pair of 2D images. Similar to registration, classical approaches solve an optimization problem,\noften using variational methods [11, 35, 67]. Learning-based optical \ufb02ow methods use convolutional\nneural networks to learn the dense displacement \ufb01elds [2, 25, 36, 38, 60, 69].\n\n2.2 Template Construction\nDeformable templates, or atlases, are widely used in computational anatomy. Speci\ufb01cally, the\ndeformation \ufb01elds from the template to individual images are often carefully analyzed to understand\npopulation variability. The template is usually constructed through an iterative procedure based on a\ncollection of images or volumes. First, an initial template is chosen, such as an example image or\na pixel-wise average across all images. Next, all images are aligned (registered) to this template, a\nbetter template is estimated from the aligned images through averaging, and the process is iterated until\nconvergence [3, 21, 40, 50, 63]. Since the above procedure requires many iterations involving many\ncostly (3D) pairwise registrations, atlas construction runtimes are often prohibitive.\nA single population template can be insuf\ufb01cient for capturing complex variability. Current methods\noften subdivide the population to build multiple atlases. For example, in neuroimaging, some methods\nbuild different templates for different age groups, requiring rigid discretization of the population\nand prohibiting each template from using all information across the collection. 
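The classical construction loop described above (pick an initial template, align every image to it, re-estimate the template by averaging, and repeat) can be sketched in a few lines. The sketch below is illustrative only: it uses 1D signals and simple integer-shift alignment in place of deformable registration, and the names `align_shift` and `build_template` are our own, not part of any toolbox.

```python
import numpy as np

def align_shift(template, signal, max_shift=10):
    """Find the integer shift of `signal` that best matches `template` (L2 error)."""
    shifts = list(range(-max_shift, max_shift + 1))
    errs = [np.sum((np.roll(signal, -s) - template) ** 2) for s in shifts]
    return shifts[int(np.argmin(errs))]

def build_template(signals, n_iters=5):
    """Iterative template estimation: align all signals to the current
    template, re-average the aligned signals, and repeat."""
    template = np.mean(signals, axis=0)      # initial template: point-wise average
    for _ in range(n_iters):
        aligned = [np.roll(x, -align_shift(template, x)) for x in signals]
        template = np.mean(aligned, axis=0)  # re-estimate from aligned signals
    return template
```

In the real 3D setting, each `align_shift` call becomes a costly deformable registration, which is why this loop takes days to weeks for MRI and why the paper replaces it with joint learning.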
Images can also\nbe clustered and a template optimized for each cluster, requiring a pre-set number of clusters [63].\nSpecialized methods have also been developed that tackle a particular variability of interest. For\nexample, spatiotemporal brain templates have been developed using specialized registration pipelines\nand explicit modelling of brain degeneration with time [22, 31, 48], requiring signi\ufb01cant domain\nknowledge, manual anatomical segmentations, and signi\ufb01cant computational resources. We build\non the intuitions of these methods, but propose a general framework that can learn conditional\ndeformable templates for any given set of attributes. Speci\ufb01cally, our strategy learns a single network\nthat leverages shared information across the entire dataset and can output different templates as a\nfunction of sets of attributes, such as age, sex, and disease state. The conditional function learned by\nour model generates unbiased population templates for a speci\ufb01c con\ufb01guration of the attributes.\nOur model can be used to study the population variation with respect to certain attributes it was trained\non, such as age in neuroimaging. In recent literature on deep probabilistic models, several papers \ufb01nd\nand explore latent axes of important variability in the dataset [4, 15, 30, 33, 43, 51]. Our model can\nalso be used to build conditional geometric templates based on such latent information, as we show\nin our experiments. In this case, our model can be seen as learning meaningful image representations\nup to a geometric deformation. However, in this paper we focus on observed (measured) attributes,\nwith the goal of explicitly capturing variability that is often a source of confounding.\n\n3 Methods\n\nWe \ufb01rst present a generative model that describes the formation of images through deformations from\nan unknown conditional template. 
We describe a learning approach that uses neural networks and\ndiffeomorphic transforms to jointly estimate the global template and a network that rapidly aligns it\nto each image.\n\n3.1 Probabilistic model\n\nLet xi be a data sample, such as a 2D image, a 3D volume like an MRI scan, or a time series. For the\nrest of this section, we use images and volumes as an example, but the development applies broadly\nto many data types. We assume we have a dataset X = {xi}, and model each image as a spatial\ndeformation \u03c6vi of a global template t. Each transform \u03c6vi is parametrized by the random vector vi.\nWe consider a model of a conditional template t = f\u03b8t(a), a function of attribute vector a,\nparametrized by global parameters \u03b8t. For example, a can encode a class label or phenotypical\ninformation associated with medical scans, such as age and sex. In cases where no such conditioning\ninformation is available or of interest, this formulation reduces to a standard single template for the\nentire dataset: t = t\u03b8t, where \u03b8t can represent the pixel intensity values to be estimated.\n\nWe estimate the deformable template parameters \u03b8t and the deformation \ufb01elds for every data point\nusing maximum likelihood. Letting V = {vi} and A = {ai},\n\nˆθt, ˆV = arg max_{θt,V} log pθt(V|X, A) = arg max_{θt,V} [log pθt(X|V; A) + log p(V)],   (1)\n\nwhere the \ufb01rst term captures the likelihood of the data and deformations, and the second term controls\na prior over the deformation \ufb01elds.\nDeformations. While the method described in this paper applies to a range of deformation\nparametrizations v, we focus on diffeomorphisms. Diffeomorphic deformations are invertible\nand differentiable, thus preserving topology. 
Speci\ufb01cally, we treat v as a stationary velocity\n\ufb01eld [5, 17, 32, 45, 46, 57], although time-varying \ufb01elds are also possible. In this setup, the\ndeformation \ufb01eld \u03c6v is de\ufb01ned through the following ordinary differential equation:\n\n∂φv(t)/∂t = v(φv(t)),   (2)\n\nwhere φv(0) = Id is the identity transformation and t is time. We can obtain the \ufb01nal deformation\n\ufb01eld φv(1) by integrating the stationary velocity \ufb01eld v over t = [0, 1]. We compute this integration\nthrough scaling and squaring, which has been shown to be ef\ufb01ciently implementable in automatic\ndifferentiation platforms [18, 45].\nWe model the velocity \ufb01eld prior p(V) to encourage desirable deformation properties. Speci\ufb01cally, we\n\ufb01rst assume that deformations are smooth, for example to maintain anatomical consistency. Second,\nwe assume that population templates are unbiased, restricting deformation statistics. Letting uv be\nthe spatial displacement for \u03c6v = Id + uv, and \u2207ui be its spatial gradient,\n\np(V) ∝ exp{−γ‖ūv‖²} Π_i N(uvi; 0, Σu),   (3)\n\nwhere N(·; µ, Σ) is the multivariate normal distribution with mean µ and covariance Σ, and\nūv = 1/n Σ_i uvi. We let Σ⁻¹ = L, where L = λdD − λaC is (a relaxation of) the Laplacian\nof a neighborhood graph de\ufb01ned on the pixel grid, with the graph degree matrix D and the pixel\nneighbourhood adjacency matrix C [17]. Using this formulation, we obtain\n\nlog p(V) = −γ‖ū‖² − Σ_i (λd d/2) ‖ui‖² − Σ_i (λa/2) ‖∇ui‖² + const,   (4)\n\nwhere d is the neighbourhood degree. The \ufb01rst term encourages a small average deformation across\nthe dataset, encouraging a central, unbiased template. 
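Scaling and squaring integrates the stationary velocity field by starting from a small displacement v/2^N, so that φ ≈ Id + v/2^N, and then self-composing the deformation N times. Below is a minimal 1D numpy sketch under assumed linear interpolation; `integrate_ss` is a hypothetical name, and real implementations operate on 2D/3D fields inside automatic differentiation frameworks. Integrating −v with the same routine approximates the inverse deformation.

```python
import numpy as np

def integrate_ss(v, n_steps=7):
    """Integrate a stationary velocity field v via scaling and squaring.
    Start from phi ~= Id + v / 2^n, then self-compose (square) n times,
    so the result approximates the time-1 flow of v. 1D sketch."""
    grid = np.arange(len(v), dtype=float)
    phi = grid + v / (2 ** n_steps)        # small initial deformation
    for _ in range(n_steps):
        # self-composition phi <- phi o phi, via linear interpolation
        # (np.interp clamps at the boundary, a crude boundary condition)
        phi = np.interp(phi, grid, phi)
    return phi
```

For a constant velocity field this recovers a uniform translation away from the boundary, and composing `integrate_ss(v)` with `integrate_ss(-v)` approximately recovers the identity, mirroring the invertibility property used in Section 3.3.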
The second and third terms encourage deformations that are small and smooth,\nrespectively, and \u03b3, \u03bbd and \u03bba are hyperparameters.\nData Likelihood. The data likelihood p(xi|vi; ai) can be adapted to the application domain. For\nimages, we often adopt a simple additive Gaussian model coupled with a deformable template:\n\np(xi|vi; ai) = N(xi; fθt(ai) ◦ φvi, σ²I),   (5)\n\nwhere ◦ represents a spatial warp, and σ² represents additive image noise. However, in some datasets,\ndifferent likelihoods are more appropriate. For example, due to the spatial variability of contrast and\nnoise in MRIs, likelihood models that result in normalized cross correlation loss functions have been\nwidely shown to lead to more robust results, and such models can be used with our framework [6].\n\n3.2 Neural Network Model\nTo solve the maximum likelihood formulation (1) given the model instantiations speci\ufb01ed above,\nwe design a network g\u03b8(xi, ai) = (vi, t) that takes as input an image and an attribute vector to\ncondition the template on (this could be empty for global templates). The network can be effectively\nseen as having two functional parts. The \ufb01rst, gt,\u03b8t(ai) = t, produces the conditional template. The\nsecond, gv,\u03b8v (t, xi) = vi, takes in the template and a data point, and outputs the most likely velocity\n\ufb01eld (and hence deformation) between them. By learning the optimal parameters \u02c6\u03b8 = {\u02c6\u03b8t, \u02c6\u03b8v}, we\nestimate a global network that simultaneously provides a deformable (conditional) template and its\ndeformation to a datapoint. Figure 2 provides an overview schematic of the proposed network.\n\nFigure 2: Overview. The network takes as input an image and an optional attribute vector. The upper\nnetwork gt,\u03b8t(\u00b7) outputs a template, which is then registered with the input image by the second\nnetwork gv,\u03b8v (\u00b7). 
The loss function, derived from the negative log likelihood of the generative model,\nleverages the template warped into t \u25e6 \u03c6vi.\n\nWe optimize the neural network parameters \u03b8 using stochastic gradient algorithms, and minimize the\nnegative log likelihood (1) for image xi:\n\nL(θt, θv; vi, xi, ai) = −log pθ(vi, xi; ai) = −log pθ(xi|vi; ai) − log pθ(vi)\n= (1/2σ²) ‖xi − gt,θt(ai) ◦ φvi‖² + γ‖ū‖² + (λd d/2) ‖ui‖² + (λa/2) ‖∇ui‖² + const,   (6)\n\nwhere gt,\u03b8t(ai) yields the template at iteration i, and vi = gv,\u03b8v (t\u03b8t,i, xi).\nThe use of stochastic gradients to update the networks enables us to learn templates faster than\nconventional methods by avoiding the need to compute \ufb01nal deformations at each iteration. Intuitively,\nwith every iteration the network learns to output a template, optionally conditioned on the attribute\ndata, that can be smoothly and invertibly warped to every image in the dataset.\nWe implement the template network gt,\u03b8t(\u00b7) with two versions, depending on whether we are\nestimating an unconditional or conditional template. The \ufb01rst, conditional version gt,\u03b8t(ai) consists\nof a decoder that takes as input the attribute data ai, and outputs the template t. The decoder\nincludes a fully connected layer, followed by several blocks of upsampling, convolutional, and ReLU\nactivation layers. The second, unconditional version gt,\u03b8t has no inputs and simply consists of a\nlearnable parameter at each pixel. The registration network gv,\u03b8v (t, xi) takes as input two images t\nand xi and outputs a stationary velocity \ufb01eld vi, and is designed as a convolutional U-Net-like\narchitecture [62] with the design used in recent registration literature [9]. 
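As a concrete reading of the loss terms, the following numpy sketch evaluates a per-image loss with an image-matching term and the three deformation penalties (centrality, magnitude, smoothness) for 1D fields. The function name, the finite-difference gradient, and the default hyperparameter values are our own simplifications of the paper's formulation, with the sign convention that all penalties are minimized.

```python
import numpy as np

def template_loss(x, template, phi, u_bar, u, sigma=1.0, gamma=0.01,
                  lam_d=0.001, lam_a=0.01, d=2):
    """Sketch of a per-image loss: data term plus deformation penalties.
    x, template: 1D images; phi: deformation (sampling locations);
    u: displacement field (phi - identity); u_bar: running mean displacement."""
    grid = np.arange(len(x), dtype=float)
    warped = np.interp(phi, grid, template)               # t o phi via interpolation
    data = np.sum((x - warped) ** 2) / (2 * sigma ** 2)   # image matching
    central = gamma * np.sum(u_bar ** 2)                  # small average deformation
    size = lam_d * d / 2 * np.sum(u ** 2)                 # small deformations
    smooth = lam_a / 2 * np.sum(np.gradient(u) ** 2)      # smooth deformations
    return data + central + size + smooth
```

With an identity deformation and a template equal to the image, every term vanishes, which is the fixed point the training procedure pushes toward jointly over the whole dataset.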
To compute the loss (6),\nwe compute the deformation \ufb01eld \u03c6vi from vi using differentiable scaling and squaring integration\nlayers [17, 45], and the warped template t \u25e6 \u03c6vi using spatial transform layers. We approximate\nthe average deformation ū in the loss function using a weighted running average ū ∼ Σ_{k=K−c}^{K} uk,\nwhere uk is the displacement at iteration k, K is the current iteration, and c is usually set to 100 in\nour experiments. Speci\ufb01c network design parameters depend on the application domain, and are\nincluded in the supplementary materials.\n\n3.3 Test-time Inference of Template and Deformations.\n\nGiven a trained network, we obtain a (potentially conditional) template \u02c6t directly from network gt,\u03b8t(ai) by a single forward pass given input ai. For each test input image xi, the deformation\n\ufb01elds themselves are often of interest for analysis or prediction. The network also provides the\ndeformation \u02c6\u03c6\u02c6vi, where \u02c6vi = gv,\u03b8v (\u02c6t, xi).\nOftentimes, the inverse deformation, which takes the image to the template space, is also desired.\nUsing a stationary velocity \ufb01eld representation, this inverse deformation φv⁻¹ is easy to compute by\nintegrating the negative velocity \ufb01eld using the same scaling and squaring layer: φv⁻¹ = φ−v [5, 18, 56].\n\nFigure 3: MNIST examples (1) MNIST digits from D-scale-rot; (2) templates conditioned on\nclass (vertical axis) and scale (horizontal axis) on MNIST D-scale, learned with our model, and (3)\nwith a decoder-only baseline model; (4) conditional templates learned with our model on the MNIST\nD-class-scale-rot dataset for the digit 3 and a variety of scaling and rotation values.\n\n4 Experiments\nWe present two main sets of experiments. The \ufb01rst set uses image-based datasets MNIST and Google\nQuickDraw, with the goal of providing a picture of the capabilities of our method. While deformable\ntemplates in these data are not a real-world application, these are often-studied datasets that provide a\nplatform to analyze aspects of deformable templates.\nIn contrast, the second set of experiments is designed to demonstrate the utility of our method on a\ntask of practical importance, analysis of brain MRI. 
We demonstrate that our method can produce\nhigh-quality deformable templates in the context of realistic data, and that conditional deformable\ntemplates capture important anatomical variability related to age.\n\n4.1 Experiment on Benchmark Datasets\nData. We use the MNIST dataset, consisting of small 2D images of hand-written digits [49] and 11\nclasses from the Google QuickDraw dataset [39], a collection of categorized drawings contributed\nby players in an online drawing game. To evaluate our method\u2019s ability to construct conditional\ntemplates that accurately capture the impact of attributes on which the templates are conditioned,\nwe generate examples in which the initial images are scaled and rotated (Figure 3). Speci\ufb01cally, we\nuse an image scaling factor in the range 0.7 \u2212 1.3 and rotations in the range 0 to 360 degrees. We\nlearn different models using the original dataset involving different classes (D-class), the\ndataset with simulated scale effects (D-class-scale), or the dataset with both scale and rotation effects (D-class-scale-rot).\nWhile simulated image changes are obvious to an observer, during training we assume we know the\nattributes that cause the changes, but do not a priori model their effect on the images. This simulates,\nfor example, the correlation between age and changing size of anatomical structures. The goal is to\nunderstand whether the proposed method is able to learn the relationship between the attribute and\nthe geometrical variability in the dataset, and hence produce a function for generating on-demand\ntemplates conditioned on the attributes. The datasets are split into train, validation and test sets.\n\n4.1.1 Validation\nIn the \ufb01rst experiment, we evaluate our ability to construct suitable conditional templates.\nHyperparameters. Model hyperparameters have intuitive effects on the sharpness of templates, the\nspatial smoothness of registration \ufb01elds, and the quality of alignments. 
In practical settings, they\nshould be chosen based on the desired goal of a given task. In these experiments, we tune hyperparameters by visually assessing deformations on validation data, starting from \u03b3 = 0.01, \u03bbd = 0.001, \u03bba = 0.01, and \u03c3 = 1 for training on the D-class data. We found that once a hyperparameter was chosen\nfor one dataset, only minor tuning was required for other experiments.\nEvaluation criteria. Template construction is an ill-posed problem, and the utility of resulting\ntemplates depends on the desired task. We report a series of measures to capture properties of the\nresulting templates and deformations. Our \ufb01rst two quantitative evaluation criteria relate to centrality,\nfor which we computed the norm of the mean displacement \ufb01eld ‖ū‖2 and the average displacement\nsize 1/n Σ_i ‖ui‖2. Next, we illustrate \ufb01eld regularity per image class, and average intensity image\nagreement (via MSE). These metrics capture aspects about the deformation \ufb01elds, rather than solely\nintrinsic properties of the templates. They need to be evaluated together; otherwise, deformation \ufb01elds\ncan lead to perfectly matching the image and template while being very irregular and geometrically\n\nFigure 4: Example convergence. Convergence of two conditional template models. Left: model trained on digit-only attribute on D-class for epochs [0, 1, 2, 5, 100]. Right: model trained\non D-class-rot, with all three attributes given as input to the\nmodel for epochs [0, 50, 75, 150, 1020], and randomly sampled digits [1, 2, 4, 8, 2, 8, 7, 8, 6, 6], rotations, and scales.\n\nFigure 5: Example deformations. Each row shows: class\ntemplate, example class image, template warped to this\ninstance, instance warped to\nmatch the template, and the deformation \ufb01eld.\n\nFigure 6: Quantitative measures. 
Centrality and average\ndeformation norm for templates generated by our model and\nthe baselines on the D-class variant of MNIST. We \ufb01nd that\nour models yield more central templates. Additional measures\ncan be found in supplementary Figure 6.\n\nFigure 7: Volume trends. Change in volume of ventricles and hippocampi of the age-conditional brain templates.\n\nmeaningless, or can be perfectly smooth (zero displacement) at the cost of poor image matching. To\ncapture \ufb01eld regularity, we compute the Jacobian matrix J\u03c6(p) = \u2207\u03c6(p) \u2208 R3\u00d73, which captures\nthe local properties of \u03c6 around voxel p. Low Jacobian determinant values indicate irregular deformation \ufb01elds,\nand |J\u03c6(p)| \u2264 0 indicates voxels where topology is not preserved. Jacobian determinants near 1\nrepresent very smooth \ufb01elds. We use held-out test subjects for these measures.\nBaselines. We compare our templates with templates built by choosing exemplar data as templates,\nand by training only a decoder of the given attributes using MSE loss and the same network architecture as the template network gt,\u03b8t(\u00b7). This latter baseline can be seen as differing from our method in\nthat it minimizes a pixel-wise intensity difference as opposed to a geometric difference (deformation).\nResults. Figure 3 illustrates conditional templates using our model and the decoder, and results from\nour model on the full MNIST dataset using all attributes. Our method produces sharp, central templates that are plausible digits and are smoothly deformable to other digits. Example deformations\nare shown in Figure 5. Supplementary Figure 13 contains similar results for the QuickDraw dataset.\nFigure 4 illustrates convergence behavior for two models, showing that the conditional attributes are\nable to capture complicated geometric differences. 
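The field-regularity measure described above, the determinant of the Jacobian J\u03c6(p) = \u2207\u03c6(p), can be computed with finite differences. A 2D numpy sketch, with an assumed (H, W, 2) layout for the deformation field (the function name and layout are our conventions, not the paper's):

```python
import numpy as np

def jacobian_determinants(phi):
    """Per-pixel Jacobian determinant of a 2D deformation field.
    phi has shape (H, W, 2): phi[..., 0] is the x-coordinate map and
    phi[..., 1] the y-coordinate map. Determinants <= 0 flag pixels
    where the deformation folds (topology is not preserved)."""
    dphi_dy = np.gradient(phi, axis=0)   # derivatives along rows (y)
    dphi_dx = np.gradient(phi, axis=1)   # derivatives along columns (x)
    return dphi_dx[..., 0] * dphi_dy[..., 1] - dphi_dx[..., 1] * dphi_dy[..., 0]
```

For the identity map the determinant is 1 everywhere; counting non-positive determinants gives the "no negative Jacobian determinants" check used in the results.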
Templates early in the learning process share\nappearance features across attributes, indicating that the network leverages common information\nacross the dataset. The \ufb01nal templates enable signi\ufb01cantly smaller deformations than early ones,\nindicating better representation of the conditional variability. As one would expect, more epochs are\nnecessary for convergence of the model with more attributes.\nFigures 6 and 9 show template measures indicating that our conditional templates are more central\nand require smaller deformations than the baselines when registered with test set digits. We also\n\ufb01nd that our method and exemplar-based templates can perform well for both deformation metrics,\nand comparably to each other. Speci\ufb01cally, all deformations are \"smooth\" (no negative Jacobian\ndeterminants) and image differences are visually imperceptible. We underscore that changes in\nthe hyperparameters will produce slightly different trade-offs for these measures. At the presented\nparameters, our method produces slightly smoother deformation \ufb01elds, coming at a slight cost in\nMSE for some digits, while the baselines can lead to slightly irregular \ufb01elds that force images to\nmatch. The decoder baseline underperforms in all metrics. These results\nindicate that both our method and instance-based templates can lead to accurate and smooth\ndeformation \ufb01elds, while our method produces more central templates requiring smaller deformations.\n\nFigure 8: Slices from Learned 3D Brain MRI templates. Left: single unconditional template\nrepresenting the entire population. 
Right: conditional age templates for brain MRI for ages 15 to 90,\nillustrating, for example, growth of the ventricles, also evident in a supplementary video.\n\n4.1.2 Analysis\nIn this section, we explore further characteristics and utility provided by our model using the MNIST\nand QuickDraw datasets. Due to space limitations, the \ufb01gures are given in the supplementary material.\nVariability and Synthesis. Conditional deformable templates capture an image representation up to\na spatial deformation. Deformation \ufb01elds from templates to images are often studied to characterize\nand visualize variability in a population. To illustrate this point, we demonstrate the main within-class\nvariability by \ufb01nding the principal components of the velocity \ufb01elds using PCA. Figure 10 illustrates\nsynthesized digits by warping the template along these components, capturing handwriting variability\nin natural digits.\nIn another variability experiment, we treat scale as a confounder and validate that our method reduces\nconfounding effects. Figure 10 illustrates that a model learned with a scale attribute is able to learn\nprincipal geometric variability with reduced scale effects compared to one not using this attribute.\nMissing Attributes. We test the ability of our conditional framework to learn templates that generalize to sparsely observed attributes in two regimes. First, for the D-class-scale dataset, we\ncompletely hold out scaling factors in the range 0.9 \u2212 1.1 for images of digits 3, 4 and 5. In the\nsecond regime, we hold out all but 5 instances of the digit 5. Figure 11 indicates that for each regime,\nour method produces reasonable templates even for the held-out attributes, indicating that it leverages\nthe entire dataset in learning the conditioning function.\nLatent attributes. In this experiment, we compare our method to recent probabilistic models in\nthe situation where attributes are not known a priori. 
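The PCA over velocity fields described under Variability and Synthesis can be sketched as follows; `velocity_pca` is an illustrative name, and the SVD route is one standard way to obtain principal modes that can then be scaled, added to the mean, and integrated to warp the template along a mode of variability:

```python
import numpy as np

def velocity_pca(velocity_fields, n_modes=2):
    """PCA over flattened velocity fields: returns the mean field, the top
    principal modes (rows), and their singular values. Sampling along
    mean + alpha * mode synthesizes new velocity fields."""
    V = np.stack([v.ravel() for v in velocity_fields])  # (n_samples, n_voxels)
    mean = V.mean(axis=0)
    _, s, vt = np.linalg.svd(V - mean, full_matrices=False)
    return mean, vt[:n_modes], s[:n_modes]
```

On toy data where every field is a scalar multiple of one pattern, the first mode recovers that pattern up to sign.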
To do this, we add an encoder from the input image xi to the latent attribute, and as a baseline train an autoencoder with the same encoder and decoder architectures as used in our model, and the MSE loss. We train on the D-class dataset with a bottleneck of a single neuron simulating the single unknown attribute. While more powerful autoencoders can lead to better reconstructions of the inputs, our goal is to explore the main mode of variability captured by each method. As Figure 12 shows, this autoencoder produces much fuzzier-looking reconstructions, whereas our approach tends to reproduce the template for the given digit image. This is because the autoencoder learns representations to minimize pixel intensity differences, whereas our approach learns representations that minimize spatial deformations. In other words, our model learns image representations with respect to minimal geometric deformations.
4.2 Experiment 2: Neuroimaging
In this section, we illustrate unconditional and conditional 3D brain MRI templates learned by our method, with the goal of showing its utility for the realistic task of neuroimaging analysis. We first show that our method efficiently synthesizes an unconditional population template, comparable to existing ones that require significantly more computation to construct. Second, we show that our learned conditional template function captures anatomical variability as a function of age.
Data. We use a large dataset of 7829 T1-weighted 3D brain MRI scans from publicly available datasets: ADNI [58], OASIS [52], ABIDE [24], ADHD200 [54], MCIC [29], PPMI [53], HABS [16], and Harvard GSP [34]. All scans are pre-processed with standard steps, including resampling to 1mm isotropic voxels, affine spatial normalization and anatomical segmentations using FreeSurfer [27]. Final images are cropped to 160 × 192 × 224.
The segmentation maps are only used for analysis. The dataset is split into 7329 training volumes, 250 validation volumes and 250 test volumes. This dataset was first assembled and used in [20].
Methods. All of the training data was used to build an unconditional template. We also learned a conditional template function using age and sex attributes, using only the ADNI and ABIDE datasets, which provide this information. Following the neuroimaging literature, we use a likelihood model that results in a normalized cross-correlation data loss. Training the model requires approximately a day on a Titan XP GPU. However, obtaining a conditional template from a learned network requires less than a second.
Evaluation. For a given template, we obtain anatomical segmentations by warping 100 training images to the template and averaging their warped segmentations. For the conditional template, we do this for 7 ages equally spaced between 15 and 90 years old, for both males and females. We first analyze anatomical trends with respect to conditional attributes. We then measure the registration accuracy facilitated by each template on the test set via the widely used volume overlap measure Dice (higher is better). As a baseline for this comparison, we use the atlas and segmentation masks available online from recent literature [8]. To test the volume overlap with anatomical segmentations of test data, we warp each template (unconditional, the appropriate age- and sex-conditional template, and baseline) to each of 100 test subjects, and propagate the template segmentations. We compute the mean Dice score over all subjects and 30 FreeSurfer labels.
Results. Figures 8 and 14, and a supplementary video1, illustrate example slices from the unconditional and conditional 3D templates. The ventricles and hippocampi are known to have significant anatomical variation as a function of age, which can be seen in the images.
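The Dice overlap used in this evaluation is a standard volume overlap measure. A minimal sketch, assuming `seg_a` and `seg_b` are integer label maps of the same shape (e.g., propagated template labels and a subject's FreeSurfer labels) and `labels` is the list of structures to score:

```python
import numpy as np

def mean_dice(seg_a, seg_b, labels):
    """Mean Dice overlap between two integer label maps over the given labels.

    Dice(A, B) = 2|A ∩ B| / (|A| + |B|); higher is better, 1.0 is perfect overlap.
    """
    scores = []
    for label in labels:
        a = seg_a == label
        b = seg_b == label
        denom = a.sum() + b.sum()
        if denom == 0:  # structure absent from both maps; skip it
            continue
        scores.append(2.0 * np.logical_and(a, b).sum() / denom)
    return float(np.mean(scores))
```

In the setting above, this score would be averaged over the 100 test subjects and the 30 FreeSurfer labels.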
Figure 7 illustrates their volume measured using our atlases as a function of age, showing the growth of the ventricle volumes and shrinkage of the hippocampi. Figure 15 illustrates representative results.
We find Dice scores of 0.800 (±0.110) for the unconditional template, 0.795 (±0.116) for the conditional model, and 0.731 (±0.153) for the baseline, with this difference roughly consistent across anatomical structures. We emphasize that these numbers may not be directly comparable, since the baseline atlas (and segmentations) were obtained using a different process involving an external dataset and manual labeling, while our template was built with our training images (and their FreeSurfer segmentations to obtain template labels). Nonetheless, these visualizations and analyses are encouraging, suggesting that our method provides anatomical templates for brain MRI that enable brain segmentation.

5 Discussion and Conclusion
Deformable templates play an important role in image analysis tasks. In this paper, we present a method for automatically learning such templates from data. Our method is both less labor intensive and computationally more efficient than traditional data-driven methods for learning templates. Moreover, our method can be used to learn a function that can quickly generate templates conditioned upon sets of attributes. It can, for example, generate a template for the brains of 75-year-old women in under a second. To our knowledge, this is the only general method for producing templates conditioned on available attributes.
In a series of experiments on popular image datasets, we demonstrate that our method produces high quality unconditional templates. We show that it can be used to construct conditional templates that account for confounders such as scaling and rotation.
In a second set of experiments, we demonstrate the practical utility of our method by applying it to a large dataset of brain MRI images. We show that with about a day of training, we can produce unconditional atlases similar in quality and utility to a widely used atlas that took weeks to produce. We also show that the method can be used to rapidly produce conditional atlases that are consistent with known age-related changes in anatomy.
In the future, we plan to explore downstream consequences of being able to easily and quickly produce conditional templates for medical imaging studies. In addition, we believe that our model can be used for other tasks, such as estimating unknown attributes (e.g., age) for a given patient, which would be an interesting direction for further exploration.

1 Video can be found at http://voxelmorph.mit.edu/atlas_creation/

Acknowledgments

This research was funded by NIH grants R01LM012719, R01AG053949, and 1R21AG050122, NSF CAREER 1748377, NSF NeuroNex Grant 1707312, and Wistron Corporation.

References
[1] Waleed H Abdulla, David Chow, and Gary Sin. Cross-words reference template for DTW-based speech recognition systems. In TENCON 2003. Conference on Convergent Technologies for Asia-Pacific Region, volume 4, pages 1576–1579. IEEE, 2003.
[2] Aria Ahmadi and Ioannis Patras. Unsupervised convolutional neural networks for motion estimation. In Image Processing (ICIP), 2016 IEEE International Conference on, pages 1629–1633. IEEE, 2016.
[3] Stéphanie Allassonnière, Yali Amit, and Alain Trouvé. Towards a coherent statistical framework for dense deformable template estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(1):3–29, 2007.
[4] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
[5] J. Ashburner. A fast diffeomorphic image registration algorithm.
Neuroimage, 38(1):95–113, 2007.
[6] Brian B Avants, Charles L Epstein, Murray Grossman, and James C Gee. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis, 12(1):26–41, 2008.
[7] R. Bajcsy and S. Kovacic. Multiresolution elastic matching. Computer Vision, Graphics, and Image Processing, 46:1–21, 1989.
[8] G. Balakrishnan, A. Zhao, M.R. Sabuncu, J. Guttag, and A.V. Dalca. An unsupervised learning model for deformable medical image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9252–9260, 2018.
[9] G. Balakrishnan, A. Zhao, M.R. Sabuncu, J. Guttag, and A.V. Dalca. VoxelMorph: A learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging, 2018.
[10] M.F. Beg et al. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int. J. Comput. Vision, 61:139–157, 2005.
[11] Thomas Brox et al. High accuracy optical flow estimation based on a theory for warping. European Conference on Computer Vision (ECCV), pages 25–36, 2004.
[12] Xiaohuan Cao, Jianhua Yang, Jun Zhang, Dong Nie, Minjeong Kim, Qian Wang, and Dinggang Shen. Deformable image registration based on similarity-steered CNN regression. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 300–308. Springer, 2017.
[13] Yan Cao, Michael I Miller, Raimond L Winslow, and Laurent Younes. Large deformation diffeomorphic metric mapping of vector fields. IEEE Transactions on Medical Imaging, 24(9):1216–1230, 2005.
[14] Can Ceritoglu, Kenichi Oishi, Xin Li, Ming-Chung Chou, Laurent Younes, Marilyn Albert, Constantine Lyketsos, Peter CM van Zijl, Michael I Miller, and Susumu Mori.
Multi-contrast large deformation diffeomorphic metric mapping for diffusion tensor imaging. Neuroimage, 47(2):618–627, 2009.
[15] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2172–2180, 2016.
[16] A. Dagley et al. Harvard aging brain study: dataset and accessibility. NeuroImage, 2015.
[17] A.V. Dalca, G. Balakrishnan, J. Guttag, and M.R. Sabuncu. Unsupervised learning for fast probabilistic diffeomorphic registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 729–738. Springer, 2018.
[18] A.V. Dalca, G. Balakrishnan, J. Guttag, and M.R. Sabuncu. Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. Medical Image Analysis, 57:226–236, 2019.
[19] A.V. Dalca et al. Patch-based discrete registration of clinical brain images. In International Workshop on Patch-based Techniques in Medical Imaging, pages 60–67. Springer, 2016.
[20] A.V. Dalca, J. Guttag, and M.R. Sabuncu. Anatomical priors in convolutional networks for unsupervised biomedical segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9290–9299, 2018.
[21] Brad Davis, Peter Lorenzen, and Sarang C Joshi. Large deformation minimum mean squared error template estimation for computational anatomy. In ISBI, volume 4, pages 173–176, 2004.
[22] Brad C Davis, P Thomas Fletcher, Elizabeth Bullitt, and Sarang Joshi. Population shape regression from random design data. International Journal of Computer Vision, 90(2):255–266, 2010.
[23] B.D. de Vos et al. End-to-end unsupervised deformable image registration with a convolutional neural network. In DLMIA, pages 204–212. 2017.
[24] A.
Di Martino et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Molecular Psychiatry, 19(6):659–667, 2014.
[25] A. Dosovitskiy et al. FlowNet: Learning optical flow with convolutional networks. 2015.
[26] Pedro F Felzenszwalb and Joshua D Schwartz. Hierarchical matching of deformable shapes. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.
[27] B. Fischl. FreeSurfer. Neuroimage, 62(2):774–781, 2012.
[28] Ben Glocker et al. Dense image registration through MRFs and efficient linear programming. Medical Image Analysis, 12(6):731–741, 2008.
[29] R.L. Gollub et al. The MCIC collection: a shared repository of multi-modal, multi-site brain image data from a clinical investigation of schizophrenia. Neuroinformatics, 11(3):367–388, 2013.
[30] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[31] Piotr A Habas, Kio Kim, Francois Rousseau, Orit A Glenn, A James Barkovich, and Colin Studholme. A spatio-temporal atlas of the human fetal brain with application to tissue segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 289–296. Springer, 2009.
[32] Monica Hernandez, Matias N Bossa, and Salvador Olmos. Registration of anatomical images using paths of diffeomorphisms parameterized with stationary vector field flows. International Journal of Computer Vision, 85(3):291–306, 2009.
[33] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework. ICLR, 2(5):6, 2017.
[34] A.J. Holmes et al. Brain genomics superstruct project initial data release with structural, functional, and behavioral measures.
Scientific Data, 2, 2015.
[35] Berthold K.P. Horn and Brian G. Schunck. Determining optical flow. 1980.
[36] Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, page 6, 2017.
[37] Anil K. Jain, Yu Zhong, and Sridhar Lakshmanan. Object matching using deformable templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(3):267–278, 1996.
[38] J Yu Jason, Adam W Harley, and Konstantinos G Derpanis. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In European Conference on Computer Vision, pages 3–10. Springer, 2016.
[39] Jonas Jongejan, Henry Rowley, Takashi Kawashima, Jongmin Kim, and Nick Fox-Gieg. The Quick, Draw! AI experiment. Mount View, CA, accessed Feb, 17:2018, 2016.
[40] Sarang Joshi, Brad Davis, Matthieu Jomier, and Guido Gerig. Unbiased diffeomorphic atlas construction for computational anatomy. NeuroImage, 23:S151–S160, 2004.
[41] Sarang C Joshi and Michael I Miller. Landmark matching via large deformation diffeomorphisms. IEEE Transactions on Image Processing, 9(8):1357–1370, 2000.
[42] Jaechul Kim, Ce Liu, Fei Sha, and Kristen Grauman. Deformable spatial pyramid matching for fast dense correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2307–2314, 2013.
[43] D.P. Kingma and M. Welling. Auto-encoding variational Bayes. ICLR, 2014.
[44] Iasonas Kokkinos, Michael M Bronstein, Roee Litman, and Alex M Bronstein. Intrinsic shape context descriptors for deformable shapes. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 159–166. IEEE, 2012.
[45] J. Krebs, T. Mansi, B. Mailhé, N. Ayache, and H. Delingette.
Unsupervised probabilistic deformation modeling for robust diffeomorphic registration. Deep Learning in Medical Image Analysis, 2018.
[46] Julian Krebs, Hervé Delingette, Boris Mailhé, Nicholas Ayache, and Tommaso Mansi. Learning a probabilistic model for diffeomorphic registration. IEEE Transactions on Medical Imaging, 2019.
[47] Julian Krebs et al. Robust non-rigid registration through agent-based action learning. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 344–352, Cham, 2017. Springer International Publishing.
[48] Maria Kuklisova-Murgasova, Paul Aljabar, Latha Srinivasan, Serena J Counsell, Valentina Doria, Ahmed Serag, Ioannis S Gousias, James P Boardman, Mary A Rutherford, A David Edwards, et al. A dynamic 4D probabilistic atlas of the developing brain. NeuroImage, 54(4):2750–2763, 2011.
[49] Yann LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
[50] Jun Ma, Michael I Miller, Alain Trouvé, and Laurent Younes. Bayesian template estimation in computational anatomy. NeuroImage, 42(1):252–261, 2008.
[51] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
[52] D.S. Marcus et al. Open access series of imaging studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience, 19(9):1498–1507, 2007.
[53] K. Marek et al. The Parkinson Progression Marker Initiative (PPMI). Progress in Neurobiology, 95(4):629–635, 2011.
[54] M.P. Milham et al. The ADHD-200 consortium: a model to advance the translational potential of neuroimaging in clinical neuroscience. Front. Sys. Neurosci, 6:62, 2012.
[55] Michael I Miller, M Faisal Beg, Can Ceritoglu, and Craig Stark.
Increasing the power of functional maps of the medial temporal lobe by using large deformation diffeomorphic metric mapping. Proceedings of the National Academy of Sciences, 102(27):9685–9690, 2005.
[56] M. Modat, I.J.A. Simpson, M.J. Cardoso, D.M. Cash, N. Toussaint, N.C. Fox, and S. Ourselin. Simulating neurodegeneration through longitudinal population analysis of structural and diffusion weighted MRI data. Medical Image Computing and Computer-Assisted Intervention, LNCS 8675:57–64, 2014.
[57] Marc Modat, Pankaj Daga, M Jorge Cardoso, Sebastien Ourselin, Gerard R Ridgway, and John Ashburner. Parametric non-rigid registration using a stationary velocity field. In 2012 IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, pages 145–150. IEEE, 2012.
[58] S.G. Mueller et al. Ways toward an early diagnosis in Alzheimer's disease: the Alzheimer's Disease Neuroimaging Initiative (ADNI). Alzheimer's & Dementia, 1(1):55–66, 2005.
[59] Kenichi Oishi, Andreia Faria, Hangyi Jiang, Xin Li, Kazi Akhter, Jiangyang Zhang, John T Hsu, Michael I Miller, Peter CM van Zijl, Marilyn Albert, et al. Atlas-based whole brain white matter analysis using large deformation diffeomorphic metric mapping: application to normal elderly and Alzheimer's disease participants. Neuroimage, 46(2):486–499, 2009.
[60] Anurag Ranjan and Michael J Black. Optical flow estimation using a spatial pyramid network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, page 2. IEEE, 2017.
[61] M.M. Rohé et al. SVF-Net: Learning deformable image registration using shape matching. In MICCAI, pages 266–274. Springer, 2017.
[62] O. Ronneberger et al. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, pages 234–241. Springer, 2015.
[63] Mert R Sabuncu, Serdar K Balci, Martha E Shenton, and Polina Golland.
Image-driven population analysis through mixture modeling. IEEE Transactions on Medical Imaging, 28(9):1473–1487, 2009.
[64] Ron A Shapira Weber, Matan Eyal, Nicki Skafte, Oren Shriki, and Oren Freifeld. Diffeomorphic temporal alignment networks. In NeurIPS: Neural Information Processing Systems, 2019.
[65] H. Sokooti et al. Nonrigid image registration using multi-scale 3D convolutional neural networks. In MICCAI, pages 232–239, Cham, 2017. Springer.
[66] Carsten Stoll, Zachi Karni, Christian Rössl, Hitoshi Yamauchi, and Hans-Peter Seidel. Template deformation for point cloud fitting. In SPBG, pages 27–35, 2006.
[67] D. Sun et al. Secrets of optical flow estimation and their principles. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 2432–2439, 2010.
[68] J.P. Thirion. Image matching as a diffusion process: an analogy with Maxwell's demons. Medical Image Analysis, 2(3):243–260, 1998.
[69] Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Deep End2End voxel2voxel prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 17–24, 2016.
[70] Tom Vercauteren et al. Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage, 45(1):S61–S72, 2009.
[71] X. Yang et al. Quicksilver: Fast predictive image registration – a deep learning approach. NeuroImage, 158:378–396, 2017.
[72] BT Thomas Yeo, Mert R Sabuncu, Tom Vercauteren, Daphne J Holt, Katrin Amunts, Karl Zilles, Polina Golland, and Bruce Fischl. Learning task-optimal registration cost functions for localizing cytoarchitecture and function in the cerebral cortex. IEEE Transactions on Medical Imaging, 29(7):1424–1441, 2010.
[73] Aras Yurtman and Billur Barshan.
Automated evaluation of physical therapy exercises using multi-template dynamic time warping on wearable sensor signals. Computer Methods and Programs in Biomedicine, 117(2):189–207, 2014.
[74] M. Zhang et al. Frequency diffeomorphisms for efficient image registration. In IPMI, pages 559–570. Springer, 2017.