{"title": "Signal Estimation Under Random Time-Warpings and Nonlinear Signal Alignment", "book": "Advances in Neural Information Processing Systems", "page_first": 675, "page_last": 683, "abstract": "While signal estimation under random amplitudes, phase shifts, and additive noise is studied frequently, the problem of estimating a deterministic signal under random time-warpings has been relatively unexplored. We present a novel framework for estimating the unknown signal that utilizes the action of the warping group to form an equivalence relation between signals. First, we derive an estimator for the equivalence class of the unknown signal using the notion of Karcher mean on the quotient space of equivalence classes. This step requires the use of Fisher-Rao Riemannian metric and a square-root representation of signals to enable computations of distances and means under this metric. Then, we define a notion of the center of a class and show that the center of the estimated class is a consistent estimator of the underlying unknown signal. This estimation algorithm has many applications: (1)registration/alignment of functional data, (2) separation of phase/amplitude components of functional data, (3) joint demodulation and carrier estimation, and (4) sparse modeling of functional data. Here we demonstrate only (1) and (2): Given signals are temporally aligned using nonlinear warpings and, thus, separated into their phase and amplitude components. 
The proposed method for signal alignment is shown to have state-of-the-art performance using Berkeley growth, handwritten signatures, and neuroscience spike train data.", "full_text": "Signal Estimation Under Random Time-Warpings and Nonlinear Signal Alignment\n\nSebastian Kurtek, Anuj Srivastava, Wei Wu\n\nDepartment of Statistics, Florida State University, Tallahassee, FL 32306\n\nskurtek,anuj,wwu@stat.fsu.edu\n\nAbstract\n\nWhile signal estimation under random amplitudes, phase shifts, and additive noise is studied frequently, the problem of estimating a deterministic signal under random time-warpings has been relatively unexplored. We present a novel framework for estimating the unknown signal that utilizes the action of the warping group to form an equivalence relation between signals. First, we derive an estimator for the equivalence class of the unknown signal using the notion of Karcher mean on the quotient space of equivalence classes. This step requires the use of the Fisher-Rao Riemannian metric and a square-root representation of signals to enable computations of distances and means under this metric. Then, we define a notion of the center of a class and show that the center of the estimated class is a consistent estimator of the underlying unknown signal. This estimation algorithm has many applications: (1) registration/alignment of functional data, (2) separation of phase/amplitude components of functional data, (3) joint demodulation and carrier estimation, and (4) sparse modeling of functional data. Here we demonstrate only (1) and (2): Given signals are temporally aligned using nonlinear warpings and, thus, separated into their phase and amplitude components. 
The proposed method for signal alignment is shown to have state-of-the-art performance using Berkeley growth, handwritten signatures, and neuroscience spike train data.\n\n1 Introduction\n\nConsider the problem of estimating a signal from noisy observations under the model:\n\nf(t) = c g(a t \u2212 \u03c6) + e(t) ,\n\nwhere the random quantities are: c \u2208 R, the scale; a \u2208 R, the rate; \u03c6 \u2208 R, the phase shift; and e(t) \u2208 R, the additive noise. There is an elaborate theory for estimating the underlying signal g, given one or several observations of the function f. Often one assumes that g takes a parametric form, e.g. a superposition of Gaussians or exponentials with different parameters, and estimates these parameters from the observed data [12]. For instance, the estimation of sinusoids or exponentials in additive Gaussian noise is a classical problem in signal and speech processing. In this paper we consider a related but fundamentally different estimation problem where the observed functional data are modeled as: for t \u2208 [0, 1],\n\nfi(t) = ci g(\u03b3i(t)) + ei , i = 1, 2, . . . , n . (1)\n\nHere \u03b3i : [0, 1] \u2192 [0, 1] are diffeomorphisms with \u03b3i(0) = 0 and \u03b3i(1) = 1. The fi's represent observations of an unknown, deterministic signal g under random warpings \u03b3i, scalings ci, and vertical translations ei \u2208 R. (A more general model would use full functions for the additive noise, but that requires further discussion due to identifiability issues; thus, we restrict to the above model in this paper.) This problem is interesting because in many situations, including speech, SONAR, RADAR, NMR, fMRI, and MEG applications, the noise can actually affect the instantaneous phase of the signal, resulting in an observation that is a phase (or frequency) modulation of the original signal.\n\nFigure 1: Separation of phase and amplitude variability in functional data.
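The generative model in Eqn. 1 is easy to simulate numerically. The sketch below (Python with NumPy; the power-law warpings, grid size, and sampling distributions are illustrative assumptions, not the paper's specification) produces a bundle of randomly warped, scaled, and shifted copies of a signal g:

```python
import numpy as np

def random_warping(rng, t, a_low=0.5, a_high=2.0):
    """A simple random diffeomorphism of [0,1]: gamma(t) = t^a with a > 0.
    (Illustrative only; the paper allows arbitrary boundary-preserving
    diffeomorphisms.)"""
    a = rng.uniform(a_low, a_high)
    return t ** a

def simulate_observations(g, n, m=200, seed=0):
    """Generate f_i(t) = c_i * g(gamma_i(t)) + e_i on a grid of m points."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    fs = []
    for _ in range(n):
        c = rng.exponential(1.0)       # random positive scaling c_i
        e = rng.normal(0.0, 1.0)       # random vertical translation e_i
        gam = random_warping(rng, t)   # random time warping gamma_i
        fs.append(c * g(gam) + e)
    return t, np.array(fs)

g = lambda t: (1 - (1 - 2 * t) ** 2) * np.sin(5 * np.pi * t)
t, F = simulate_observations(g, n=50)
```

Each row of F is one observation fi on the common grid t; the rows differ in peak heights (amplitude) and peak locations (phase), which is exactly the variability the paper sets out to separate.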
This problem is challenging because of the nonparametric, random nature of the warping functions \u03b3i. It seems difficult to recover g when its observations have been time-warped nonlinearly in a random fashion. Past papers have either restricted attention to linear warpings (e.g. \u03b3i(t) = ai t \u2212 \u03c6i) or assumed a known g (e.g. g(t) = cos(t)). It turns out that without any further restrictions on the \u03b3i one can recover g only up to an arbitrary warping function. This is easy to see since g \u25e6 \u03b3i = (g \u25e6 \u03b3) \u25e6 (\u03b3\u22121 \u25e6 \u03b3i) for any warping function \u03b3. (As described later, the warping functions are restricted to be automorphisms of a domain and, hence, form a group.) Under an additional condition related to the mean of the (inverses of the) \u03b3i, we can recover the exact signal g, as demonstrated in this paper.\n\nIn fact, this model describes several related, some even equivalent, problems but with distinct applications:\n\nProblem 1: Joint Phase Demodulation and Carrier Estimation: One can view this problem as that of phase (or frequency) demodulation but without knowledge of the carrier signal g. Thus, it becomes a problem of joint estimation of the carrier signal (g) and phase demodulation (\u03b3i\u22121) of signals that share the same carrier. In case the carrier signal g is known, e.g. g is a sinusoid, it is relatively easy to estimate the warping functions using dynamic time warping or other estimation-theoretic methods [15, 13]. So, we consider the problem of estimating g from {fi} under the model given in Eqn. 1.\n\nProblem 2: Phase-Amplitude Separation: Consider the set of signals shown in the top-left panel of Fig. 1. These functions differ from each other in both heights and locations of their peaks and valleys. 
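The identity g \u25e6 \u03b3i = (g \u25e6 \u03b3) \u25e6 (\u03b3\u22121 \u25e6 \u03b3i) behind this non-identifiability can be checked numerically. A small sketch (the power-law warpings are a hypothetical choice for illustration only):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 201)
g = lambda s: np.sin(2 * np.pi * s)

gamma_i = lambda s: s ** 1.7            # an "observed" warping gamma_i
gamma = lambda s: s ** 0.6              # an arbitrary nuisance warping gamma
gamma_inv = lambda s: s ** (1.0 / 0.6)  # its inverse

lhs = g(gamma_i(t))                     # g o gamma_i
rhs = g(gamma(gamma_inv(gamma_i(t))))   # (g o gamma) o (gamma^-1 o gamma_i)
err = np.max(np.abs(lhs - rhs))         # identical up to round-off
```

Both sides produce the same observation, so from the data alone one cannot tell whether the carrier was g with warping gamma_i, or g o gamma with warping gamma^-1 o gamma_i; this is why the mean condition on the inverse warpings is needed.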
One would like to separate the variability associated with the heights, called the amplitude variability, from the variability associated with the locations, termed the phase variability. Although this problem has been studied for almost two decades in the statistics community, see e.g. [7, 9, 4, 11, 8], it is still considered an open problem. Extracting the amplitude variability implies temporally aligning the given functions using nonlinear time warping, with the result shown in the bottom right. The corresponding set of warping functions, shown in the top right, represents the phase variability. The phase component can also be illustrated by applying these warping functions to the same function, also shown in the top right. The main reason for separating functional data into these components is to better preserve the structure of the observed data, since a separate modeling of amplitude and phase variability will be more natural, parsimonious, and efficient. It may not be obvious, but the solution to this separation problem is intimately connected to the estimation of g in Eqn. 1.\n\n[Figure 1 panels: original data; mean +/- STD before warping; mean +/- STD after warping; warping functions; phase components; amplitude components]\n\nProblem 3: Multiple Signal/Image Registration: The problem of phase-amplitude separation is intrinsically the same as the problem of joint registration of multiple signals. The problem here is: Given a set of observed signals {fi}, estimate the corresponding points in their domains. In other words, what are the \u03b3i such that, for any t0, the values fi(\u03b3i\u22121(t0)) correspond to each other? The bottom right panels of Fig. 
1 show the registered signals. Although this problem is more commonly studied for images, its one-dimensional version is non-trivial and helps in understanding the basic challenges. We will study the 1D problem in this paper but, at least conceptually, the solutions extend to higher-dimensional problems also.\n\nIn this paper we provide the following specific contributions. We study the problem of estimating g given a set {fi} under the model in Eqn. 1 and propose a consistent estimator for this problem, along with the supporting asymptotic theory. Also, we illustrate the use of this solution in automated alignment of sets of given signals. Our framework is based on an equivalence relation between signals defined as follows. Two signals are deemed equivalent if one can be time-warped into the other; since the warping functions form a group, the equivalence class is an orbit of the warping group. This relation partitions the set of signals into equivalence classes, and the set of equivalence classes (orbits) forms a quotient space. Our estimation of g is based on two steps. First, we estimate the equivalence class of g using the notion of Karcher mean on the quotient space which, in turn, requires a distance on this quotient space. This distance should respect the equivalence structure, i.e. the distance between any two elements should be zero if and only if they are in the same class. We propose to use a distance that results from the Fisher-Rao Riemannian metric. This metric was introduced in 1945 by C. R. Rao [10] and studied rigorously in the 70s and 80s by Amari [1], Efron [3], Kass [6], Cencov [2], and others. While those earlier efforts were focused on analyzing parametric families, we use the nonparametric version of the Fisher-Rao Riemannian metric in this paper. The difficulty in using this metric directly is that it is not straightforward to compute geodesics (recall that geodesic lengths provide the desired distances). 
However, a simple square-root transformation converts this metric into the standard L2 metric, and the distance is obtainable as a simple L2 norm between the square-root forms of functions. Second, given an estimate of the equivalence class of g, we define the notion of a center of an orbit and use that to derive an estimator for g.\n\n2 Background Material\n\nWe introduce some notation. Let \u0393 be the set of orientation-preserving diffeomorphisms of the unit interval [0, 1]: \u0393 = {\u03b3 : [0, 1] \u2192 [0, 1] | \u03b3(0) = 0, \u03b3(1) = 1, \u03b3 is a diffeomorphism}. Elements of \u0393 form a group, i.e. (1) for any \u03b31, \u03b32 \u2208 \u0393, their composition \u03b31 \u25e6 \u03b32 \u2208 \u0393; and (2) for any \u03b3 \u2208 \u0393, its inverse \u03b3\u22121 \u2208 \u0393, where the identity is the self-mapping \u03b3id(t) = t. We will use ||f|| to denote the L2 norm (\u222b_0^1 |f(t)|^2 dt)^{1/2}.\n\n2.1 Representation Space of Functions\n\nLet f be a real-valued function on the interval [0, 1]. We restrict to those f that are absolutely continuous on [0, 1]; let F denote the set of all such functions. We define a mapping Q : R \u2192 R according to: Q(x) \u2261 x/\u221a|x| if x \u2260 0, and 0 otherwise. Note that Q is a continuous map. For the purpose of studying the function f, we will represent it using a square-root velocity function (SRVF) defined as q : [0, 1] \u2192 R, where q(t) \u2261 Q( \u02d9f(t)) = \u02d9f(t)/\u221a| \u02d9f(t)|. It can be shown that if the function f is absolutely continuous, then the resulting SRVF is square integrable. Thus, we will define L2([0, 1], R) (or simply L2) to be the set of all SRVFs. For every q \u2208 L2 there exists a function f (unique up to a constant, or a vertical translation) such that the given q is the SRVF of that f. 
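On a grid, the SRVF map can be approximated with finite differences, using the identity Q(x) = x/\u221a|x| = sign(x)\u221a|x|. A minimal numerical sketch (not the authors' code; the use of np.gradient is a discretization assumption):

```python
import numpy as np

def srvf(f, t):
    """Approximate SRVF q(t) = Q(f'(t)) = f'(t)/sqrt(|f'(t)|) on a grid,
    via the equivalent form Q(x) = sign(x) * sqrt(|x|)."""
    fdot = np.gradient(f, t)  # finite-difference derivative of f
    return np.sign(fdot) * np.sqrt(np.abs(fdot))

t = np.linspace(0.0, 1.0, 101)
q = srvf(t ** 2, t)  # f(t) = t^2 has fdot = 2t, so q(t) = sqrt(2t)
```

For the quadratic test function the central differences are exact, so away from the endpoints q matches sqrt(2t) to machine precision.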
If we warp a function f by \u03b3, the SRVF of f \u25e6 \u03b3 is given by: \u02dcq(t) = (d/dt)(f \u25e6 \u03b3)(t) / \u221a|(d/dt)(f \u25e6 \u03b3)(t)| = (q \u25e6 \u03b3)(t) \u221a( \u02d9\u03b3(t)). We will denote this transformation by (q, \u03b3) = (q \u25e6 \u03b3) \u221a( \u02d9\u03b3).\n\n2.2 Elastic Riemannian Metric\n\nDefinition 1 For any f \u2208 F and v1, v2 \u2208 Tf(F), where Tf(F) is the tangent space to F at f, the Fisher-Rao Riemannian metric is defined as the inner product:\n\n\u27e8\u27e8v1, v2\u27e9\u27e9_f = (1/4) \u222b_0^1 ( \u02d9v1(t) \u02d9v2(t) / | \u02d9f(t)| ) dt . (2)\n\nThis metric has many fundamental advantages, including the fact that it is the only Riemannian metric that is invariant to domain warping [2]. This metric is somewhat complicated since it changes from point to point on F, and it is not straightforward to derive equations for computing geodesics in F. However, a small transformation provides an enormous simplification of this task. This motivates the use of SRVFs for representing and aligning elastic functions.\n\nLemma 1 Under the SRVF representation, the Fisher-Rao Riemannian metric becomes the standard L2 metric.\n\nThis result can be used to compute the distance d_FR between any two functions by computing the L2 distance between the corresponding SRVFs, that is, d_FR(f1, f2) = ||q1 \u2212 q2||. The next question is: What is the effect of warping on d_FR? This is answered by the following isometry result.\n\nLemma 2 For any two SRVFs q1, q2 \u2208 L2 and \u03b3 \u2208 \u0393, ||(q1, \u03b3) \u2212 (q2, \u03b3)|| = ||q1 \u2212 q2||.\n\n2.3 Elastic Distance on Quotient Space\n\nOur next step is to define an elastic distance between functions as follows. The orbit of an SRVF q \u2208 L2 is given by: [q] = closure{(q, \u03b3) | \u03b3 \u2208 \u0393}. 
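Lemma 2 (the action of \u0393 is by isometries) can be verified numerically for any particular warping. The sketch below uses a hypothetical warping \u03b3(t) = t^2 and linear interpolation for the composition; the observed agreement is up to discretization error only:

```python
import numpy as np

def integrate(y, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def warp_srvf(q, gam, gam_dot, t):
    """Group action (q, gamma) = (q o gamma) * sqrt(gamma_dot), with the
    composition done by linear interpolation on the grid."""
    return np.interp(gam, t, q) * np.sqrt(gam_dot)

t = np.linspace(0.0, 1.0, 400)
q1 = np.sin(2 * np.pi * t)
q2 = np.cos(2 * np.pi * t)
gam, gam_dot = t ** 2, 2 * t  # gamma(0) = 0, gamma(1) = 1

d_before = np.sqrt(integrate((q1 - q2) ** 2, t))
d_after = np.sqrt(integrate((warp_srvf(q1, gam, gam_dot, t)
                             - warp_srvf(q2, gam, gam_dot, t)) ** 2, t))
```

The change of variables s = gamma(t) shows why the sqrt(gamma_dot) factor is exactly what makes the L2 norm warping-invariant.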
It is the set of SRVFs associated with all the warpings of a function, and their limit points. Let S denote the set of all such orbits. To compare any two orbits we need a metric on S. We will use the Fisher-Rao distance to induce a distance between orbits; we can do so only because, under this metric, the action of \u0393 is by isometries.\n\nDefinition 2 For any two functions f1, f2 \u2208 F and the corresponding SRVFs q1, q2 \u2208 L2, we define the elastic distance d on the quotient space S to be: d([q1], [q2]) = inf_{\u03b3\u2208\u0393} ||q1 \u2212 (q2, \u03b3)||.\n\nNote that the distance d between a function and its domain-warped version is zero. However, it can be shown that if two SRVFs belong to different orbits, then the distance between them is non-zero. Thus, this distance d is a proper distance (i.e. it satisfies non-negativity, symmetry, and the triangle inequality) on S but not on L2 itself, where it is only a pseudo-distance.\n\n3 Signal Estimation Method\n\nOur estimation is based on the model fi = ci(g \u25e6 \u03b3i) + ei, i = 1, . . . , n, where g, fi \u2208 F, ci \u2208 R+, \u03b3i \u2208 \u0393, and ei \u2208 R. Given {fi}, our goal is to identify warping functions {\u03b3i} so as to reconstruct g. We will do so in three steps: 1) For a given collection of functions {fi}, and their SRVFs {qi}, we compute the mean of the corresponding orbits {[qi]} in the quotient space S; we will call it [\u00b5]n. 2) We compute an appropriate element of this mean orbit to define a template \u00b5n in L2; the optimal warping functions {\u03b3i} are estimated by aligning the individual functions to the template \u00b5n. 
3) The estimated warping functions are then used to align {fi} and reconstruct the underlying signal g.\n\n3.1 Pre-step: Karcher Mean of Points in \u0393\n\nIn this section we define a Karcher mean of a set of warping functions {\u03b3i}, under the Fisher-Rao metric, using the differential geometry of \u0393. Analysis on \u0393 is not straightforward because it is a nonlinear manifold. To understand its geometry, we will represent an element \u03b3 \u2208 \u0393 by the square-root of its derivative, \u03c8 = \u221a( \u02d9\u03b3). Note that this is the same as the SRVF defined earlier for elements of F, except that \u02d9\u03b3 > 0 here. Since \u03b3(0) = 0, the mapping from \u03b3 to \u03c8 is a bijection and one can reconstruct \u03b3 from \u03c8 using \u03b3(t) = \u222b_0^t \u03c8(s)^2 ds. An important advantage of this transformation is that, since ||\u03c8||^2 = \u222b_0^1 \u03c8(t)^2 dt = \u222b_0^1 \u02d9\u03b3(t) dt = \u03b3(1) \u2212 \u03b3(0) = 1, the set of all such \u03c8s is S\u221e, the unit sphere in the Hilbert space L2. In other words, the square-root representation simplifies the complicated geometry of \u0393 to the unit sphere. Recall that the distance between any two points on the unit sphere, under the Euclidean metric, is simply the length of the shortest arc of a great circle connecting them on the sphere. Using Lemma 1, the Fisher-Rao distance between any two warping functions is found to be d_FR(\u03b31, \u03b32) = cos\u22121( \u222b_0^1 \u221a( \u02d9\u03b31(t)) \u221a( \u02d9\u03b32(t)) dt ). Now that we have a proper distance on \u0393, we can define a Karcher mean as follows.\n\nDefinition 3 For a given set of warping functions \u03b31, \u03b32, . . . , \u03b3n \u2208 \u0393, define their Karcher mean to be \u00af\u03b3n = argmin_{\u03b3\u2208\u0393} \u2211_{i=1}^n d_FR(\u03b3, \u03b3i)^2.\n\nThe search for this minimum is performed using a standard iterative algorithm that is not repeated here to save space.\n\n3.2 Step 1: Karcher Mean of Points in S = L2/\u0393\n\nNext we consider the problem of finding means of points in the quotient space S.\n\nDefinition 4 Define the Karcher mean [\u00b5]n of the given SRVF orbits {[qi]} in the space S as a local minimum of the sum of squares of elastic distances:\n\n[\u00b5]n = argmin_{[q]\u2208S} \u2211_{i=1}^n d([q], [qi])^2 . (3)\n\nWe emphasize that the Karcher mean [\u00b5]n is actually an orbit of functions, rather than a function. The full algorithm for computing the Karcher mean in S is given next.\n\nAlgorithm 1: Karcher Mean of {[qi]} in S\n\n1. Initialization Step: Select \u00b5 = qj, where j is any index in argmin_{1\u2264i\u2264n} ||qi \u2212 (1/n) \u2211_{k=1}^n qk||.\n2. For each qi find \u03b3i* by solving: \u03b3i* = argmin_{\u03b3\u2208\u0393} ||\u00b5 \u2212 (qi, \u03b3)||. The solution to this optimization comes from a dynamic programming algorithm in a discretized domain.\n3. Compute the aligned SRVFs using \u02dcqi \u2190 (qi, \u03b3i*).\n4. If the increment ||(1/n) \u2211_{i=1}^n \u02dcqi \u2212 \u00b5|| is small, then stop. Else, update the mean using \u00b5 \u2190 (1/n) \u2211_{i=1}^n \u02dcqi and return to step 2.\n\nThe iterative update in Steps 2-4 is based on the gradient of the cost function given in Eqn. 3. Denote the estimated mean in the kth iteration by \u00b5(k). 
In the kth iteration, let \u03b3i(k) denote the optimal domain warping from qi to \u00b5(k) and let \u02dcqi(k) = (qi, \u03b3i(k)). Then, \u2211_{i=1}^n d([\u00b5(k)], [qi])^2 = \u2211_{i=1}^n ||\u00b5(k) \u2212 \u02dcqi(k)||^2 \u2265 \u2211_{i=1}^n ||\u00b5(k+1) \u2212 \u02dcqi(k)||^2 \u2265 \u2211_{i=1}^n d([\u00b5(k+1)], [qi])^2. Thus, the cost function decreases iteratively and, as zero is a lower bound, \u2211_{i=1}^n d([\u00b5(k)], [qi])^2 will always converge.\n\n3.3 Step 2: Center of an Orbit\n\nHere we find a particular element of this mean orbit so that it can be used as a template to align the given functions.\n\nDefinition 5 For a given set of SRVFs q1, q2, . . . , qn and q, define an element \u02dcq of [q] as the center of [q] with respect to the set {qi} if the warping functions {\u03b3i}, where \u03b3i = argmin_{\u03b3\u2208\u0393} ||\u02dcq \u2212 (qi, \u03b3)||, have the Karcher mean \u03b3id.\n\nWe will prove the existence of such an element by construction.\n\nAlgorithm 2: Finding the Center of an Orbit: WLOG, let q be any element of the orbit [q].\n\n1. For each qi find \u03b3i by solving: \u03b3i = argmin_{\u03b3\u2208\u0393} ||q \u2212 (qi, \u03b3)||.\n2. Compute the mean \u00af\u03b3n of all {\u03b3i}. The center of [q] w.r.t. {qi} is given by \u02dcq = (q, \u00af\u03b3n\u22121).\n\nWe need to show that \u02dcq resulting from Algorithm 2 satisfies the mean condition in Definition 5. Note that \u03b3i is chosen to minimize ||q \u2212 (qi, \u03b3)||, and also that ||\u02dcq \u2212 (qi, \u03b3)|| = ||(q, \u00af\u03b3n\u22121) \u2212 (qi, \u03b3)|| = ||q \u2212 (qi, \u03b3 \u25e6 \u00af\u03b3n)||. Therefore, \u03b3i* = \u03b3i \u25e6 \u00af\u03b3n\u22121 minimizes ||\u02dcq \u2212 (qi, \u03b3)||. That is, \u03b3i* is a warping that aligns qi to \u02dcq. To verify the Karcher mean of {\u03b3i*}, we compute the sum of squared distances \u2211_{i=1}^n d_FR(\u03b3, \u03b3i*)^2 = \u2211_{i=1}^n d_FR(\u03b3, \u03b3i \u25e6 \u00af\u03b3n\u22121)^2 = \u2211_{i=1}^n d_FR(\u03b3 \u25e6 \u00af\u03b3n, \u03b3i)^2. As \u00af\u03b3n is already the mean of {\u03b3i}, this sum of squares is minimized when \u03b3 = \u03b3id. That is, the mean of {\u03b3i*} is \u03b3id. We will apply this setup in our problem by finding the center of [\u00b5]n with respect to the SRVFs {qi}.\n\n[Figure 2 panels: g; {fi}; { \u02dcfi}; estimate of g; error w.r.t. n] Figure 2: Example of consistent estimation.\n\n3.4 Steps 1-3: Complete Estimation Algorithm\n\nConsider the observation model fi = ci(g \u25e6 \u03b3i) + ei, i = 1, . . . , n, where g is an unknown signal, and ci \u2208 R+, \u03b3i \u2208 \u0393, and ei \u2208 R are random. Given the observations {fi}, the goal is to estimate the signal g. To make the system identifiable, we need some constraints on \u03b3i, ci, and ei. In this paper, the constraints are: 1) the population mean of {\u03b3i\u22121} is the identity \u03b3id, and 2) the population Karcher means of {ci} and {ei} are known, denoted by E(\u00afc) and E(\u00afe), respectively. Now we can utilize Algorithms 1 and 2 to present the full procedure for function alignment and signal estimation.\n\nComplete Estimation Algorithm: Given a set of functions {fi}, i = 1, . . . , n, on [0, 1], and population means E(\u00afc) and E(\u00afe). Let {qi} denote the SRVFs of {fi}.\n\n1. Compute the Karcher mean of {[qi]} in S using Algorithm 1; denote it by [\u00b5]n.\n2. Find the center of [\u00b5]n w.r.t. {qi} using Algorithm 2; call it \u00b5n.\n3. For i = 1, 2, . . . , n, find \u03b3i* by solving: \u03b3i* = argmin_{\u03b3\u2208\u0393} ||\u00b5n \u2212 (qi, \u03b3)||.\n4. Compute the aligned SRVFs \u02dcqi = (qi, \u03b3i*) and aligned functions \u02dcfi = fi \u25e6 \u03b3i*.\n5. Return the warping functions {\u03b3i*} and the estimated signal \u02c6g = ( (1/n) \u2211_{i=1}^n \u02dcfi \u2212 E(\u00afe) ) / E(\u00afc).\n\nIllustration. We illustrate the estimation process using an example in which g is a quadratically-enveloped sine wave, g(t) = (1 \u2212 (1 \u2212 2t)^2) sin(5\u03c0t), t \u2208 [0, 1]. We randomly generate n = 50 warping functions {\u03b3i} such that {\u03b3i\u22121} are i.i.d. with mean \u03b3id. We also generate i.i.d. sequences {ci} and {ei} from the exponential distribution with mean 1 and the standard normal distribution, respectively. Then we compute functions fi = ci(g \u25e6 \u03b3i) + ei to form the functional data. In Fig. 2, the first panel shows the function g, and the second panel shows the data {fi}. The Complete Estimation Algorithm results in the aligned functions { \u02dcfi = fi \u25e6 \u03b3i*} shown in the third panel of Fig. 2. In this case, E(\u00afc) = 1 and E(\u00afe) = 0. The estimated g (red) obtained by the Complete Estimation Algorithm, as well as the true g (blue), are shown in the fourth panel. Note that the estimate is very successful despite large variability in the raw data. Finally, we examine the performance of the estimator with respect to the sample size, by performing this estimation for n equal to 5, 10, 20, 30, and 40. The estimation errors, computed using the L2 norm between the estimated g's and the true g, are shown in the last panel. As we will show in the following theoretical development, this estimate converges to the true g as the sample size n grows large.\n\n4 Estimator Consistency and Asymptotics\n\nIn this section we mathematically demonstrate that the proposed algorithms in Section 3 provide a consistent estimator for the underlying function g. 
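The Pre-step above (Karcher mean of warping functions via \u03c8 = \u221a\u02d9\u03b3 on the unit sphere) is the one piece of the pipeline whose iterative algorithm the paper omits. Below is a numerical sketch of the standard sphere Karcher-mean iteration via exponential/logarithm maps; all discretization choices are our assumptions, not the authors' implementation:

```python
import numpy as np

def integrate(y, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def to_psi(gam, t):
    # gamma -> psi = sqrt(gamma_dot): a point on the unit sphere in L2
    return np.sqrt(np.maximum(np.gradient(gam, t), 0.0))

def to_gamma(psi, t):
    # psi -> gamma(t) = integral_0^t psi(s)^2 ds, renormalized so gamma(1) = 1
    incr = (psi[1:] ** 2 + psi[:-1] ** 2) / 2.0 * np.diff(t)
    gam = np.concatenate(([0.0], np.cumsum(incr)))
    return gam / gam[-1]

def karcher_mean_warpings(gammas, t, iters=30, step=1.0):
    """Gradient descent on the sphere: average the log maps of all psi_i at
    the current mean, then shoot along the exp map."""
    psis = [to_psi(g, t) for g in gammas]
    mu = psis[0]
    for _ in range(iters):
        vs = np.zeros_like(mu)
        for p in psis:
            ip = np.clip(integrate(mu * p, t), -1.0, 1.0)
            theta = np.arccos(ip)
            if theta > 1e-10:                       # log map of p at mu
                vs += theta / np.sin(theta) * (p - ip * mu)
        v = step * vs / len(psis)
        nv = np.sqrt(integrate(v ** 2, t))
        if nv < 1e-10:
            break
        mu = np.cos(nv) * mu + np.sin(nv) * v / nv  # exp map step
        mu /= np.sqrt(integrate(mu ** 2, t))        # guard renormalization
    return to_gamma(mu, t)

t = np.linspace(0.0, 1.0, 200)
gbar = karcher_mean_warpings([t ** 2, t ** 2], t)  # mean of identical warpings
```

As a sanity check, the Karcher mean of a set of identical warpings recovers that warping up to discretization error.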
This or related problems have been considered previously in several papers, including [14, 9], but we are not aware of any formal statistical solution.\n\nFirst, we establish the following useful result.\n\nLemma 3 For any q1, q2 \u2208 L2 and a constant c > 0, we have argmin_{\u03b3\u2208\u0393} ||q1 \u2212 (q2, \u03b3)|| = argmin_{\u03b3\u2208\u0393} ||c q1 \u2212 (q2, \u03b3)||.\n\nCorollary 1 For any function q \u2208 L2 and constant c > 0, we have \u03b3id \u2208 argmin_{\u03b3\u2208\u0393} ||c q \u2212 (q, \u03b3)||. Moreover, if the set {t \u2208 [0, 1] | q(t) = 0} has (Lebesgue) measure 0, then \u03b3id = argmin_{\u03b3\u2208\u0393} ||c q \u2212 (q, \u03b3)||.\n\nBased on Lemma 3 and Corollary 1, we have the following result on the Karcher mean in the quotient space S.\n\nTheorem 1 For a function g, consider a sequence of functions fi(t) = ci g(\u03b3i(t)) + ei, where ci is a positive constant, ei is a constant, and \u03b3i is a time warping, i = 1, . . . , n. Denote by qg and qi the SRVFs of g and fi, respectively, and let \u00afs = (1/n) \u2211_{i=1}^n \u221aci. Then, the Karcher mean of {[qi], i = 1, 2, . . . , n} in S is \u00afs[qg]. That is,\n\n[\u00b5]n \u2261 argmin_{[q]} \u2211_{i=1}^n d([qi], [q])^2 = \u00afs[qg] = \u00afs{(qg, \u03b3), \u03b3 \u2208 \u0393} .\n\nNext, we present a simple fact about the Karcher mean (see Definition 3) of warping functions.\n\nLemma 4 Given a set {\u03b3i \u2208 \u0393 | i = 1, ..., n} and a \u03b30 \u2208 \u0393, if the Karcher mean of {\u03b3i} is \u00af\u03b3, then the Karcher mean of {\u03b3i \u25e6 \u03b30} is \u00af\u03b3 \u25e6 \u03b30.\n\nTheorem 1 ensures that [\u00b5]n belongs to the orbit of [qg] (up to a scale factor), but we are interested in estimating g itself, rather than its orbit. We will show in two steps (Theorems 2 and 3) that finding the center of the orbit [\u00b5]n leads to a consistent estimator for g.\n\nTheorem 2 Under the same conditions as in Theorem 1, let \u00b5 = (\u00afs qg, \u03b30), for \u03b30 \u2208 \u0393, denote an arbitrary element of the Karcher mean class [\u00b5]n = \u00afs[qg]. Assume that the set {t \u2208 [0, 1] | \u02d9g(t) = 0} has Lebesgue measure zero. If the population Karcher mean of {\u03b3i\u22121} is \u03b3id, then the center of the orbit [\u00b5]n, denoted by \u00b5n, satisfies lim_{n\u2192\u221e} \u00b5n = E(\u00afs) qg.\n\nThis result shows that asymptotically one can recover the SRVF of the original signal via the Karcher mean of the SRVFs of the observed signals. Next, in Theorem 3, we show that one can also reconstruct g using the aligned functions { \u02dcfi} generated by the alignment algorithm in Section 3.\n\nTheorem 3 Under the same conditions as in Theorem 2, let \u03b3i* = argmin_{\u03b3} ||(qi, \u03b3) \u2212 \u00b5n|| and \u02dcfi = fi \u25e6 \u03b3i*. If we denote \u00afc = (1/n) \u2211_{i=1}^n ci and \u00afe = (1/n) \u2211_{i=1}^n ei, then lim_{n\u2192\u221e} (1/n) \u2211_{i=1}^n \u02dcfi = E(\u00afc) g + E(\u00afe).\n\n5 Application to Signal Alignment\n\nIn this section we focus on function alignment and compare alignment performance with some previous methods on several datasets. In this case, the given signals are viewed as the {fi} in the previous setup; we estimate the center of the orbit and then use it to align all of the signals. The datasets include 3 real experimental applications listed below. The data are shown in Column 1 of Fig. 3.\n\n1. Real Data 1. Berkeley Growth Data: The Berkeley growth dataset for 39 male subjects [11]. For better illustration, we have used the first derivatives of the growth curves (i.e. growth velocities) as the functions {fi} in our analysis.\n\n2. Real Data 2. Handwriting Signature Data: 20 handwritten signatures and the acceleration functions along the signature curves [8]. Let (x(t), y(t)) denote the x and y coordinates of a signature traced as a function of time t. We study the acceleration functions f(t) = \u221a( \u00a8x(t)^2 + \u00a8y(t)^2 ) of the signatures.\n\n3. Real Data 3. Neural Spike Data: Spiking activity of one motor cortical neuron in a Macaque monkey was recorded during arm-movement behavior [16]. The smoothed (using a Gaussian kernel) spike trains over 10 movement trials are used in this alignment analysis.\n\nThere are no standard criteria for evaluating function alignment in the current literature. Here we use the following three criteria so that together they provide a comprehensive evaluation, where fi and \u02dcfi, i = 1, ..., N, denote the original and the aligned functions, respectively.\n\n1. Least Squares: ls = (1/N) \u2211_{i=1}^N [ \u222b ( \u02dcfi(t) \u2212 (1/(N\u22121)) \u2211_{j\u2260i} \u02dcfj(t) )^2 dt ] / [ \u222b ( fi(t) \u2212 (1/(N\u22121)) \u2211_{j\u2260i} fj(t) )^2 dt ]. ls measures the cross-sectional variance of the aligned functions, relative to the original values. The smaller the value of ls, the better the alignment is in general.\n\n2. Pairwise Correlation: pc = \u2211_{i\u2260j} cc( \u02dcfi, \u02dcfj) / \u2211_{i\u2260j} cc(fi, fj), where cc(f, g) is the pairwise Pearson's correlation between functions. Large values of pc indicate good synchronization.\n\n3. Sobolev Least Squares: sls = \u2211_{i=1}^N \u222b ( \u02d9\u02dcfi(t) \u2212 (1/N) \u2211_{j=1}^N \u02d9\u02dcfj(t) )^2 dt / \u2211_{i=1}^N \u222b ( \u02d9fi(t) \u2212 (1/N) \u2211_{j=1}^N \u02d9fj(t) )^2 dt. This criterion measures the total cross-sectional variance of the derivatives of the aligned functions, relative to the original value. The smaller the value of sls, the better the synchronization achieved by the method.\n\nFigure 3: Empirical evaluation of four methods on 3 real datasets, with the alignment performance computed using three criteria (ls, pc, sls); the first column of the figure shows the original data. The best cases are shown in boldface.\n\n(ls, pc, sls) | PACE [11] | SMR [4] | MBM [5] | F-R\nGrowth-male | (0.91, 1.09, 0.68) | (0.45, 1.17, 0.77) | (0.70, 1.17, 0.62) | (0.64, 1.18, 0.31)\nSignature | (0.91, 1.18, 0.84) | (0.62, 1.59, 0.31) | (0.64, 1.57, 0.46) | (0.56, 1.79, 0.31)\nNeural data | (0.87, 1.35, 1.10) | (0.69, 2.54, 0.95) | (0.48, 3.06, 0.40) | (0.40, 3.77, 0.28)\n\nWe compare our Fisher-Rao (F-R) method with the Tang-M\u00fcller method [11] provided in the principal analysis by conditional expectation (PACE) package, the self-modeling registration (SMR) method presented in [4], and the moment-based matching (MBM) technique presented in [5]. Fig. 3 summarizes the values of (ls, pc, sls) for these four methods on the 3 real datasets. From the results, we can see that the F-R method does uniformly well in functional alignment under all the evaluation metrics. 
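The three criteria can be written directly from their definitions. A numerical sketch (Python/NumPy; rows of F are functions sampled on the grid t, and the discretization choices are assumptions):

```python
import numpy as np

def integrate(y, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def ls_criterion(F, Fa, t):
    """Least squares: averaged ratio of leave-one-out cross-sectional
    variances, aligned (Fa) relative to original (F)."""
    N = F.shape[0]
    total = 0.0
    for i in range(N):
        oth_a = (Fa.sum(axis=0) - Fa[i]) / (N - 1)  # mean of the others, aligned
        oth_o = (F.sum(axis=0) - F[i]) / (N - 1)    # mean of the others, original
        total += integrate((Fa[i] - oth_a) ** 2, t) / integrate((F[i] - oth_o) ** 2, t)
    return total / N

def pc_criterion(F, Fa):
    """Pairwise correlation: ratio of summed off-diagonal Pearson correlations."""
    def cc_sum(G):
        C = np.corrcoef(G)
        return C.sum() - np.trace(C)
    return cc_sum(Fa) / cc_sum(F)

def sls_criterion(F, Fa, t):
    """Sobolev least squares: cross-sectional variance of the derivatives."""
    def sob(G):
        D = np.gradient(G, t, axis=1)
        m = D.mean(axis=0)
        return sum(integrate((d - m) ** 2, t) for d in D)
    return sob(Fa) / sob(F)
```

With Fa = F (no alignment performed), all three criteria equal 1 by construction, which gives a quick self-check of an implementation.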
We have found that the ls criterion is sometimes misleading, in the sense that a low value can result even when the functions are not well aligned. This is the case, for example, for the male growth data under the SMR method: there ls = 0.45, while for our method ls = 0.64, even though it is easy to see that the latter achieves a better alignment. The sls criterion, on the other hand, seems to correlate best with a visual evaluation of the alignment. The neural spike train data is the most challenging, and no method except ours does a good job on it.

6 Summary

In this paper we have described a parameter-free approach for reconstructing an underlying signal from given functions observed under random warpings, scalings, and translations. The basic idea is to use the Fisher-Rao Riemannian metric and the resulting geodesic distance to define a proper distance, called the elastic distance, between warping orbits of SRVF functions. This distance is used to compute a Karcher mean of the orbits, and a template is selected from the mean orbit using the additional condition that the mean of the warping functions is the identity. By applying these warpings to the original functions, we obtain a consistent estimator of the underlying signal. One interesting application of this framework is in aligning functions with significant x-variability. We show that the proposed Fisher-Rao method provides better alignment performance than state-of-the-art methods on several real experimental datasets.

References

[1] S. Amari. Differential Geometric Methods in Statistics. Lecture Notes in Statistics, Vol. 28. Springer, 1985.

[2] N. N. Čencov. Statistical Decision Rules and Optimal Inferences, volume 53 of Translations of Mathematical Monographs. AMS, Providence, USA, 1982.

[3] B. Efron. Defining the curvature of a statistical problem (with applications to second order efficiency). Annals of Statistics, 3:1189-1242, 1975.

[4] D. Gervini and T. Gasser. Self-modeling warping functions. Journal of the Royal Statistical Society, Ser. B, 66:959-971, 2004.

[5] G. James. Curve alignment by moments. Annals of Applied Statistics, 1(2):480-501, 2007.

[6] R. E. Kass and P. W. Vos. Geometric Foundations of Asymptotic Inference. John Wiley & Sons, Inc., 1997.

[7] A. Kneip and T. Gasser. Statistical tools to analyze data representing a sample of curves. The Annals of Statistics, 20:1266-1305, 1992.

[8] A. Kneip and J. O. Ramsay. Combining registration and fitting for functional models. Journal of the American Statistical Association, 103(483), 2008.

[9] J. O. Ramsay and X. Li. Curve registration. Journal of the Royal Statistical Society, Ser. B, 60:351-363, 1998.

[10] C. R. Rao. Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37:81-91, 1945.

[11] R. Tang and H. G. Müller. Pairwise curve synchronization for functional data. Biometrika, 95(4):875-889, 2008.

[12] H. L. Van Trees. Detection, Estimation, and Modulation Theory, Vol. I. John Wiley, New York, 1971.

[13] M. Tsang, J. H. Shapiro, and S. Lloyd. Quantum theory of optical temporal phase and instantaneous frequency. Physical Review A, 78(5):053820, Nov. 2008.

[14] K. Wang and T. Gasser. Alignment of curves by dynamic time warping. Annals of Statistics, 25(3):1251-1276, 1997.

[15] A. Willsky. Fourier series and estimation on the circle with applications to synchronous communication-I: Analysis. IEEE Transactions on Information Theory, 20(5):577-583, Sep. 1974.

[16] W. Wu and A. Srivastava. Towards statistical summaries of spike train data. Journal of Neuroscience Methods, 195:107-110, 2011.