{"title": "Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors", "book": "Advances in Neural Information Processing Systems", "page_first": 1974, "page_last": 1983, "abstract": "Based on non-local prior distributions, we propose a Bayesian model selection (BMS) procedure for boundary detection in a sequence of data with multiple systematic mean changes. The BMS method can effectively suppress the non-boundary spike points with large instantaneous changes. We speed up the algorithm by reducing the multiple change points to a series of single change point detection problems. We establish the consistency of the estimated number and locations of the change points under various prior distributions. Extensive simulation studies are conducted to compare the BMS with existing methods, and our approach is illustrated with application to the magnetic resonance imaging guided radiation therapy data.", "full_text": "Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors

Fei Jiang
Department of Statistics and Actuarial Science
The University of Hong Kong
feijiang@hku.hk

Guosheng Yin
Department of Statistics and Actuarial Science
The University of Hong Kong
gyin@hku.hk

Francesca Dominici
Harvard T.H. Chan School of Public Health
Harvard University
fdominic@hsph.harvard.edu

Abstract

Based on non-local prior distributions, we propose a Bayesian model selection (BMS) procedure for boundary detection in a sequence of data with multiple systematic mean changes. The BMS method can effectively suppress the non-boundary spike points with large instantaneous changes. We speed up the algorithm by reducing the multiple change points to a series of single change point detection problems. We establish the consistency of the estimated number and locations of the change points under various prior distributions.
Extensive simulation studies are conducted to compare the BMS with existing methods, and our approach is illustrated with application to the magnetic resonance imaging guided radiation therapy data.

1 Introduction

Traditional change point detection algorithms often apply to the situation where the occurrence frequency of the change points is relatively consistent across the signals. For example, the narrowest-over-threshold (NOT) algorithm [1] is more suitable when different segments between the change points have comparable lengths, and the stepwise marginal likelihood (SML) method [5] works better to identify frequent change points. However, in practice it is often the case that distances between consecutive change points may vary dramatically, while only those with certain distance gaps are of interest. For such settings, we develop a computationally efficient Bayesian model selection (BMS) approach to identifying multiple change points.

The inconsistent gaps between the change points can be observed from the signals generated by magnetic resonance imaging guided radiation therapy (MRgRT). When radiations travel in the magnetic field, the dose can be significantly enhanced near the boundaries between different tissues or organs inside human bodies. As shown in Figure 1, the Duke Mid-sized Optical-CT System (DMOS) was developed to identify the dose changes near the region of such boundary artifacts. It also exhibits the profile of dose intensities as the radiation travels through the dosimeter, where the boundaries on and inside the dosimeter can be distinguished by the notable peaks in the signals. In the experiment, radiations enter the cylindrical dosimeter from different directions, and a sequence of dose intensities ordered by their distances to the sources is recorded.
Because the dosimeter is circular and there is a cavity in the middle, radiations from different directions would hit the boundaries in the dosimeter at similar distances from their sources.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: Reconstructed image of a slice in a cylindrical dosimeter with a cavity in the middle (left) and a typical line profile through the center of the cavity (right). Radiations enter the dosimeter from the hole on the left of the cylindrical dosimeter, which rotates 360 degrees so that radiations can enter from different directions.

In the MRgRT data, radiations in certain directions may experience temporary changes at non-boundary locations, which may result from the abnormal status of the DMOS system rather than true dose changes. The temporary change points, appearing in the data sequence as spike points, are often mixed up with those on the boundary (i.e., the peak locations in the right panel of Figure 1), which makes the boundary detection extremely challenging. Figure A.1 in the Appendix shows the change points in the MRgRT data identified by the NOT [1] and SML [5] algorithms respectively; neither can correctly identify the true boundaries. This motivates us to propose a new approach to detecting the systematic changes when the segment lengths have dramatic differences.

Our preliminary analysis of the MRgRT data demonstrates that the local control of the discovery is crucial. To avoid picking the spike points, we enforce a minimal distance between adjacent change points. Moreover, we adopt a computationally efficient local scan routine and propose a systematic two-stage procedure to speed up the change point detection.
More specifically, the local scan method first identifies the candidate points with a minimal distance based on the local data, and then optimizes a utility function to obtain the estimates for the locations and the total number of change points. Because the change points are defined based on the mean changes between two consecutive segments, the local data are sufficient to detect the systematic changes [6, 7, 8, 13, 14, 16, 18, 19].

To preserve the positive detection rate of the change points and reduce the false detection rate of the non-change points, we take a Bayesian marginal likelihood function as the utility, and develop a new BMS procedure for identifying change points. We show that the selection consistency is achieved under both the local [2, 3, 4, 17] and non-local priors [11], whereas the convergence rate is faster under the latter. Our BMS procedure is cast in the model selection framework, which is faster than the dynamic programming in the SML framework. For example, for the MRgRT data, BMS takes 1.3 seconds and SML takes 3.1 seconds when the maximum number of change points is capped at 100. The efficiency of BMS is mainly due to the fact that it reduces the search space dramatically by selecting a small set of candidate change points. Once the candidate points are selected, BMS only needs to evaluate two consecutive segments at a time, which greatly facilitates parallel computation.

2 Bayesian multiple change points detection

2.1 Probability model

Suppose there are $p_0$ true change points $t_1 < \cdots < t_{p_0}$ among $n$ observations $Y_n = (Y_1, \ldots, Y_n)$. As a convention, let $t_0 = 1$ and $t_{p_0+1} = n + 1$. Denote $\Delta_j = t_{j+1} - t_j$ and $\Delta = \min_{j=0,\ldots,p_0} \Delta_j$. We consider a set of $K_n$ candidate points $\tau_1, \ldots, \tau_{K_n}$, with $\tau_0 = 1$ and $\tau_{K_n+1} = n + 1$, while selection of the candidate points is discussed in Section 2.3. Define $n_j = \tau_{j+1} - \tau_j$ and $n_I = \min_{j=0,\ldots,K_n} n_j$. Let $H(n_I) = \{\tau_j : j = 1, \ldots, K_n,\ \tau_{j+1} - \tau_j \geq n_I\}$ denote the set of candidate change points, and let $T_0(p_0) = \{t_j : j = 1, \ldots, p_0\}$ denote the set of true change points. Not only does the specification of the candidate points allow BMS to be implemented in a lower dimensional space with the most influential points, but it also guarantees that there are a sufficient number of non-change points surrounding the true change points so that the consistency conditions are met. The probability model takes the form of
$$Y_l = \nu_{\tau_j} + \epsilon_l, \quad l \in [\tau_j, \tau_{j+1}),$$
where the random errors $\epsilon_l$ are independent with mean zero and variance $\sigma^2$; write $\bar\Delta = \max_{j=0,\ldots,p_0} \Delta_j$.

For ease of exposition, we first consider the case where the locations of the candidate change points are given and $T_0(p_0) \subset H(n_I)$. Define $\bar Y_{\tau_j} = n_{j-1}^{-1} \sum_{l=\tau_{j-1}}^{\tau_j - 1} Y_l$, which is the sample average for the $(j-1)$th segment $[\tau_{j-1}, \tau_j)$. If the candidate point $\tau_k$ is not a change point, then the points in $[\tau_k, \tau_{k+1})$ should have the same mean as those in $[\tau_{k-1}, \tau_k)$; otherwise there should be a mean shift between the segments $[\tau_k, \tau_{k+1})$ and $[\tau_{k-1}, \tau_k)$. Hence, we can formulate the model and prior distribution for $l \geq \tau_1$ as follows:
$$Y_l = \bar Y_{\tau_k} + \mu_k + \xi_l, \quad l \in [\tau_k, \tau_{k+1}),$$
$$\mu_k \sim \pi(\mu_k), \ \text{if } \tau_k \text{ is a change point}; \qquad \mu_k = 0 \text{ with probability } 1, \ \text{if } \tau_k \text{ is an } n_I\text{-flat point},$$
where $\xi_l$ is a mean-zero error term and $\pi(\cdot)$ is a prior distribution. The $n_I$-flat point is defined as a non-change point which is at least $n_I$ apart from any change points.
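As a concrete illustration of this piecewise-constant-mean model, the following minimal sketch simulates a sequence with a few segments, an $n_I$-flat region, and one non-boundary spike (all boundaries, means, and spike values below are hypothetical choices for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical segment boundaries tau_j and segment means nu_{tau_j}
tau = [0, 100, 130, 400]           # half-open segments [tau_j, tau_{j+1})
nu = [0.0, 2.0, 0.5]               # piecewise-constant segment means
sigma = 0.5                        # error standard deviation

n = tau[-1]
y = np.empty(n)
for j, mean in enumerate(nu):
    y[tau[j]:tau[j + 1]] = mean    # systematic mean of the j-th segment
y += sigma * rng.standard_normal(n)

# a non-boundary spike: a large instantaneous change at a single point,
# which is not a systematic mean change and should be suppressed
y[250] += 5.0
```

Points well inside the long third segment are $n_I$-flat for moderate $n_I$, while the spike at index 250 is neither a change point nor $n_I$-flat.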
We require the $n_I$ distance between the true change points and the flat ones so that there are sufficient neighborhood samples to achieve the estimation consistency.

Let $\mu_{k0}$ be the true value of $\mu_k$, and we assume $|\mu_{k0}| \geq \beta$, where $\beta > 0$ is the lower bound of $|\mu_{k0}|$, for the $k$'s with $\tau_k \in T_0(p_0)$. The prior distribution of $\mu_k$ determines the convergence rate of the BMS procedure. We explore three types of priors: the local prior [9], the non-local moment prior and the inverse moment prior [11] as follows:
$$\text{Local prior: } \pi_L(\mu) = N(0, \omega^2),$$
$$\text{Moment prior: } \pi_M(\mu) = \mu^{2v} C_M^{-1} (2\pi\nu)^{-1/2} \exp\{-\mu^2/(2\nu)\},$$
$$\text{Inverse moment prior: } \pi_I(\mu) = s \nu^{q/2} \Gamma(q/(2s))^{-1} |\mu|^{-(q+1)} \exp\{-(\mu^2/\nu)^{-s}\},$$
where $C_M$ is the normalizing constant.

Let $M_k$ represent the model that $\tau_k$ is the sole change point. We define the marginal likelihood with the Gaussian kernel as
$$\Pr(Y_n \mid M_k) = \prod_{j=1, j \neq k}^{K_n} \exp\Big\{-\sum_{l=\tau_j}^{\tau_{j+1}-1} (Y_l - \bar Y_{\tau_j})^2\Big\} \int \exp\Big\{-\sum_{l=\tau_k}^{\tau_{k+1}-1} (Y_l - \bar Y_{\tau_k} - \mu)^2\Big\} \pi(\mu) \, d\mu.$$
The posterior model probability of $M_k$ given $Y_n$ is
$$\Pr(M_k \mid Y_n) = \frac{\Pr(Y_n \mid M_k) \Pr(M_k)}{\sum_{j=1}^{K_n} \Pr(Y_n \mid M_j) \Pr(M_j)} = \frac{\Pr(Y_n \mid M_k)}{\sum_{j=1}^{K_n} \Pr(Y_n \mid M_j)},$$
when $M_j$ takes a discrete uniform prior, $j = 1, \ldots, K_n$. It is not necessary for $Y_n$ to be normally distributed to ensure the selection consistency in detecting mean changes, while the Gaussian kernel is used here because it tends to be large when the difference between the true and the hypothetical segment means is small. Hence, as $n \to \infty$, $\Pr(M_k \mid Y_n)$ approaches 1 when $\tau_k$ is a true change point and the $\tau_j$'s ($j \neq k$) are $n_I$-flat points.

2.2 Detection of change points

We start with the simplest case where there is only one mean shift in the data, i.e., $p_0 = 1$ is fixed a priori. We select the candidate point $\tau_k$ corresponding to the largest $\Pr(M_k \mid Y_n)$, i.e., the largest marginal likelihood $\Pr(Y_n \mid M_k)$.
It can be shown that
$$\Pr(M_k \mid Y_n) = \Big\{1 + \sum_{j \neq k}^{K_n} \frac{\Pr(Y_n \mid M_j)}{\Pr(Y_n \mid M_k)}\Big\}^{-1},$$
where for $j \neq k$,
$$\Pr(Y_n \mid M_j) \propto \frac{\int \exp\{-\sum_{l=\tau_j}^{\tau_{j+1}-1} (Y_l - \bar Y_{\tau_j} - \mu)^2\} \pi(\mu) \, d\mu}{\exp\{-\sum_{l=\tau_j}^{\tau_{j+1}-1} (Y_l - \bar Y_{\tau_j})^2\}},$$
and for $j = k$ we replace the above $\tau_j$ and $\tau_{j+1}$ by $\tau_k$ and $\tau_{k+1}$, respectively. As a result, the selection consistency is determined by the evidence in favor of $\mu_k \sim \pi(\mu_k)$ and $\mu_j = 0$ for $j \neq k$.

For the case with multiple change points ($p_0 > 1$), we select the points corresponding to the $p_0$ largest $\Pr(M_k \mid Y_n)$, for which the selection consistency is presented as follows.

Theorem 1. Let $M = \{M_k : \tau_k \in T_0(p_0)\}$. If it holds that
$$\Pr(Y_n \mid M_j) = O_p(a_{n_j}) \quad \text{for } \tau_j \notin T_0(p_0), \tag{1}$$
$a_{n_j} = o_p(1)$, and $n_I^{1/2} \beta \to \infty$, then
$$\sum_{M_k \in M} \Pr(M_k \mid Y_n) = 1 - O_p\{K_n a_{n_I} \exp(-n_I \beta^2)\}.$$
Hence, as $n_I / \log n \to c > 0$ and $n_I \to \infty$, we have $\sum_{M_k \in M} \Pr(M_k \mid Y_n) \to_p 1$.

The proof of Theorem 1 is delineated in the Appendix. The selection consistency depends on the convergence rate of $a_{n_I}$, which is determined by the prior $\pi(\cdot)$. Lemmas 2–4 in the Appendix show that $a_{n_j} = n_j^{-1/2}$ for the local prior $\pi_L(\mu)$; $a_{n_j} = n_j^{-v-1/2}$ for $\pi_M(\mu)$; and $a_{n_j} = \exp(-n_j^{s/(s+1)})$ for $\pi_I(\mu)$. Hence, the selection consistency is achieved at the fastest rate using the non-local inverse moment prior.

When $p_0$ is unknown, let $T(p)$ be the set containing $p$ points obtained by the procedure described above.
We define the marginal likelihood given $T(p)$ as
$$\Pr\{Y_n \mid T(p)\} = \prod_{\tau_j \notin T(p)} \exp\Big\{-\sum_{l=\tau_j}^{\tau_{j+1}-1} (Y_l - \bar Y_{\tau_j})^2\Big\} \prod_{\tau_k \in T(p)} \int \exp\Big\{-\sum_{l=\tau_k}^{\tau_{k+1}-1} (Y_l - \bar Y_{\tau_k} - \mu)^2\Big\} \pi(\mu) \, d\mu.$$
We can estimate the locations and the number of change points in two steps: first, for any given $p$, we obtain $T(p)$ using the procedure described in the previous section; and second, we estimate $p_0$ by $\hat p$, which maximizes $\Pr\{Y_n \mid T(p)\}$ with respect to $p$ and is merely implemented in one dimension. In contrast, SML [5] simultaneously estimates the locations and the number of change points by maximizing the marginal likelihood with respect to both $T(p)$ and $p$.

2.3 Selection of candidate points

The previous discussions rely upon a critical assumption that the candidate points are specified in advance. To facilitate the implementation of BMS, we need to find a candidate set $H_c(n_I)$ that is close to $H(n_I)$. For the selection consistency of the change points, we require that for each $t_j$ there is a $\tau_k \in H_c(n_I)$ such that $\Pr(|t_j - \tau_k| < n_I) = 1 - O_p[\min\{\exp(-n_I \beta^2), a_{n_I}\}]$. Define
$$R_i = \frac{\int \exp\{-\sum_{l=i}^{i+n_I-1} (Y_l - \bar Y_i - \mu)^2\} \pi(\mu) \, d\mu}{\exp\{-\sum_{l=i}^{i+n_I-1} (Y_l - \bar Y_i)^2\}},$$
where $\bar Y_i = n_I^{-1} \sum_{j=i-n_I}^{i-1} Y_j$. By an argument similar to that in Lemma 1, $R_i$ goes to infinity when $i$ is a true change point, and $R_i$ approaches zero in probability when $i$ is an $n_I$-flat point. Hence, the value of $R_i$ can distinguish a change point from a set of $n_I$-flat points. To further eliminate the non-change points that are also not $n_I$-flat, we implement the non-maximum suppression that removes the points which do not yield the largest $R_i$'s in their $n_I$-neighborhood.
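A minimal sketch of this local-scan-plus-non-maximum-suppression idea follows; for illustration we use a simple two-sided mean-difference statistic in place of the Bayes factor $R_i$ (the function name and statistic are our simplifications, not the paper's implementation):

```python
import numpy as np

def screen_candidates(y, n_I):
    """Local scan: compare the mean of the right window [i, i + n_I) with
    the mean of the left window [i - n_I, i) (the analogue of Y-bar_i),
    then keep only local maxima within an n_I-neighborhood
    (non-maximum suppression)."""
    n = len(y)
    stat = np.full(n, -np.inf)
    for i in range(n_I, n - n_I + 1):
        left = y[i - n_I:i].mean()       # plays the role of \bar Y_i
        right = y[i:i + n_I].mean()
        stat[i] = abs(right - left)      # large near a true change point
    return [i for i in range(n_I, n - n_I + 1)
            if stat[i] == stat[i - n_I:i + n_I + 1].max()]
```

With a single mean shift at index $t$, the statistic peaks at $i = t$ and the suppression step removes the neighboring near-duplicates, mirroring step (ii) of the screening algorithm below.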
Specifically, the screening procedure for selecting candidate points is described as follows.

Algorithm 1: Screening
(i) For each $i$ in $[n_I, n - n_I]$, compute $R_i$.
(ii) If $R_i = \max\{R_j : j \in [i - n_I, i + n_I]\}$, then $i$ is selected as a candidate point.
(iii) Scan through the entire data sequence, and obtain a set of $K_n$ candidate points $H_c(n_I)$.

The screening algorithm is comparable to that in [15], as by the Laplace approximation we have
$$R_i = D_n \exp\Big\{\sum_{l=i}^{i+n_I-1} (Y_l - \bar Y_i)^2 - \sum_{l=i}^{i+n_I-1} (Y_l - \bar Y_i - \check\mu)^2\Big\} \pi(\check\mu) \{1 + o_p(1)\} = D_n \exp\Big\{2\check\mu \Big(\sum_{l=i}^{i+n_I-1} Y_l - \sum_{j=i-n_I}^{i-1} Y_j\Big) - n_I \check\mu^2\Big\} \pi(\check\mu) \{1 + o_p(1)\},$$
where $D_n$ is a constant of order $O_p(n_I^{-1/2})$ and $\check\mu$ is the maximizer of $-\sum_{l=i}^{i+n_I-1} (Y_l - \bar Y_i - \mu)^2 + \log \pi(\mu)$. The magnitude of the leading term in $R_i$ is strongly associated with $n_I^{-1} (\sum_{l=i}^{i+n_I-1} Y_l - \sum_{j=i-n_I}^{i-1} Y_j)$, which is the local diagnosis function with $h = n_I$ in [15]. The screening procedure identifies a candidate set $H_c(n_I)$ that would lead to the consistency result.

Proposition 1. Assume that $n_I^{1/2} \beta \to \infty$. Then for each $t_j \in T_0(p_0)$, there is a $\tau \in H_c(n_I)$ such that $\Pr(|t_j - \tau| < n_I) = 1 - O[\min\{\exp(-n_I \beta^2), a_{n_I}\}]$.

In theory, $i = t_j$ maximizes $R_i$ in the $n_I$-neighborhood of $t_j$ asymptotically. By selecting the local maximal $R_i$ in the screening procedure, $H_c(n_I)$ would cover the $n_I$-neighborhood of $T_0(p_0)$ as $n \to \infty$. Also, the condition $n_I^{1/2} \beta \to \infty$ indicates that the effect size cannot be too small in order to find the candidate points around the true change points. After selecting the candidate points, we perform a refinement step to identify the locations and the total number of change points.

Algorithm 2: Refinement
Scanning
(i) Compute $\Pr(Y_n \mid M_k)$ by scanning over all the candidate points in $H_c(n_I)$.
(ii) For each $p$, obtain a set of change points $T(p)$ corresponding to the $p$ largest $\Pr(Y_n \mid M_k)$, $k = 1, \ldots, K_n$.
Optimization
(iii) Select $\hat p$ that maximizes $\Pr\{Y_n \mid T(p)\}$.

Theorem 2. Assume that $n_I / \log n \to c > 0$, $n_I^{1/2} \beta \to \infty$, $\limsup_n n_I / \Delta < 1/2$, and (1) holds. Let $H_c(n_I)$ be the set of candidate points such that $\tau_{k+1} - \tau_k \geq n_I$, and for each $t_j$ there is a $\tau_k \in H_c(n_I)$ with $\Pr(|t_j - \tau_k| < n_I) = 1 - O_p[\min\{\exp(-n_I \beta^2), a_{n_I}\}]$. Then,
$$\Pr(\hat p = p_0) = 1 - O_p[\max\{\exp(-n_I \beta^2), a_{n_I}\}],$$
and furthermore,
$$\Pr\Big\{\sup_{t_j \in T(\hat p)} \inf_{t_{j'} \in T_0(p_0)} |t_j - t_{j'}|/n \leq n_I/n\Big\} = 1 - O\{\exp(-n_I \beta^2)\},$$
$$\Pr\Big\{\sup_{t_j \in T_0(p_0)} \inf_{t_{j'} \in T(\hat p)} |t_j - t_{j'}|/n \leq n_I/n\Big\} = 1 - O(a_{n_I}).$$

Theorem 2 shows that BMS controls both the over- and under-segmentation errors. The rationale is that for any $T(p)$ different from $T_0(p_0)$, there is at least a chosen point $\tau \in T(p)$ whose $n_I$-neighborhood does not contain true change points. Then the likelihood ratio $\Pr\{Y_n \mid T(p)\} / \Pr\{Y_n \mid T_0(p_0)\}$ goes to 0 with probability 1, because the ratio contains at least one of $\Pr(Y_n \mid M_j)$ for $\tau_j \notin T_0(p_0)$ and $\Pr(Y_n \mid M_j)^{-1}$ for $\tau_j \in T_0(p_0)$, which converges to 0 in probability by Lemma 1 and (1). As the computational time for $\Pr(Y_n \mid M_k)$ grows at the speed of $O(n)$ for $k = 1, \ldots, K_n$, that for the refinement stage grows with the sample size at the speed of $O(n K_n)$.

3 Simulations

3.1 Data sequence without spikes

To evaluate the performance of the proposed BMS method in the settings without spike points, we generate data from two different models. Model I takes the form of
$$\text{Model I}: Y_i = \sum_{j=1}^{p_0} h_j J(n x_i - t_j) + \epsilon_i,$$
with $p_0 = 11$, where $h = (2.01, 2.51, 1.51, 2.01, 2.51, 2.11, 1.05, 2.16, 1.56, 2.56, 2.11)$, the error $\epsilon_i \sim N(0, 1)$, we set $J(z) = \{\mathrm{sgn}(z) + 1\}/2$ with $\mathrm{sgn}(\cdot)$ being the sign function, and the $x_i$'s are equally spaced on $[0, 1]$. The true change points are $t_j/n$, $j = 1, \ldots, p_0$, given by $(0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81)$. The errors are generated from three distributions: $N(0, 1)$; the $t$ distribution with 5 degrees of freedom, $t(5)$, standardized to have variance 1; and the log-normal distribution $LN(0, 1)$, standardized to have variance 1. Model II considers heteroscedastic errors across segments: $Y_i$ is generated as in model I, but the error $\epsilon_i$ in the $j$th segment is scaled by $v_j$, where $(v_j, j = 1, \ldots, 11) = (1, 0.5, 3, 2/3, 0.5, 3, 2/3, 0.5, 3, 2/3, 0.5)$. Other specifications remain the same as those in model I. The over- and under-segmentation errors, for the set $G_n$ of true change point locations and its estimate $\hat G_n$, are respectively defined as
$$d(\hat G_n \mid G_n) = \sup_{b \in \hat G_n} \inf_{a \in G_n} |a - b|, \qquad d(G_n \mid \hat G_n) = \sup_{b \in G_n} \inf_{a \in \hat G_n} |a - b|.$$

For the BMS procedure, we consider three different priors for $\pi(\cdot)$, corresponding to the local prior, non-local moment prior and non-local inverse moment prior. We take the minimum distance between candidate points $n_I = h (\log n)^{1.5}$, where $h$ is a tuning parameter. Figure 2 presents the relationship between the maximum of the over- and under-segmentation errors, $\hat p - p_0$, and the value of $h$ with sample size 1000, which indicates $h = 0.65$ leading to the smallest segmentation error; $h = 0.5$ generally works well in the simulations.

Furthermore, we assess the performance of BMS using different priors under model I with a normal error, when $p_0$ is not prespecified. In Figure 3, we present the selection error, which is defined as the maximum of the number of selected change points that are not in $T_0(p_0)$ and the number of true change points that are not in $T(\hat p)$.
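The two segmentation errors are directed Hausdorff-type distances and can be computed directly; a small sketch (the function name and the example point sets are ours):

```python
def directed_distance(from_set, to_set):
    """sup over b in from_set of (inf over a in to_set of |a - b|)."""
    return max(min(abs(a - b) for a in to_set) for b in from_set)

true_cp = [100, 130, 400]            # G_n (hypothetical)
est_cp = [101, 128, 402, 700]        # hat G_n, with one spurious point

over = directed_distance(est_cp, true_cp)   # d(hat G_n | G_n) = 300
under = directed_distance(true_cp, est_cp)  # d(G_n | hat G_n) = 2
```

The spurious estimate at 700 drives the over-segmentation error, while every true point has a nearby estimate, so the under-segmentation error stays small.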
The tuning parameters are calibrated to yield the smallest segmentation error and $\hat p - p_0$ on average for each prior. Both the selection error and $\hat p - p_0$ decrease as the sample size increases, and the prior $\pi_I$ leads to the best convergence among the three prior choices.

For a comprehensive comparison with existing methods, we assess BMS under the non-local inverse moment prior $\pi_I$ with $q = 2$ and $\nu = 6$ against existing methods including PELT [12], WBS [7], NOT with normal or heavy-tail distributions [1] and SML [5]. Table 1 summarizes the numerical results under model I and model II with normal, Student's $t$, and log-normal error distributions and their heteroscedastic counterparts. On average, BMS performs the best in selecting the number of change points and balancing both over- and under-segmentation errors. It is expected that the performances of WBS, PELT, and SML deteriorate when the errors do not follow a normal distribution, because they all rely upon parametric model assumptions and thus are not robust to model misspecifications. In contrast, both BMS and NOT behave well under various error distributions. Also, NOT and SML perform the best in controlling the over-segmentation errors, while the resulting estimator $\hat p$ tends to be larger than the true $p_0$. On the other hand, BMS allows for slightly larger over-segmentation errors in order to maintain $\hat p$ to be more concentrated around $p_0$.

3.2 Data sequence with spikes

We further evaluate the BMS, NOT and SML methods based on the data sequences contaminated with spike points. Assuming normal noises, we generate 500 sequences and each contains $n = 1000$ points with mean changes of 0.01 and $-0.01$ at the 400th and 440th observations, respectively. We set the standard deviation of the noise to be 0.002.
We further generate 10 random samples uniformly in the ranges of $(-0.08, -0.07)$ and $(0.07, 0.08)$, and add them to the original sequence at random locations to form the spike points. These configurations are chosen to mimic the real data setting. We implement BMS, NOT, and SML on the simulated samples, and for BMS we select $n_I = 12$, which is the largest integer that is smaller than $0.65 (\log n)^{1.5}$.

Figure 2: The maximum segmentation error (left) and $\hat p - p_0$ (right) versus $h$ (the tuning parameter in the minimum distance between candidate points) over 100 simulations with sample size $n = 1000$.

Figure 3: The selection error (left) and $\hat p - p_0$ (right) averaged over 500 simulations under three different prior distributions: the local prior $\pi_L$, non-local moment prior $\pi_M$, and non-local inverse moment prior $\pi_I$.

Table 2 shows that BMS, resulting in the smallest $\hat p - p_0$ on average, is insensitive to the spike points. Figure 4 illustrates the change point detection results for three simulated data sequences.
It is observed that NOT ignores both the change points with small signal-to-noise ratios and the spike signals with small segment lengths, because NOT is more appropriate for settings where the segments are of comparable lengths. On the other hand, SML is sensitive to extreme values, as it is developed to handle frequent and irregular change points. It appears that BMS is the most suitable procedure for this case, because not only does it enforce the minimal segment length to avoid false identification of spike signals, but it also retains the ability to detect change points with small distance gaps.

Table 1: Comparison results averaged over 200 simulations among the BMS, PELT, WBS, NOT and SML methods under models I and II with three error distributions, $N(0, 1)$, $t(5)$, and log-normal $LN(0, 1)$, and those with heteroscedastic variances: the under-segmentation error $d(G_n \mid \hat G_n)$ and the over-segmentation error $d(\hat G_n \mid G_n)$. Standard deviations are given in parentheses.

Error distribution | Method | $d(G_n \mid \hat G_n)$ | $d(\hat G_n \mid G_n)$
$N(0, 1)$ | BMS | 2.41 (6.06) | 1.96 (3.94)
$N(0, 1)$ | PELT | 0.91 (1.19) | 6.32 (11.92)
$N(0, 1)$ | WBS | 1.22 (4.13) | 0.86 (0.79)
$N(0, 1)$ | NOT | 1.93 (8.04) | 0.75 (0.80)
$N(0, 1)$ | SML | 12.94 (42.98) | 0.78 (0.90)
$t(5)$ | BMS | 2.15 (5.76) | 2.83 (7.01)
$t(5)$ | PELT | 0.95 (1.03) | 6.24 (12.24)
$t(5)$ | NOT | 7.57 (27.70) | 1.51 (2.57)
$t(5)$ | SML | 40.13 (53.68) | 0.88 (0.87)
$LN(0, 1)$ | BMS | 3.69 (12.12) | 3.11 (6.89)
$LN(0, 1)$ | PELT | 12.10 (29.63) | 7.22 (13.32)
$LN(0, 1)$ | NOT | 6.06 (26.32) | 1.18 (4.45)
$LN(0, 1)$ | SML | 111.77 (52.05) | 0.73 (1.33)
Heteroscedastic $N(0, 1)$ | BMS | 3.69 (7.08) | 3.88 (7.36)
Heteroscedastic $N(0, 1)$ | PELT | 1.49 (1.53) | 6.15 (11.44)
Heteroscedastic $N(0, 1)$ | NOT | 7.52 (12.60) | 1.66 (1.68)
Heteroscedastic $N(0, 1)$ | SML | 6.75 (23.98) | 1.37 (1.52)
Heteroscedastic $t(5)$ | BMS | 2.79 (4.87) | 4.21 (8.17)
Heteroscedastic $t(5)$ | PELT | 1.50 (1.83) | 7.76 (13.48)
Heteroscedastic $t(5)$ | NOT | 8.15 (25.01) | 2.36 (4.70)
Heteroscedastic $t(5)$ | SML | 26.74 (35.42) | 1.27 (1.59)
Heteroscedastic $LN(0, 1)$ | BMS | 3.73 (11.72) | 4.20 (7.85)
Heteroscedastic $LN(0, 1)$ | PELT | 6.07 (16.09) | 8.10 (13.93)
Heteroscedastic $LN(0, 1)$ | NOT | 6.32 (24.67) | 1.42 (3.70)
Heteroscedastic $LN(0, 1)$ | SML | 68.05 (47.15) | 0.87 (1.67)

Figure 4: Detection of change points for three simulated data sequences with spike points using BMS (the red solid line), NOT (the blue dashed line) and SML (the brown dotted line).

4 MRgRT data

We illustrate the BMS method with application to the MRgRT data, which contain 2265 observations ordered by the distances from the sources of the radiations. The R code for implementing the BMS method can be downloaded from our GitHub repository [10].
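The non-local inverse moment prior used in the implementation is the Johnson–Rossell density from [11]; a minimal numerical sketch in the $(q, s, \nu)$ parametrization follows (the function name, default values, and the exact normalizing form are our assumptions based on [11]):

```python
import numpy as np
from math import gamma

def imom_density(mu, q=1.0, s=1.0, nu=1.0):
    """Inverse moment prior density (Johnson and Rossell [11]):
    pi_I(mu) = s * nu^(q/2) / Gamma(q/(2s)) * |mu|^(-(q+1))
               * exp{-(mu^2/nu)^(-s)}.
    The density vanishes at mu = 0, the defining non-local property."""
    mu = np.asarray(mu, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        dens = (s * nu ** (q / 2) / gamma(q / (2 * s))
                * np.abs(mu) ** (-(q + 1))
                * np.exp(-(mu ** 2 / nu) ** (-s)))
    return np.nan_to_num(dens)   # pi_I(0) = 0

# sanity check: the density integrates to about 1 (trapezoid rule; the
# slowly decaying |mu|^(-(q+1)) tail accounts for a tiny deficit)
grid = np.linspace(-1000.0, 1000.0, 2_000_001)
dens = imom_density(grid)
total = float(np.sum((dens[1:] + dens[:-1]) * np.diff(grid)) / 2)
```

Because the density is exactly zero at $\mu = 0$ and bounded away from it in a neighborhood of the origin only polynomially slowly, evidence against a flat point accumulates fast, which is why the inverse moment prior yields the fastest selection-consistency rate among the three priors considered.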
Throughout the implementation, we use the non-local inverse moment prior $\pi_I(\mu)$ with $q = 2$ and $\nu = 2$, and we set $n_I = 13$.

Table 2: Comparison results averaged over 500 simulations among the BMS, NOT and SML methods based on the data sequences with spike points: the distribution of $\hat p - p_0$.

Method | $\leq -3$ | $-2$ | $-1$ | $0$ | $1$ | $2$ | $\geq 3$
BMS | 0 | 0 | 31 | 276 | 113 | 67 | 13
NOT | 0 | 0 | 387 | 32 | 20 | 11 | 50
SML | 0 | 0 | 0 | 0 | 0 | 0 | 500

By varying $s$ from 2 to 10, the left panel in Figure 5 shows that when $s$ is small, BMS identifies more change points, and as $s$ grows the number of identified change points decreases. This phenomenon is consistent with Lemma 4, which shows that the convergence rate for the non-local prior is $O_p\{\exp(-n_I^{s/(s+1)})\}$. When $s$ is small, the Bayes factor vanishes slowly and hence the algorithm picks more spurious change points. When $s$ is sufficiently large, the convergence rate approaches $O_p\{\exp(-n_I)\}$, and hence the algorithm eliminates the flat points more effectively.

Figure 5: The left panel shows the detection of change points using BMS with $s = 2$, 5, and 10, respectively; the right panel shows the detection of change points after restricting the data in the range of $(-0.01, 0.01)$, using BMS with $s = 10$ (the red solid line), NOT (the blue dashed line), and SML (the brown dotted line).

We further remove the spike points, and thus keep the data within the range of $(-0.01, 0.01)$. By fixing $s = 10$, we implement BMS, NOT and SML on the truncated data sequence. The right panel in Figure 5 shows that the change point detection results using the three methods largely overlap. The BMS and NOT methods lead to similar results, and both outperform SML. This implies that removing the spike points improves the accuracy of boundary detection for all three methods.

5 Conclusion

The proposed BMS method can consistently identify multiple mean changes in a data sequence, and it effectively removes the flat points without sacrificing the detection accuracy. Our method is particularly useful when the data sequence contains spike points that are not of interest, as they are not real change points. The BMS is applied to analyze the MRgRT data for detecting mean changes in the signals, while the NOT, SML and other methods fail to correctly detect the boundaries. We explore the performance of BMS with different tuning parameters, and the resulting patterns are consistent with the theoretical properties. Moreover, we demonstrate the robustness of BMS to various error distributions.

Acknowledgment

The authors would like to thank Dr. Shouhao Zhou from the Department of Biostatistics, M.D. Anderson Cancer Center for providing the data. The research is partially supported by grants from the Research Grants Council of Hong Kong (grant number 27304117 for Jiang and 17326316 for Yin).

References

[1] Baranowski, R., Chen, Y., and Fryzlewicz, P. (2016). Narrowest-over-threshold detection of multiple change-points and change-point-like features. arXiv preprint arXiv:1609.00293.

[2] Bertolino, F., Racugno, W., and Moreno, E. (2000). 
Bayesian model selection approach to analysis of variance under heteroscedasticity. The Statistician, pages 503–517.

[3] Conigliani, C. and O'Hagan, A. (2000). Sensitivity of the fractional Bayes factor to prior distributions. Canadian Journal of Statistics 28, 343–352.

[4] De Santis, F. and Spezzaferri, F. (2001). Consistent fractional Bayes factor for nested normal linear models. Journal of Statistical Planning and Inference 97, 305–321.

[5] Du, C., Kao, C.-L. M., and Kou, S. (2016). Stepwise signal extraction via marginal likelihood. Journal of the American Statistical Association 111, 314–330.

[6] Bauer, P. and Hackl, P. (1978). The use of MOSUMS for quality control. Technometrics 20, 431–436.

[7] Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. The Annals of Statistics 42, 2243–2281.

[8] Glaz, J., Naus, J. I., and Wallenstein, S. (2001). Scan Statistics. Springer.

[9] Jeffreys, H. (1998). The Theory of Probability. Oxford University Press.

[10] Jiang, F., Yin, G., and Dominici, F. (2018). Bayesian model selection approach to boundary detection with non-local priors. https://github.com/homebovine/BCP.git.

[11] Johnson, V. E. and Rossell, D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society: Series B 72, 143–170.

[12] Killick, R., Fearnhead, P., and Eckley, I. (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association 107, 1590–1598.

[13] Kirch, C. and Muhsal, B. (2014). A MOSUM procedure for the estimation of multiple random change points. Preprint.

[14] Lavielle, M. and Ludeña, C. (2000). The multiple change-points problem for the spectral distribution. Bernoulli 6, 845–869.

[15] Niu, Y. S. and Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variations. The Annals of Applied Statistics 6, 1306–1326.

[16] Preuss, P., Puchstein, R., and Dette, H. (2015). Detection of multiple structural breaks in multivariate time series. Journal of the American Statistical Association 110, 654–668.

[17] Walker, S. G. (2004). Modern Bayesian asymptotics. Statistical Science, pages 111–117.

[18] Yao, Y.-C. (1987). Approximating the distribution of the maximum likelihood estimate of the change-point in a sequence of independent random variables. The Annals of Statistics 15, 1321–1328.

[19] Yau, C. Y. and Zhao, Z. (2015). Inference for multiple change points in time series via likelihood ratio scan statistics. Journal of the Royal Statistical Society: Series B 78, 895–916.", "award": [], "sourceid": 993, "authors": [{"given_name": "Fei", "family_name": "Jiang", "institution": "The University of Hong Kong"}, {"given_name": "Guosheng", "family_name": "Yin", "institution": "University of Hong Kong"}, {"given_name": "Francesca", "family_name": "Dominici", "institution": "Harvard University"}]}