{"title": "A Non-parametric Learning Method for Confidently Estimating Patient's Clinical State and Dynamics", "book": "Advances in Neural Information Processing Systems", "page_first": 2020, "page_last": 2028, "abstract": "Estimating patient's clinical state from multiple concurrent physiological streams plays an important role in determining if a therapeutic intervention is necessary and for triaging patients in the hospital. In this paper we construct a non-parametric learning algorithm to estimate the clinical state of a patient. The algorithm addresses several known challenges with clinical state estimation such as eliminating bias introduced by therapeutic intervention censoring, increasing the timeliness of state estimation while ensuring a sufficient accuracy, and the ability to detect anomalous clinical states. These benefits are obtained by combining the tools of non-parametric Bayesian inference, permutation testing, and generalizations of the empirical Bernstein inequality. The algorithm is validated using real-world data from a cancer ward in a large academic hospital.", "full_text": "A Non-parametric Learning Method for Con\ufb01dently\n\nEstimating Patient\u2019s Clinical State and Dynamics\n\nWilliam Hoiles\n\nDepartment of Electrical Engineering\nUniversity of California Los Angeles\n\nLos Angeles, CA 90024\n\nwhoiles@ucla.edu\n\nMihaela van der Schaar\n\nDepartment of Electrical Engineering\nUniversity of California Los Angeles\n\nLos Angeles, CA 90024\nmihaela@ee.ucla.edu\n\nAbstract\n\nEstimating patient\u2019s clinical state from multiple concurrent physiological streams\nplays an important role in determining if a therapeutic intervention is necessary and\nfor triaging patients in the hospital. In this paper we construct a non-parametric\nlearning algorithm to estimate the clinical state of a patient. 
The algorithm addresses several known challenges with clinical state estimation such as eliminating the bias introduced by therapeutic intervention censoring, increasing the timeliness of state estimation while ensuring a sufficient accuracy, and the ability to detect anomalous clinical states. These benefits are obtained by combining the tools of non-parametric Bayesian inference, permutation testing, and generalizations of the empirical Bernstein inequality. The algorithm is validated using real-world data from a cancer ward in a large academic hospital.

1 Introduction

Timely clinical state estimation can significantly improve the quality of care for patients by informing clinicians of patients who have entered a high-risk clinical state. This is a challenging problem as the patient's clinical state is not directly observable and must be inferred from the patient's vital signs and the clinician's domain knowledge. Several methods exist for estimating the patient's clinical state including clinical guidelines and risk scores [21, 18]. The limitation of these population-based methods is that they are not personalized (e.g. patient models are not unique), cannot detect anomalous patient dynamics, and, most importantly, are biased due to therapeutic intervention censoring [16]. Therapeutic intervention censoring occurs when a patient's physiological signals are misclassified in the training data as a result of the effects caused by therapeutic interventions. To improve the quality of patient care, new methods are needed to overcome these limitations.
In this paper we develop an algorithm for estimating a patient's clinical state based on previously recorded electronic health record (EHR) data.
A schematic of the algorithm is provided in Fig.1, which contains three primary components: a) learning the patient's stochastic model, b) using statistical techniques to evaluate the quality of the estimated stochastic model, and c) performing clinical state estimation for new patients based on their estimated models. The works by Fox et al. [10, 9] and Saria et al. [19] for temporal segmentation are the most closely related to our algorithm. However, [10, 19] do not apply formal statistical techniques to validate and iteratively update the hyper-parameters of the non-parametric Bayesian inference, are not personalized, do not remove the bias caused by therapeutic intervention censoring, and do not utilize clinician domain knowledge for clinical state estimation. Additionally, applying fully Bayesian methods [9] for clinical state estimation is computationally prohibitive, as the computational complexity of constructing the stochastic models of all patients grows polynomially with the number of samples and maximum number of possible states of all patients. The computational complexity of our algorithm is only polynomial in the number of samples and states of a single patient. A detailed literature review is provided in the Supporting Material.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

The proposed algorithm (Fig.1) learns a combinatorial stochastic model for each patient based on their measured vital signs. A non-parametric Bayesian learning algorithm based on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) [10] is used to learn the patient's stochastic model, which is composed of a possibly infinite state-space HMM where each state is associated with a unique dynamic model.
The algorithm dynamically adjusts the number of detected dynamic models and their temporal duration based on the patient's vital signs; that is, the algorithm has a data-driven bound on the model complexity (e.g. the number of detected states). The patient's stochastic model provides a fine-grained personalized representation of each patient that is interpretable for clinicians, and accounts for the patient's specific dynamics which may result from therapeutic interventions and medical complications (e.g. disease, paradoxical reaction to a drug, bone fracture). To ensure that each detected dynamic model is associated with a unique clinical state, the hyper-parameters in the HDP-HMM are updated iteratively using the results from an improved Bonferroni method [2]. This mitigates the major weakness of non-parametric Bayesian inference methods, namely how to select the hyper-parameters [14, 12]. Additionally, the algorithm provides statistical guarantees on the dynamic model parameters using generalizations of the scalar Bernstein inequality [13] to vector-valued and matrix-valued random variables. In clinical applications it is desirable to relate a collection of dynamic models from several patients to a unique clinical state of interest for the clinician (e.g. detecting which patients have entered a high-risk clinical state). The clinician defines a supervised training set that is composed of all previously observed patients' dynamic models and their associated clinical states, which is then used to construct a similarity metric. This construction of the similarity metric between dynamic models and clinical states ensures that the bias introduced by therapeutic intervention censoring is removed, and also allows for the detection of anomalous dynamic models that are not associated with a previously defined clinical state.
When a new patient arrives the algorithm will learn their stochastic model, and then use the similarity metric to map the detected dynamic models to their associated clinical states of interest.
Though our algorithm is general and can be applied in several medical settings (e.g. mobile health, wireless health), here we focus on detecting the clinical state of patients in hospital wards. Specifically, we apply our algorithm to patients in a cancer ward of a large academic hospital.

[Figure 1 depicts the offline learning pipeline (electronic health records D → segmentation with fine-grained personalization → model and parameters → clinician validation and labeling of D̄ to produce L) and the online pipeline (new patient vitals {y_t}_{t∈T} → segmentation → similarity → clinical state estimate).]

Figure 1: Schematic of the proposed algorithm for learning the dynamic model and estimating the clinical state of the patient. From D a valid segmentation D̄ is constructed and provided to the clinician to construct the labeled dataset L. New patient vital signs are labeled using the dataset L.

2 Non-parametric Learning Algorithm for Patient's Stochastic Model

In this section we provide a method to segment patients' electronic health record data D = {{y^i_t}_{t∈T^i}}_{i∈I}, with y^i_t ∈ R^m the vital signs of patient i ∈ I at time t. To segment the temporal data we assume that the vital signs of each patient originate from a switching multivariate Gaussian (SMG) process. A Bayesian non-parametric learning algorithm is utilized to select the switching times between the unique dynamic models; that is, we consider the observation dynamics and model switching dynamics simultaneously. The final result of the segmentation is the dataset:

D̄ = {{y^i_t}_{t∈T^i_k}, k ∈ {1, . . . , K^i} = K^i}_{i∈I}    (1)

with T^i_k the time samples for segment k and K^i the set of segments for patient i. Statistical methods are used to ensure that each dynamic model is associated with a unique clinical state; refer to Sec.3 for details.
We assume that the switching process between models satisfies a HMM where each state of the HMM is associated with a unique dynamic model given by:

y_t = ε_t(z_t),    ε_t(z_t) ∼ N(µ(z_t), Σ(z_t))    (2)

where z_t ∈ K^i is the state of the patient, and ε_t(z_t) is a Gaussian white noise term with covariance matrix Σ(z_t). For notational convenience we will suppress the index i and only include it explicitly when required. For segmentation each of the patients is treated independently. Each state z_t is assumed to evolve according to a HMM with z_t associated with a specific segment k ∈ K. Notice that we must estimate the total number of states |K|, and the associated model parameters {µ(k), Σ(k)}_{k∈K}, using only the data {y_t}_{t∈T}.
To learn the cardinality of the HMM we use the tools of non-parametric Bayesian inference by placing a prior on the HMM parameters to allow a data-driven estimation of the cardinality of the state-space. Recall that non-parametric here indicates that for larger sample size T, the number of possible states (i.e. dynamic models) can also increase. To model the infinite-HMM we use the hierarchical Dirichlet process (HDP) [3, 22]. The HDP can be interpreted as a HMM with a countably infinite state-space. That is, the HDP is a non-parametric prior for the infinite-HMM. The main idea of the HDP is to link a countably infinite set of Dirichlet processes by sharing atoms among the DPs, with each DP associated with a specific state.
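As an illustration of the SMG observation model (2), the following is a minimal sketch (our own code, not part of the paper) that simulates a finite-state version with diagonal covariances; the function name `sample_smg` and all constants are illustrative assumptions.

```python
import random

def sample_smg(transition, means, stds, T, seed=0):
    """Sketch of the switching multivariate Gaussian process (2) with a
    finite-state HMM: z_t follows the row-stochastic `transition` matrix,
    and y_t ~ N(means[z_t], diag(stds[z_t])^2)."""
    rng = random.Random(seed)
    K = len(means)
    z = 0  # initial state
    states, obs = [], []
    for _ in range(T):
        # draw the next state from the categorical row transition[z]
        u, cum = rng.random(), 0.0
        for k in range(K):
            cum += transition[z][k]
            if u <= cum:
                z = k
                break
        states.append(z)
        # emit one observation per vital-sign stream
        obs.append([rng.gauss(m, s) for m, s in zip(means[z], stds[z])])
    return states, obs
```

With a near-identity transition matrix the sampler reproduces the long dwell times that the sticky HDP-HMM prior encourages.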
The stick-breaking construction of the HDP is given by [8, 22]:

φ_m ∼ H,   φ_0 = Σ_{m=1}^∞ β_m δ_{φ_m},   β_m = v_m Π_{l=1}^{m−1} (1 − v_l),   v_m ∼ Beta(1, γ),
φ_k = Σ_{m=1}^∞ π_{km} δ_{φ_m},   π_k ∼ DP(α, β).    (3)

Eq.(3) represents an infinite state HMM with π_{km} the probability of transitioning from state k ∈ K to state m ∈ K. π_k represents the transition probabilities out of state k of the HMM, with β the shared prior parameter of the transition distribution, H a prior on the transition probability distribution, and α the concentration of the transition probability distribution of the HMM.
The patient's stochastic model is constructed by combining the SMG (2) with the HDP (or infinite HMM) and is given by:

v_k ∼ Beta(1, γ),   β_k = v_k Π_{l=1}^{k−1} (1 − v_l),   π_k ∼ DP(α + κ, (αβ + κδ_k)/(α + κ)),   k = 1, 2, . . .
z_t ∼ π(·|z_{t−1}) = π_{z_{t−1}},   y_t = ε(z_t),   t = 1, 2, . . . , T.    (4)

The parameter γ controls how concentrated the state transition function is from state k to state k′. This can be seen by setting κ = 0 and α = 0 such that E[π_k] = β. If γ = 1 then the parameter β_k in β decays at approximately a geometric rate for increasing k. As γ increases, the decay of the elements in β decreases. For α > 0 and κ > 0, E[π_k] = (αβ + κδ_k)/(α + κ); as such, κ controls the bias of π_k towards self-transitions, that is, π(k|k) is given a large weight.
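The stick-breaking weights β in (3) can be sketched with a truncated simulation (our own illustrative code, not from the paper); it shows the geometric-style decay of β_m discussed above, with γ controlling how fast the remaining stick shrinks.

```python
import random

def stick_breaking_weights(gamma, truncation, seed=0):
    """Truncated stick-breaking construction of the global HDP weights:
    v_m ~ Beta(1, gamma) and beta_m = v_m * prod_{l<m} (1 - v_l).
    Larger gamma leaves more mass for later sticks (slower decay)."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, gamma)  # v_m ~ Beta(1, gamma)
        weights.append(remaining * v)    # beta_m = v_m * prod(1 - v_l)
        remaining *= (1.0 - v)
    return weights
```

For a moderate truncation level the weights sum to nearly one, since the unbroken remainder of the stick vanishes geometrically.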
The parameter α + κ controls the variability of π_k around the base state transition distribution (αβ + κδ_k)/(α + κ).
Given the patient's stochastic model (4), non-parametric Bayesian inference is utilized to estimate the model parameters from the patient's vital signs {y_t}_{t∈T}. To utilize Bayesian inference we define a prior and compute the associated posterior, since a σ-finite density measure is present. The prior distributions on β and π are given by:

β ∼ Dir(γ/L, . . . , γ/L),   π_k ∼ Dir(αβ_1, . . . , αβ_k + κ, . . . , αβ_L),   k ∈ {1, . . . , L}.    (5)

Eq.(5) is the weak limit approximation with truncation level L, where L is the largest number of expected states in the estimated HMM from {y_t}_{t∈T} [25]. Note that as L → ∞, (5) approaches the HDP. If clinician domain knowledge is not available on the initial hyper-parameters γ, α, and κ, then it is common to place Beta or Gamma priors on them [25]. For the multivariate Gaussian we utilize the Normal-Inverse-Wishart prior distribution [11]:

p(µ, Σ | µ_0, λ, S_0, v) ∝ |Σ|^{−(v+m+2)/2} exp( −(1/2) tr(v S_0 Σ^{−1}) − (λ/2)(µ − µ_0)′ Σ^{−1} (µ − µ_0) )    (6)

where v and S_0 are the degrees of freedom and the scale matrix for the inverse-Wishart distribution on Σ, µ_0 is the prior mean, and λ is the number of prior measurements on the Σ scale. Given the prior distributions with associated posterior distributions, a MCMC or variational sampler (i.e.
Gibbs sampler [10], Beam sampler [25], variational Bayes [6, 7]) can be utilized to estimate the parameters of the patient's stochastic model (4) given the data {y_t}_{t∈T}.

3 Statistical Methods to Evaluate Stochastic Model Quality

Given the segmented dataset D̄ (1) generated from all the patients' estimated stochastic models (4), this section presents methods to evaluate the quality of D̄. This includes testing that the vital signs {y^i_t}_{t∈T^i_k} for each patient and unique dynamic model are consistent with a multivariate Gaussian distribution, contain sufficient samples to guarantee the accuracy of the dynamic model parameters, and that the detected dynamic models for each patient are unique. If the estimated stochastic models are of low quality then the hyper-parameters of the non-parametric Bayesian inference algorithm can be iteratively updated to ensure that all the patients' stochastic models accurately represent their dynamics. This is a vital step in medical applications since the results of the non-parametric Bayesian inference algorithm are sensitive to the selected hyper-parameters [14, 12]. For example, Fig.2(a) illustrates a poor quality segmentation that results from poorly selected hyper-parameters.

3.1 Hypothesis Tests for Model Consistency with Segments

To ensure model consistency we must test if each segment in D̄ is consistent with a multivariate Gaussian process (i.e. samples are independent and normally distributed). To test if the segment {y_t}_{t∈T_k} ∈ D̄ contains independent samples we evaluate the autocorrelation function (ACF) [5] for each segment. For {y_t}_{t∈T_k} the ACF must decay exponentially to zero, which indicates that the segment contains independent samples.
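The independence check described above can be sketched for a single vital-sign stream as follows; this is our own minimal implementation of the sample ACF (the paper's multivariate test is applied per segment), with the function name chosen for illustration.

```python
def autocorrelation(x, max_lag):
    """Sample autocorrelation function (ACF) of a 1-D series.
    For an i.i.d. segment the ACF should decay rapidly toward zero;
    slow decay suggests the segment mixes several dynamic models."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    acf = []
    for lag in range(1, max_lag + 1):
        cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
        acf.append(cov / var)
    return acf
```

An alternating series, for example, shows a strong negative lag-1 autocorrelation, so such a segment would fail the independence check.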
Note that it is possible for a spurious autocorrelation structure to be present in the segment if the segment is composed of a mixture of Gaussian processes. If this is suspected then the hyper-parameters of the non-parametric Bayesian inference algorithm are updated to increase the number of segments (for example by increasing L or decreasing κ). Since there is no universally most powerful test for multivariate normality, we use the improved Bonferroni method [23], which combines four affine invariant hypothesis test statistics, alleviating the need to select the most sensitive single test while retaining the benefits of these four multivariate normality tests.

3.2 Data-Driven Confidence Bounds for Dynamic Model Estimation

An important consideration when evaluating the quality of the segmentation D̄ is that each segment contains sufficient samples to confidently estimate the mean and covariance {µ, Σ} of the SMG model. This is particularly important in medical applications as it provides an estimate of the maximum number of samples needed to confidently estimate {µ, Σ}, which are used to estimate the clinical state of the patient. Note that the estimated posterior distribution for {µ, Σ} cannot be used to bound the number of samples required. To estimate {µ, Σ} given {y_t}_{t∈T_k}, the maximum likelihood estimators given by:

µ̂(k) = (1/n_k) Σ_{t=1}^{n_k} y_t,   Σ̂(k) = (1/n_k) Σ_{t=1}^{n_k} (y_t − µ̂(k))(y_t − µ̂(k))′    (7)

are used, with n_k = |T_k| the total number of samples in segment k ∈ K. If each vital sign is independent (i.e. spherical multivariate Gaussian distribution) then an empirical Bernstein bound [13] can be constructed to estimate the error between the sample mean µ̂ and the actual mean µ.
From the empirical Bernstein bound, the minimum number of samples necessary to ensure that P(µ̂(k, j) − µ(k, j) ≥ ε) ≤ α for all segments k ∈ K and streams j ∈ {1, . . . , m}, for some confidence level α > 0 and tolerance ε ≥ 0, is given by:

n(ε, α) ≥ ((6σ²_max + 2Δ_max ε)/(3ε²)) ln(1/α)    (8)

with σ²_max the maximum possible variance and Δ_max the maximum possible difference between the maximum and minimum values of all values in the vital sign data.
To construct a relaxed bound on the sample mean µ̂ ∈ R^m, and a bound on the sample covariance Σ̂ ∈ R^{m×m} computed using (7), we generalize the empirical Bernstein bound to the multidimensional case. The goal is to construct a bound of the form P(||Z|| ≥ ε) ≤ α where ||·|| denotes the spectral norm if Z is a matrix, or the 2-norm in the case Z is a vector. To construct a probabilistic bound on the accuracy of the estimated mean we utilize the vector Bernstein inequality given by Theorem 1.

Theorem 1 Let {Y_1, . . . , Y_n} be a set of independent random vectors with Y_t ∈ R^m for t ∈ {1, . . . , n}. Assume that each vector has uniformly bounded deviation such that ||Y_t|| ≤ L for all t ∈ {1, . . . , n}. Writing Z = Σ_{t=1}^n Y_t, then

P(||Z|| ≥ ε) ≤ (2m) exp( −3ε² / (6V(Z) + 2Lε) ),   V(Z) = Σ_{t=1}^n E[||Y_t||²_2].    (9)

The proof of Theorem 1 is provided in the Supporting Material. To construct the bound on the number of samples necessary to estimate the mean we define Z = µ̂ − µ with Y_t = (y_t − µ)/n.
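The sample-size bounds can be evaluated numerically. The following sketch is our own code (function names and the constants in the usage note are illustrative): it implements the scalar bound (8) together with its vector and matrix analogues (11)-(12).

```python
import math

def n_scalar(eps, alpha, sigma2_max, delta_max):
    """Minimum n from the scalar empirical Bernstein bound, Eq. (8)."""
    return math.ceil((6 * sigma2_max + 2 * delta_max * eps)
                     / (3 * eps ** 2) * math.log(1 / alpha))

def n_mean(eps, alpha, m, B1, mu_norm):
    """Minimum n so that P(||mu_hat - mu|| >= eps) <= alpha, Eq. (11);
    B1 bounds ||y_t||_2 and mu_norm is ||mu||_2."""
    return math.ceil((6 * (B1 ** 2 - mu_norm ** 2) + 4 * B1 * eps)
                     / (3 * eps ** 2) * math.log(2 * m / alpha))

def n_cov(eps, alpha, m, B2):
    """Minimum n so that P(||Sigma_hat - Sigma|| >= eps) <= alpha,
    Eq. (12); B2 bounds ||y_t - mu_hat||."""
    return math.ceil((6 * B2 ** 2 + 4 * B2 * eps)
                     / (3 * eps ** 2) * math.log(2 * m / alpha))
```

As expected, tightening the tolerance ε or the confidence level α increases the required segment length, which is how a clinician would trade timeliness against accuracy.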
Using the triangle inequality, Jensen's inequality, and assuming ||y_t||_2 ≤ B_1 for some constant B_1, we have that:

L ≤ 2B_1/n,   V(Z) ≤ (1/n)(B_1² − ||µ||²_2).    (10)

Plugging (10) into (9) results in the minimum number of samples necessary to guarantee that P(||µ̂ − µ|| ≥ ε) ≤ α, with the number of samples n(ε, α) given by:

n(ε, α) ≥ ((6(B_1² − ||µ||²_2) + 4B_1 ε)/(3ε²)) ln(2m/α).    (11)

To bound the number of samples necessary to estimate Σ we utilize the corollary of Theorem 1 for real-symmetric matrices with Z = Σ̂ − Σ. The bound on the number of samples necessary to guarantee P(||Σ̂ − Σ|| ≥ ε) ≤ α, assuming ||Σ|| ≤ ||y_t − µ̂|| ≤ B_2, is given by:

n(ε, α) ≥ ((6B_2² + 4B_2 ε)/(3ε²)) ln(2m/α).    (12)

For a given α and ε, and an estimate of the maximum spectral norm of Σ and norm of µ, equations (11) and (12) can be used to estimate the minimum number of samples necessary to sufficiently estimate {µ, Σ}. To accurately compute the clinical state from the unique dynamic model, each segment must satisfy (11) and (12), otherwise any clinical state estimation may give unreliable results.

3.3 Statistical Tests for Statistically Identical Dynamic Models

In this section we construct a novel hypothesis test for mean and covariance equality with a given confidence, and design parameters that control the importance of the mean equality compared to the covariance equality.
The hypothesis test both evaluates the quality of the estimated stochastic model, and can also be used to merge statistically identical segments to increase the accuracy of the dynamic model parameter estimates. Given two segments of vital signs, each associated with a supposedly unique dynamic model, we define the null hypothesis H_0 as the equality of the mean and covariance matrices from the two dynamic models, and the alternate hypothesis H_1 that either the mean or covariance are not equal. Formally:

H_0 : Σ(k) = Σ(k′) and µ(k) = µ(k′),   H_1 : Σ(k) ≠ Σ(k′) or µ(k) ≠ µ(k′).    (13)

Several methods exist for testing for covariance equality [20] and for mean equality [24]; however, we wish to test for both covariance and location equality. To test for the global hypothesis H_0 in (13), note that H_0 and H_1 can equivalently be stated as a combination of the sub-hypotheses as follows:

H_0 : H¹_0 ∩ H²_0   and   H_1 : H¹_1 ∪ H²_1    (14)

with H¹_0 : µ(k) = µ(k′), H¹_1 : µ(k) ≠ µ(k′), H²_0 : Σ(k) = Σ(k′), and H²_1 : Σ(k) ≠ Σ(k′). To construct the hypothesis test for H_0 the non-parametric permutation testing method [17] is used, which allows us to combine the sub-hypothesis tests for covariance and mean equality into a hypothesis test for H_0.
To test for the null hypothesis H¹_0 we utilize Hotelling's T² test as it is asymptotically the most powerful invariant test when the data associated with k and k′ are normally distributed [4]. Given that y_t are generated from a multivariate normal distribution, the test statistic τ¹ follows a T² distribution such that τ¹ ∼ T²(m, n(k) + n(k′) − 2), where n(k) and n(k′) are the number of samples in segments k and k′ respectively.
To test for the null hypothesis H²_0 we utilize the modified likelihood ratio statistic provided by Bartlett [1], written Λ*, which is uniformly the most powerful unbiased test for covariance equality [15]. The test statistic for covariance equality is given by:

τ² = −2ρ log(Λ*),   ρ = 1 − ((2m² + 3m − 1)/(6(m + 1)n))(n/n(k) + n/n(k′) − 1),   n = n(k) + n(k′).

From (Theorem 8.2.7 in [15]) the asymptotic cumulative distribution function of τ² can be approximated by a linear combination of χ² distributions which has a convergence rate of O((ρn)⁻³).
To construct the permutation test for H_0, Tippett's combining function [17] is used: τ = min(λ¹/k_1, λ²/k_2), where λ¹ and λ² are the p-values of the sub-hypothesis tests H¹_0 and H²_0 respectively, and k_1 and k_2 are design parameters. If k_1 > k_2 then the mean equality is weighted more than the covariance equality. If k_1 = k_2 then both mean equality and covariance equality are weighted equally. For the test statistics τ¹ and τ² the p-values are given by λ¹ = P(τ¹ ≥ τ¹_0) and λ² = P(τ² ≥ τ²_0), where τ¹_0 and τ²_0 are realizations of the test statistics. To utilize τ as a test statistic we require the cumulative distribution function of τ. Note that if H¹_0 is true (i.e. mean equality) then the distributions of τ¹ and τ² are independent since τ¹ follows a T² distribution, which results in λ¹ ∼ U(0, 1) and λ² ∼ U(0, 1) [17]. The cumulative distribution function of τ is given by P(τ ≤ x) = (k_1 + k_2)x − k_1 k_2 x² for x ∈ [0, min(1/k_1, 1/k_2)]. Given P(τ ≤ x), for a significance level α, we reject the null hypothesis H_0 if τ ≤ δ where δ is the solution to P(τ ≤ δ) = α.
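Since the combined statistic τ has the quadratic CDF P(τ ≤ x) = (k_1 + k_2)x − k_1 k_2 x², the rejection threshold δ solving P(τ ≤ δ) = α can be computed directly as the smaller root of the quadratic. A minimal sketch (our own code; the function name is illustrative):

```python
import math

def tippett_threshold(alpha, k1, k2):
    """Rejection threshold delta for tau = min(lambda1/k1, lambda2/k2),
    whose CDF under H0 is P(tau <= x) = (k1 + k2) x - k1 k2 x^2.
    Solves k1 k2 delta^2 - (k1 + k2) delta + alpha = 0 (smaller root)."""
    a, b = k1 * k2, k1 + k2
    return (b - math.sqrt(b * b - 4 * a * alpha)) / (2 * a)
```

For example, with equal design weights k_1 = k_2 = 1 the threshold is noticeably smaller than α itself, reflecting that τ is the minimum of two scaled p-values.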
The parameter δ is given by: δ = ((k_1 + k_2) − √((k_1 + k_2)² − 4αk_1k_2)) / (2k_1k_2).
For a given significance level α, and design parameters k_1 and k_2, we can test H_0 for the samples {y_t}_{t∈T_k} and {y_t}_{t∈T_{k′}} by evaluating τ_0 = min(λ¹_0/k_1, λ²_0/k_2), with λ¹_0 and λ²_0 the realizations of the p-values for τ¹ and τ². By repeatedly applying this hypothesis test to the segments {y_t}_{t∈T_k} for k ∈ K we can detect any segments with equal mean and covariance at a significance level α. Similar segments can be merged to increase the accuracy of the estimated dynamic model parameters, or be used to evaluate the quality of the patient's stochastic model.

4 Estimating Patient's Clinical State using Clinician Domain-Knowledge

In this section Algorithm 1 (Fig.1) is presented, which constructs stochastic models of patients based on their historical EHR data and clinician domain-knowledge, and is used to classify the clinical state of new patients.
Algorithm 1 is composed of five main steps. Step#1 and Step#2 are used to construct the stochastic models of the patients based on the EHR data D, and to construct the segmented dataset D̄ (1). The stochastic models are constructed using the non-parametric Bayesian inference algorithm from Sec.2. Step#2 measures the quality of the stochastic models, and iteratively updates the hyper-parameters of the Bayesian inference algorithm to guarantee the quality of the detected dynamic models as discussed in Sec.3. In Step#3 each segment (e.g. dynamic model) in D̄ is labelled by the clinician, based on the clinical states of interest, to construct the dataset L.
Step#4 and Step#5 involve the online portion of the algorithm, which constructs stochastic models for new patients and estimates their clinical state based on each patient's estimated stochastic model. Step#4 constructs the stochastic model for the new patient, then in Step#5 each unique dynamic model from Step#4 is associated with a clinical state of interest using the labelled dataset L from Step#3. Note that L contains several segments (e.g. dynamic models) that are associated with one clinical state. To estimate the clinical state of the new patient a similarity metric based on the Bhattacharyya distance, written D_B(·), is used. If the minimum Bhattacharyya distance between the new patient's segment k and the next closest segment k′ ∈ L is greater than δ_th the segment is labelled as anomalous, otherwise the segment is given the label of segment k′ ∈ L. Information on the computational complexity and implementation details of Algorithm 1 is provided in the Supporting Material.

5 Real-World Clinical State Estimation in Cancer Ward

In this section Algorithm 1 is applied to a real-world EHR dataset composed of a cohort of patients admitted to a cancer ward. A detailed description of the dataset is provided in the Supporting Material.

Algorithm 1 Patient Clinical State Estimation
Step#1: Construct stochastic models for each patient using D and the non-parametric Bayesian algorithm presented in Sec.2. Using the stochastic models construct the dataset D̄ (1).
Step#2: To evaluate the quality of each stochastic model, each segment in D̄ from Step#1 is tested for: i) model consistency, ii) sufficient samples to guarantee accuracy of dynamic model parameter estimates, and iii) statistical uniqueness of segments, using the methods in Sec.3. If the quality is not sufficient then return to Step#1 with updated hyper-parameters for the non-parametric Bayesian inference algorithm.
Step#3: Given D̄ and the clinical states of interest, the clinician constructs the labelled dataset L = {({y^i_t}_{t∈T^i_k}, l^i_k), k ∈ {1, . . . , K^i} = K^i}.
Step#4: For a new patient i = 0 with vital signs {y^0_t}_{t∈T^0}, construct the stochastic model of the patient using the Bayesian non-parametric learning algorithm. Then, based on the stochastic model, construct the segmented vital sign data {{y^0_t}_{t∈T^0_k}, k ∈ {1, . . . , K^0} = K^0}.
Step#5: To estimate the label l(k), written l̂(k), of each segment k ∈ K^0 from Step#4, compute the solution to the following optimization problem for each k:

if min_{k′∈L} {D_B(k, k′)} ≥ δ_th then l̂(k) = ∅, else l̂(k) ∈ argmin_l { min_{k′∈L_l} {D_B(k, k′)} / min_{k′∈L_{−l}} {D_B(k, k′)} }

with ∅ the anomalous state, L_l ⊂ L the set of segments that are labeled with l, L_{−l} ⊂ L the set of all segments that are not labeled as l, and δ_th a threshold. Return to Step#4.

The first step of Algorithm 1 is to segment the EHR data based on the estimated stochastic models of the patients.
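The similarity computation in Step#5 can be sketched for a single vital-sign stream as follows. This is our own simplified illustration: it uses the univariate Bhattacharyya distance (the multivariate form replaces variances with covariance matrices and a log-determinant ratio) and a plain nearest-labeled-segment rule with the anomaly threshold, rather than the full argmin-ratio rule of Step#5.

```python
import math

def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians."""
    var_bar = 0.5 * (var1 + var2)
    return ((mu1 - mu2) ** 2) / (8.0 * var_bar) \
        + 0.5 * math.log(var_bar / math.sqrt(var1 * var2))

def label_segment(segment, labeled, delta_th):
    """Label a new segment by its nearest labeled segment, or flag it
    as anomalous (None) when every labeled segment is farther than
    delta_th.  `segment` is a (mu, var) pair; `labeled` maps each
    clinical-state label to a list of (mu, var) pairs."""
    best_label, best_d = None, float("inf")
    for label, models in labeled.items():
        for mu, var in models:
            d = bhattacharyya_1d(segment[0], segment[1], mu, var)
            if d < best_d:
                best_label, best_d = label, d
    return best_label if best_d < delta_th else None
```

A segment whose dynamics were never observed in L falls outside the threshold and is reported as anomalous, mirroring how Algorithm 1 detects new unique states.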
Fig.2(a) illustrates the dynamic models of a specific patient's estimated stochastic model for κ = 0.1 and S_0 = 0.1·I_m (I_m is the identity matrix), and for κ = 1 and S_0 = I_m. As seen, for κ = 0.1 and S_0 = 0.1·I_m several segments have insufficient samples for estimating the model parameters, and are not statistically unique. However, the segments resulting from κ = 1 and S_0 = I_m provide a stochastic model of sufficient quality where each segment contains sufficient samples to accurately estimate the model parameters, the segments are statistically unique, and satisfy the multivariate normality assumption. Therefore we set κ = 1 and S_0 = I_m to construct the segmented dataset D̄ from D. The dataset L is constructed by providing the clinician with D̄, who then labels each segment as either in the ICU admission clinical state, or the non-ICU clinical state.

Figure 2: Dynamic model discovery and performance of Algorithm 1. (a) Dynamic model estimates with {κ, S_0} = {0.1, 0.1·I_m} (dotted), and {1, I_m} (solid). (b) Estimated dynamic models for the intervals of patient data in Fig.2(d). (c) Trade-off between the TPR and PPV; the dashed cross-hair indicates the performance of Algorithm 1 for δ_b = 1. (d) Physiological signals (heart-rate, diastolic and systolic blood pressure) from the patient with discovered models in Fig.2(b).

Of critical importance in medical applications is the accuracy and timeliness of the detection of the clinical state of the patient.
Fig.2(c) shows the trade-off between the TPR and PPV for Algorithm 1, the Rothman index [18], a state-of-the-art method utilized in many hospitals today, and MEWS [21]; for each method the operating point depends on the threshold selected. As seen, Algorithm 1 has superior performance compared to these two popular risk-scoring methods. For example, if we require TPR = 71.9%, then the associated PPV values for the Rothman index and MEWS are 26.1% and 18.0% respectively; the PPV of Algorithm 1 is thus 11.3% higher than that of the Rothman index and 19.4% higher than that of MEWS. We also compare with methods commonly used in medicine, with the results presented in Table 1. As seen, Algorithm 1 outperforms all these methods for estimating the patient's clinical state. There are several possible reasons that Algorithm 1 outperforms these methods, including accounting for therapeutic interventions and utilizing fine-grained personalization. Note that the results in Table 1 are computed 12 hours prior to ICU admission or hospital discharge. Additionally, the average detection time of ICU admission or discharge using Algorithm 1 is approximately 24 hours prior to the clinician's decision. This timeliness ensures that the patient's clinical state estimate provides clinicians with sufficient warning to apply a therapeutic intervention to stabilize the patient.

Table 1: Accuracy of Methods for Predicting ICU Admission

Algorithm              TPR(%)   PPV(%)
Algorithm 1            71.9     37.4
Rothman Index          53.9     34.5
SVMs                   28.1     26.3
MEWS                   55.7     30.7
Logistic Regression    55.8     30.3
Lasso Regularization   44.5     31.1
Random Forest          32.2     29.9

A key feature of Algorithm 1 is that it learns the number of unique dynamic models for each patient, and as more data is collected the number of unique dynamic models discovered may increase.
Fig.2(b) illustrates this process for a patient whose physiological signals are given in Fig.2(d). The horizontal dashed lines indicate the intervals and the associated discovered dynamic models. Note that typical hospitalization times for cancer-ward patients in the dataset range from 4 hours to over 85 days. As seen, as more samples are obtained for the patient, the number of dynamic models that describe the patient's dynamics increases. Additionally, there is good agreement across the different time intervals on where the patient's dynamics change. For example, the change point at 40 hours after hospitalization occurs as a result of an increase in the systolic and diastolic blood pressure and a decrease in the heart-rate. At 1700 hours the change in state results from a dramatic increase in both the systolic and diastolic blood pressure and a decrease in the heart-rate. From Fig.2(d), these physiological signals were not observed previously; therefore Algorithm 1 correctly detects that this is a new unique state for the patient. Though Algorithm 1 can identify changes in patient state, the domain-knowledge of the clinician is required to define the clinical state of the patient. Only dynamic models 8 and 9 are associated with the ICU-admission state.

Further results are provided in the Supporting Material that illustrate how current methods for constructing risk scores suffer from the bias introduced by therapeutic intervention censoring, and how a binary threshold δ_b can be introduced into Algorithm 1 for controlling the TPR and PPV for clinical state estimation.

6 Conclusion

In this paper a novel non-parametric learning algorithm for confidently learning stochastic models of patients and classifying their associated clinical states was presented.
Compared to state-of-the-art clinical state estimation methods, our algorithm eliminates the bias caused by therapeutic intervention censoring, is personalized to the patient's specific dynamics resulting from medical complications (e.g. disease, drug interactions, physical contusions or fractures), and can detect anomalous clinical states. The algorithm was applied to real-world patient data from a cancer ward in a large academic hospital, and was found to give a significant improvement in classifying patients' clinical states, in both accuracy and timeliness, compared with current state-of-the-art methods such as the Rothman index. The algorithm provides valuable information that allows clinicians to make informed decisions about whether a therapeutic intervention is necessary to improve the clinical state of the patient.

Acknowledgments
This research was supported by: NSF ECCS 1462245, and the Air Force DDDAS program.

References
[1] M. Bartlett. Properties of sufficiency and statistical tests. Proc. Roy. Soc. London A, 160:268-282, 1937.
[2] D. Basso, F. Pesarin, L. Salmaso, and A. Solari. Permutation Tests. Springer, 2009.
[3] M. Beal, Z. Ghahramani, and C. Rasmussen. The infinite hidden Markov model. In Advances in Neural Information Processing Systems, pages 577-584, 2001.
[4] M. Bilodeau and D. Brenner. Theory of Multivariate Statistics. Springer, 2008.
[5] P. Brockwell and R. Davis. Time Series: Theory and Methods. Springer Science & Business Media, 2013.
[6] M. Bryant and E. Sudderth. Truly nonparametric online variational inference for hierarchical Dirichlet processes. In Advances in Neural Information Processing Systems, pages 2699-2707, 2012.
[7] T. Campbell, J. Straub, J. Fisher, and J. How. Streaming, distributed variational inference for Bayesian nonparametrics. In Advances in Neural Information Processing Systems, pages 280-288, 2015.
[8] T. Ferguson.
A Bayesian analysis of some nonparametric problems. The Annals of Statistics, pages 209-230, 1973.
[9] E. Fox, M. Jordan, E. Sudderth, and A. Willsky. Sharing features among dynamical systems with beta processes. In Advances in Neural Information Processing Systems, pages 549-557, 2009.
[10] E. Fox, E. Sudderth, M. Jordan, and A. Willsky. An HDP-HMM for systems with state persistence. In Proceedings of the 25th International Conference on Machine Learning, pages 312-319. ACM, 2008.
[11] A. Gelman, J. Carlin, H. Stern, and D. Rubin. Bayesian Data Analysis, volume 2. Taylor & Francis, 2014.
[12] A. Johnson, M. Ghassemi, S. Nemati, K. Niehaus, D. Clifton, and G. Clifford. Machine learning and decision support in critical care. Proceedings of the IEEE, 104(2):444-466, 2016.
[13] A. Maurer and M. Pontil. Empirical Bernstein bounds and sample variance penalization. COLT, 2009.
[14] G. Montanez, S. Amizadeh, and N. Laptev. Inertial hidden Markov models: Modeling change in multivariate time series. In AAAI, pages 1819-1825, 2015.
[15] R. Muirhead. Aspects of Multivariate Statistical Theory. Wiley, 1982.
[16] C. Paxton, A. Niculescu-Mizil, and S. Saria. Developing predictive models using electronic medical records: challenges and pitfalls. In AMIA Annual Symposium Proceedings, volume 2013, pages 1109-1115. American Medical Informatics Association, 2012.
[17] F. Pesarin and L. Salmaso. Permutation Tests for Complex Data: Theory, Applications and Software. John Wiley & Sons, 2010.
[18] M. Rothman, S. Rothman, and J. Beals. Development and validation of a continuous measure of patient condition using the electronic medical record. Journal of Biomedical Informatics, 46(5):837-848, 2013.
[19] S. Saria, D. Koller, and A. Penn. Learning individual and population level traits from clinical temporal data. In Proc.
Neural Information Processing Systems (NIPS), Predictive Models in Personalized Medicine workshop. Citeseer, 2010.
[20] J. Schott. A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Computational Statistics & Data Analysis, 51(12):6535-6542, 2007.
[21] P. Subbe, M. Kruger, P. Rutherford, and L. Gemmel. Validation of a modified Early Warning Score in medical admissions. QJM, 94(10):521-526, 2001.
[22] Y. W. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 2012.
[23] C. Tenreiro. An affine invariant multiple test procedure for assessing multivariate normality. Computational Statistics & Data Analysis, 55(5):1980-1992, 2011.
[24] N. Timm. Applied Multivariate Analysis, volume 1. Springer, 2002.
[25] J. Van Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning, pages 1088-1095. ACM, 2008.