{"title": "Nonparametric Bayesian Lomax delegate racing for survival analysis with competing risks", "book": "Advances in Neural Information Processing Systems", "page_first": 5002, "page_last": 5013, "abstract": "We propose Lomax delegate racing (LDR) to explicitly model the mechanism of survival under competing risks and to interpret how the covariates accelerate or decelerate the time to event. LDR explains non-monotonic covariate effects by racing a potentially infinite number of sub-risks, and consequently relaxes the ubiquitous proportional-hazards assumption which may be too restrictive. Moreover, LDR is naturally able to model not only censoring, but also missing event times or event types. For inference, we develop a Gibbs sampler under data augmentation for moderately sized data, along with a stochastic gradient descent maximum a posteriori inference algorithm for big data applications. Illustrative experiments are provided on both synthetic and real datasets, and comparison with various benchmark algorithms for survival analysis with competing risks demonstrates distinguished performance of LDR.", "full_text": "Nonparametric Bayesian Lomax delegate racing\n\nfor survival analysis with competing risks\n\nQuan Zhang\n\nMcCombs School of Business\n\nThe University of Texas at Austin\n\nAustin, TX 78712\n\nquan.zhang@mccombs.utexas.edu\n\nmingyuan.zhou@mccombs.utexas.edu\n\nMingyuan Zhou\n\nMcCombs School of Business\n\nThe University of Texas at Austin\n\nAustin, TX 78712\n\nAbstract\n\nWe propose Lomax delegate racing (LDR) to explicitly model the mechanism of\nsurvival under competing risks and to interpret how the covariates accelerate or\ndecelerate the time to event. LDR explains non-monotonic covariate effects by\nracing a potentially in\ufb01nite number of sub-risks, and consequently relaxes the ubiq-\nuitous proportional-hazards assumption which may be too restrictive. 
Moreover, LDR is naturally able to model not only censoring, but also missing event times or event types. For inference, we develop a Gibbs sampler under data augmentation for moderately sized data, along with a stochastic gradient descent maximum a posteriori inference algorithm for big data applications. Illustrative experiments are provided on both synthetic and real datasets, and comparison with various benchmark algorithms for survival analysis with competing risks demonstrates distinguished performance of LDR.

1 Introduction

In survival analysis, one can often use nonparametric approaches to flexibly estimate the survival function from lifetime data, such as the Kaplan–Meier estimator [1], or to estimate the intensity of a point process for event arrivals, such as the isotonic Hawkes process [2] and neural Hawkes process [3], which can be applied to the analysis of recurring events. When exploring the relationship between the covariates and time to events, existing survival analysis methods often parameterize the hazard function with a weighted linear combination of covariates. One of the most popular is the Cox proportional hazards model [4], which is semi-parametric in that it assumes a nonparametric baseline hazard rate to capture the time effect. These methods are often applied to population-level studies that try to unveil the relationship between the risk factors and the hazard function, such as to what degree a unit increase in a covariate is multiplicative to the hazard rate. However, the interpretability is often obtained by sacrificing model flexibility, because the proportional-hazards assumption is violated when the covariate effects are non-monotonic.
For example, both very high and very low ambient temperatures were related to high mortality rates in Valencia, Spain, 1991–1993 [5], and a significantly increased mortality rate is associated with both underweight and obesity [6].

To accommodate nonlinear covariate effects such as non-monotonicity, existing (semi-)parametric models often expand the design matrix with transformed data, such as the basis functions of smoothing splines [7, 8] and other transformations guided by subjective knowledge. Instead of using hand-designed data transformations, several recent studies in machine learning model complex covariate dependence with flexible functions, such as deep exponential families [9], neural networks [10–12], and Gaussian processes [13]. With enhanced flexibility, these recent approaches are often good at assessing individual risks, such as predicting a patient's hazard function or survival time. However, except for the Gaussian process, whose results are not too difficult to interpret for low-dimensional covariates, they often have difficulty in explaining how the survival is impacted by which covariates, limiting their use in survival analysis, where interpretability plays a critical role. Some approaches discretize the real-valued survival time and model survival on discrete time points or intervals [14–17]. They transform the time-to-event modeling problem into regression, classification, or ranking problems, at the expense of losing the continuity information implied by the survival time and potentially having inconsistent categories between training and testing.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

In survival analysis, it is very common to have competing risks, in which scenario the occurrence of an event under one risk precludes events under any other risks.
For example, if the event of interest is death, then all possible causes of death are competing risks to each other, since a subject that died of one cause would never die of any other cause. Apart from modeling the time to event, in the presence of competing risks it is also important to model the event type, or under which risk the event is likely to occur first. Though one can censor subjects with an occurrence of the event under a competing risk other than the risk of special interest, so that every survival model that can handle censoring is able to model competing risks, it is problematic to violate the principle of non-informative censoring [18, 19]. The analysis of competing risks should be carefully designed, and people often model two types of hazard functions: cause-specific [20, 21] and subdistribution [20–22] hazard functions. The former applies to studying the etiology of diseases, while the latter is favorable when developing prediction models and risk-scoring systems [19].

In the analysis of competing risks, there is also a trade-off between interpretability and flexibility. The aforementioned cause-specific and subdistribution hazard functions use a Cox model with competing risks [19, 23] and a Fine–Gray subdistribution model [22], respectively, which are both proportional hazards models. Both models are semi-parametric, and assume that the hazard rate is proportional to the exponential of the inner product of the covariate and regression coefficient vectors, along with a nonparametric baseline hazard function. However, the existence of non-monotonic covariate effects can easily challenge and break the proportional-hazards assumption inherited from their corresponding single-risk model.
This barrier has been surmounted by nonparametric approaches, such as random survival forests [24], Gaussian processes with a single layer [25] or two [26], and classification-based neural networks that discretize the survival time [27]. These models are designed for competing risks, using the covariates as input and the survival times (or their monotonic transformations) or probabilities as output. Though having good model fit, the nonparametric approaches are specifically used for studies at an individual level, such as predicting the survival time, but are not able to tell how the covariates affect the survival or cumulative incidence functions [22, 28]. Moreover, it might be questionable for Alaa and van der Schaar [26] to assume a normal distribution on survival times, which are positive almost surely and asymmetric in general.

To this end, we construct the Lomax delegate racing (LDR) survival model, a gamma process based nonparametric Bayesian hierarchical model for survival analysis with competing risks. The LDR survival model utilizes the race of exponential random variables to model both the time to event and the event type and subtype, and uses the summation of a potentially countably infinite number of covariate-dependent gamma random variables as the exponential distribution rate parameters. It is amenable to not only censored data, but also missing event types or event times. Code for reproducible research is available at https://github.com/zhangquan-ut/Lomax-delegate-racing-for-survival-analysis-with-competing-risks.

2 Exponential racing and survival analysis

Let t ∼ Exp(λ) represent an exponential distribution, with probability density function (PDF) f(t | λ) = λe^{−λt}, t ∈ R+, where R+ represents the nonnegative side of the real line, and λ > 0 is the rate parameter, such that E[t] = λ^{−1} and Var[t] = λ^{−2}.
Shown below is a well-known property that characterizes a race among independent exponential random variables [29, 30].

Property 1 (Exponential racing). If t_j ∼ Exp(λ_j), j = 1, . . . , J, are independent of each other, then t = min{t_1, . . . , t_J} and the argument of the minimum y = argmin_{j∈{1,...,J}} t_j are independent, satisfying

t ∼ Exp(∑_{j=1}^J λ_j),  y ∼ Categorical(λ_1/∑_{j=1}^J λ_j, · · · , λ_J/∑_{j=1}^J λ_j).  (1)

Suppose there is a race among teams j = 1, · · · , J, whose completion times t_j follow Exp(λ_j), with the winner being the team with the minimum completion time. Property 1 shows that the winner's completion time t still follows an exponential distribution and is independent of which team wins the race. In the context of survival analysis, if we consider a competing risk as a team and the latent survival time under this risk as the completion time of the team, then t will be the observed time to event (or failure time) and y the event type (or cause of failure).
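Property 1 is straightforward to check by simulation. The sketch below (NumPy; the rates λ = (1, 2, 3) are arbitrary illustrative values, not from the paper) races three exponential clocks and compares the winner's mean time and the winning probabilities with the closed forms in (1).

```python
import numpy as np

def exponential_race(lam, size, rng):
    """Race J independent exponential clocks; return (winner times, winner ids)."""
    t = rng.exponential(1.0 / np.asarray(lam), size=(size, len(lam)))
    return t.min(axis=1), t.argmin(axis=1)

rng = np.random.default_rng(0)
lam = np.array([1.0, 2.0, 3.0])
t, y = exponential_race(lam, 200_000, rng)

# Property 1: t ~ Exp(sum_j lam_j), so E[t] = 1/6, and P(y = j) = lam_j / sum_j lam_j.
print(t.mean())                 # close to 1/6
print(np.bincount(y) / len(y))  # close to [1/6, 2/6, 3/6]
```

One can also check that `t` and `y` are empirically uncorrelated, which is the independence half of the property.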
Exponential racing not only describes a natural mechanism of competing risks, but also provides an attractive modeling framework amenable to Bayesian inference, as conditioning on the λ_j's, the joint distribution of the event type y and time to event t becomes fully factorized as

P(y, t | {λ_j}_{1,J}) = λ_y e^{−t ∑_{j=1}^J λ_j}.  (2)

In survival analysis, it is rarely the case that both y and t are observed for all observations, and one often needs to deal with missing data and right or left censoring. We write t ∼ Exp_Ψ(λ) for a truncated exponential random variable defined by the PDF f_Ψ(t | λ) = λe^{−λt} / ∫_Ψ λe^{−λu} du, where t ∈ Ψ and Ψ is an open interval on R+ representing censoring. Concretely, Ψ can be (T_{r.c.}, ∞), indicating right censoring with censoring time T_{r.c.}; can be (0, T_{l.c.}), indicating left censoring with censoring time T_{l.c.}; or can be a more general case (T_1, T_2), T_2 > T_1.

If we do not observe y or t, or there exists censoring, we have the following two scenarios, for both of which it is necessary to introduce appropriate auxiliary variables to achieve fully factorized likelihoods: 1) If we only observe y (or t), then we can draw t (or y) as shown in (1) as an auxiliary variable, leading to the fully factorized likelihood as in (2); 2) If we do not observe t but know t ∈ Ψ, with P(t ∈ Ψ | {λ_j}_{1,J}) = ∫_Ψ (∑_j λ_j) e^{−u ∑_j λ_j} du, then we draw t ∼ Exp_Ψ(∑_j λ_j), resulting in the likelihood

P(t, t ∈ Ψ | ∑_j λ_j) = f_Ψ(t | ∑_j λ_j) P(t ∈ Ψ | ∑_j λ_j) = (∑_j λ_j) e^{−t ∑_j λ_j}.  (3)

Together with y, which can be drawn by (1) if it is missing, the likelihood P(y, t, t ∈ Ψ | {λ_j}_{1,J}) becomes the same as in (2).
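Scenario 2 of the augmentation requires draws from the truncated exponential Exp_Ψ(λ). A minimal inverse-CDF sketch, assuming an interval Ψ = (T_1, T_2) with illustrative values (λ = 2, Ψ = (0.5, 1.5) are not from the paper):

```python
import numpy as np

def sample_trunc_exp(lam, T1, T2, size, rng):
    """Draw t ~ Exp(lam) conditioned on t in (T1, T2) by inverting the CDF.

    F(t) = 1 - exp(-lam*t); conditioning amounts to drawing
    u ~ Uniform(F(T1), F(T2)) and mapping it back through F^{-1}.
    T2 = np.inf recovers right censoring on (T1, inf).
    """
    hi = -np.expm1(-lam * T2) if np.isfinite(T2) else 1.0
    u = rng.uniform(-np.expm1(-lam * T1), hi, size=size)
    return -np.log1p(-u) / lam

rng = np.random.default_rng(1)
t = sample_trunc_exp(2.0, 0.5, 1.5, 100_000, rng)
print(t.min(), t.max())  # all draws fall inside (0.5, 1.5)
```

Using `expm1`/`log1p` keeps the transform numerically stable when λ·T is small.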
The procedure of sampling t and/or y, generating fully factorized likelihoods under different censoring conditions, plays a crucial role as a data augmentation scheme that will be used for Bayesian inference of the proposed LDR survival model.

In survival analysis with competing risks, one is often interested in modeling the dependence of the event type y and failure time t on covariates x = (1, x_1, . . . , x_V)′. Under the exponential racing framework, one may simply let λ_j = e^{x′β_j}, where β_j = (β_{j0}, . . . , β_{jV})′ is the regression coefficient vector for the jth competing risk or event type. However, the hazard rate for the jth competing risk, expressed as λ_j = e^{x′β_j}, is restricted to be log-linear in the covariates x. This clear restriction motivates us to generalize exponential racing to Lomax racing, which can have a time-varying hazard rate for each competing risk, and further to Lomax delegate racing, which can use the convolution of a potentially countably infinite number of covariate-dependent gamma distributions to model each λ_j.

3 Lomax and Lomax delegate racings

In this section, we generalize exponential racing to Lomax racing, which relates survival analysis with competing risks to a race of conditionally independent Lomax distributed random variables. We further generalize Lomax racing to Lomax delegate racing, which races the winners of conditionally independent Lomax racings.
Below we first briefly review the Lomax distribution. Let λ ∼ Gamma(r, 1/b) represent a gamma distribution with E[λ] = r/b and Var[λ] = r/b². Mixing the rate parameter of an exponential distribution with λ ∼ Gamma(r, 1/b) leads to a Lomax distribution [31], t ∼ Lomax(r, b), with shape r > 0, scale b > 0, and PDF

f(t | r, b) = ∫_0^∞ Exp(t; λ) Gamma(λ; r, 1/b) dλ = r b^r (t + b)^{−(r+1)}, t ∈ R+.

When r > 1, we have E[t] = b/(r − 1), and when r > 2, we have Var[t] = b²r/[(r − 1)²(r − 2)]. The Lomax distribution is a heavy-tailed distribution. Its hazard rate and survival function can be expressed as h(t) = r/(t + b) and S(t) = (t/b + 1)^{−r}, respectively.

3.1 Covariate-dependent Lomax racing

We generalize covariate-dependent exponential racing by letting

t_j ∼ Exp(λ_j), λ_j ∼ Gamma(r, e^{x′β_j}).

Marginalizing out λ_j leads to t_j ∼ Lomax(r, e^{−x′β_j}). The Lomax distribution was initially introduced to study business failures [31] and has since been widely used to model the time to event in survival analysis [32–35]. Previous research on this distribution [36–38], however, has mainly focused on point estimation of parameters, without modeling covariate dependence or performing Bayesian inference. We define Lomax racing as follows.

Definition 1. Lomax racing models the time to event t and event type y given covariates x as

t = t_y, y = argmin_{j∈{1,...,J}} t_j, t_j ∼ Lomax(r, e^{−x′β_j}).  (4)

To explain the notation, suppose a patient has both diabetes (j = 1) and cancer (j = 2); then t_1 will be the patient's latent survival time under diabetes and t_2 that under cancer. The patient's observed survival time is min(t_1, t_2).
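The gamma–exponential mixture construction of the Lomax distribution can be verified numerically. The sketch below uses assumed values r = 3, b = 2 (not from the paper) and compares the empirical tail of mixture draws with the closed-form survival function S(t) = (t/b + 1)^{−r}.

```python
import numpy as np

def sample_lomax(r, b, size, rng):
    """t ~ Lomax(r, b) via the mixture: lam ~ Gamma(r, 1/b), then t | lam ~ Exp(lam)."""
    lam = rng.gamma(shape=r, scale=1.0 / b, size=size)  # rate b, i.e. E[lam] = r/b
    return rng.exponential(1.0 / lam)

rng = np.random.default_rng(2)
r, b = 3.0, 2.0
t = sample_lomax(r, b, 500_000, rng)

# Empirical tail P(t > q) should match S(q) = (q/b + 1)^(-r).
for q in [0.5, 1.0, 2.0]:
    print((t > q).mean(), (q / b + 1) ** (-r))
```

The mean b/(r − 1) = 1 is also recovered, while the variance is large (r is barely above 2), reflecting the heavy tail.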
Note that Lomax racing can also be considered as an exponential racing model with multiplicative random effects, since t_j in (4) can also be generated as

t_j ∼ Exp(ϵ_j e^{x′β_j}), ϵ_j ∼ Gamma(r, 1).

There are two clear benefits of Lomax racing over exponential racing. The first benefit is that given x and β_j, the hazard rate for the jth competing risk, expressed as r/(t_j + e^{−x′β_j}), is no longer a constant e^{x′β_j}. The second benefit is that closed-form Gibbs sampling update equations can be derived, as will be described in detail in Section 4 and the Appendix.

For competing risk j, we can also express t_j ∼ Exp(ϵ_j e^{x′β_j}), ϵ_j ∼ Gamma(r, 1) as

ln(t_j) = −x′β_j + ε_j, ε_j = ln(ε_{j1}/ε_{j2}), ε_{j1} ∼ Exp(1), ε_{j2} ∼ Gamma(r, 1).

Thus Lomax racing regression uses an accelerated failure time model [18] for each of its competing risks. More specifically, with S_0(t_j) = (t_j + 1)^{−r} and h_0(t_j) = r/(t_j + 1), we have

S_j(t_j) = (e^{x′β_j} t_j + 1)^{−r} = S_0(e^{x′β_j} t_j),  h_j(t_j) = r(t_j + e^{−x′β_j})^{−1} = e^{x′β_j} h_0(e^{x′β_j} t_j),  (5)

and hence e^{−x′β_j} can be considered as the accelerating factor for competing risk j. Considering all J risks, we can express the survival function S(t) and hazard function h(t) as

S(t) = ∏_{j=1}^J S_j(t) = ∏_{j=1}^J (e^{x′β_j} t + 1)^{−r} = ∏_{j=1}^J S_0(e^{x′β_j} t),  h(t) = −(dS(t)/dt)/S(t) = ∑_{j=1}^J r/(t + e^{−x′β_j}).  (6)

The nosology of competing risks is often subject to human knowledge, diagnostic techniques, and the patient population.
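As a sanity check on (6), the sketch below (with assumed toy values of r and x′β_j, not fitted to any data) confirms numerically that the overall hazard equals −d ln S(t)/dt, i.e., the sum of the per-risk Lomax hazards.

```python
import numpy as np

# Assumed toy setup: J = 2 risks, common shape r, linear predictors x'beta_j.
r = 2.0
xb = np.array([0.3, -0.8])  # hypothetical values of x'beta_j for j = 1, 2

def S(t):
    """Overall survival, eq. (6): product of per-risk Lomax survival functions."""
    return np.prod((np.exp(xb) * t + 1.0) ** (-r))

def h(t):
    """Overall hazard, eq. (6): sum of per-risk hazards r / (t + e^{-x'beta_j})."""
    return np.sum(r / (t + np.exp(-xb)))

t0, eps = 1.3, 1e-6
numeric = -(np.log(S(t0 + eps)) - np.log(S(t0 - eps))) / (2 * eps)
print(numeric, h(t0))  # the two values agree
```

The same check applies risk by risk, which is how (5) identifies e^{−x′β_j} as an accelerating factor.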
Diseases with the same phenotype, categorized into one competing risk, might have distinct etiology and different impacts on survival, and thus require different therapies. For example, for a patient with both diabetes and cancer, it can be unknown whether the patient has Type 1 or Type 2 diabetes, where Type 1 is ascribed to insufficient production of insulin from the pancreas whereas Type 2 arises from the cells' failure to respond properly to insulin [39]. In this regard, it is often necessary for a model to identify sub-risks within a pre-specified competing risk, which may not only improve the fit of survival time, but also help diagnose new disease subtypes. We develop Lomax delegate racing, assuming that a risk consists of several sub-risks, under each of which the latent failure time is accelerated by the exponential of a weighted linear combination of covariates.

3.2 Lomax delegate racing

Based on the idea of Lomax racing that an individual's observed failure time is the minimum of latent failure times under competing risks, we further propose Lomax delegate racing (LDR), assuming that a latent failure time under a competing risk is the minimum of the failure times under a number of sub-risks appertaining to this competing risk. In particular, let us first denote G_j ∼ ΓP(G_{0j}, 1/c_{0j}) as a gamma process defined on the product space R+ × Ω, where R+ = {x : x > 0}, G_{0j} is a finite and continuous base measure over a complete separable metric space Ω, and 1/c_{0j} is a positive scale parameter, such that G_j(A) ∼ Gamma(G_{0j}(A), 1/c_{0j}) for each Borel set A ⊂ Ω. A draw from the gamma process consists of countably infinite non-negatively weighted atoms, expressed as G_j = ∑_{k=1}^∞ r_{jk} δ_{β_{jk}}. Now we formally define the LDR survival model as follows.

Definition 2 (Lomax delegate racing). Given a random draw of a gamma process G_j ∼ ΓP(G_{0j}, 1/c_{0j}), expressed as G_j = ∑_{k=1}^∞ r_{jk} δ_{β_{jk}}, for each j ∈ {1, . . . , J}, Lomax delegate racing models the time to event t and event type y given covariates x as

t = t_y, y = argmin_{j∈{1,...,J}} t_j, t_j = t_{jκ_j}, κ_j = argmin_{k∈{1,...,∞}} t_{jk}, t_{jk} ∼ Lomax(r_{jk}, e^{−x′β_{jk}}).  (7)

In contrast to specifying a fixed number of competing risks J, the gamma process not only admits a race among a potentially infinite number of sub-risks, but also parsimoniously shrinks toward zero the weights of negligible sub-risks [40, 41], so that the non-monotonic covariate effects on the failure time under a competing risk can be interpreted as the minimum, which is a nonlinear operation, of failure times under sub-risks whose accelerating factors are log-linear in x. As shown in the following corollary, LDR can also be considered as a generalization of exponential racing, where the exponential rate parameter of each competing risk j is a weighted summation of a countably infinite number of gamma random variables with covariate-dependent weights.

Corollary 1. The Lomax delegate racing survival model can also be expressed as

t = t_y, y = argmin_{j∈{1,...,J}} t_j, t_j ∼ Exp(∑_{k=1}^∞ e^{x′β_{jk}} λ̃_{jk}), λ̃_{jk} ∼ Gamma(r_{jk}, 1).  (8)

We provide in the Appendix the marginal distribution of t in LDR for situations where predicting the failure time is of interest.
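The generative process of Definition 2 (written in the form of Corollary 1, with the gamma process truncated to K atoms as is done for inference in Section 4) can be sketched as follows; all parameter values here are assumed purely for illustration.

```python
import numpy as np

def ldr_sample(x, r, beta, rng):
    """One draw (t, y) from truncated LDR: r[j, k] atom weights, beta[j, k, :] coefficients.

    Corollary 1: t_j ~ Exp(sum_k e^{x'beta_jk} lam_jk) with lam_jk ~ Gamma(r_jk, 1);
    the observed time is the minimum over risks j, and the event type is its argmin.
    """
    lam = rng.gamma(shape=r)                      # lam_jk ~ Gamma(r_jk, 1)
    rate = (np.exp(beta @ x) * lam).sum(axis=1)   # per-risk exponential rates
    tj = rng.exponential(1.0 / rate)              # latent per-risk failure times
    return tj.min(), tj.argmin()

rng = np.random.default_rng(3)
J, K, V = 2, 3, 4                                 # hypothetical sizes
r = rng.gamma(1.0, size=(J, K))
beta = rng.normal(scale=0.5, size=(J, K, V))
x = np.ones(V)
t, y = ldr_sample(x, r, beta, rng)
print(t, y)  # an event time and the index of the winning risk
```

Shrinking some r_jk toward zero makes the corresponding λ_jk concentrate near zero, effectively removing that sub-risk from the race, which is the pruning behavior described above.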
The survival and hazard functions of LDR, which generalize those of Lomax racing in (6), can be expressed as

S(t) = ∏_{j=1}^J ∏_{k=1}^∞ P(T_{jk} > t_j) = ∏_{j=1}^J ∏_{k=1}^∞ (e^{x′β_{jk}} t_j + 1)^{−r_{jk}},  h(t) = ∑_{j=1}^J ∑_{k=1}^∞ r_{jk}/(t_j + e^{−x′β_{jk}}).  (9)

The LDR survival model can be considered as a two-phase racing, where in the first phase, for each of the J pre-specified competing risks there is a race among countably infinite sub-risks, and in the second phase, the J risk-specific failure times race with each other to eventually determine both the observed failure time t and event type y. Moreover, Corollary 1, representing LDR as a single-phase exponential racing, more explicitly explains non-monotonic covariate effects on t_j by writing the exponential rate parameter of t_j as the aggregation of {e^{x′β_{jk}}}_{k=1}^∞ weighted by gamma random variables whose shape parameters are the atom weights of the gamma process G_j.

4 Bayesian inference

LDR utilizes a gamma process [42] to support countably infinite regression-coefficient vectors for each pre-specified risk. The gamma process G_j ∼ ΓP(G_{0j}, 1/c_{0j}) has an inherent shrinkage mechanism in that the number of atoms whose weights are larger than a positive constant ϵ is finite almost surely and follows a Poisson distribution with mean γ_{0j} ∫_ϵ^∞ r^{−1} e^{−c_{0j} r} dr. For the convenience of implementation, as in Zhou et al. [43], we truncate the total number of atoms of the gamma process to K by choosing a finite and discrete base measure G_{0j} = ∑_{k=1}^K (γ_{0j}/K) δ_{β_{jk}}. Let us denote x_i and y_i as the covariates and the event type, respectively, for individual i ∈ {1, . . . , n}.
We express the full hierarchical model of the (truncated) gamma process LDR survival model as

y_i = argmin_{j∈{1,...,J}} t_ij, t_i = min_j t_ij, t_ij = t_{ijκ_{ij}}, κ_{ij} = argmin_{k∈{1,...,K}} t_ijk, t_ijk ∼ Exp(λ_ijk),
λ_ijk ∼ Gamma(r_jk, e^{x_i′β_jk}), r_jk ∼ Gamma(γ_{0j}/K, 1/c_{0j}), β_jk ∼ ∏_{v=0}^V N(0, α_{vjk}^{−1}),  (10)

where we further let α_{vjk} ∼ Gamma(a_0, 1/b_0). The joint probability given {λ_ijk}_{j,k} is

P(t_i, κ_{iy_i}, y_i | {λ_ijk}_{j,k}) = P(t_i | {λ_ijk}_{j,k}) P(κ_{iy_i}, y_i | {λ_ijk}_{j,k}) = λ_{iy_iκ_{iy_i}} e^{−t_i ∑_{j=1}^J ∑_{k=1}^K λ_ijk},

which is amenable to posterior simulation for λ_ijk. Let us denote NB(x; r, p) = [Γ(x + r)/(x! Γ(r))] p^x (1 − p)^r as the likelihood of the negative binomial distribution and σ(x) = 1/(1 + e^{−x}) as the sigmoid function. Further marginalizing out λ_ijk ∼ Gamma(r_jk, e^{x_i′β_jk}) leads to a fully factorized joint likelihood

P(t_i, κ_{iy_i}, y_i | x_i, {β_jk}_{j,k}) = t_i^{−1} ∏_j ∏_k NB(1(κ_{iy_i} = k, y_i = j); r_jk, σ(x_i′β_jk + ln t_i)),  (11)

which is amenable to posterior simulation using the data augmentation based inference technique for negative binomial regression [44, 45]. The augmentation schemes of t_i and/or y_i discussed in Section 2 are used to achieve (11) in the presence of censoring or as a remedy for missing data.
We describe in detail both Gibbs sampling and maximum a posteriori (MAP) inference in the Appendix.

Table 1: Synthetic data generating process.

Synthetic data 1: t_i = min(t_i1, t_i2, 3.5), t_i1 ∼ Exp(e^{x_i′β_1}), t_i2 ∼ Exp(e^{x_i′β_2})
Synthetic data 2: t_i = min(t_i1, t_i2, 6.5), t_i1 ∼ Exp(1/cosh(x_i′β_1)), t_i2 ∼ Exp(1/|sinh(x_i′β_2)|)

Figure 1: Cause-specific C-indices and shrinkage of r_jk by LDR for synthetic data 1 and 2: (a) C-index of risk 1 for synthetic data 1; (b) r_jk in descending order for synthetic data 1; (c) C-index of risk 1 for synthetic data 2; (d) r_jk in descending order for synthetic data 2.

5 Experimental results

In this section, we validate the proposed LDR model by a variety of experiments using both synthetic and real data. Some data descriptions, the implementation of benchmark approaches, and experiment settings are deferred to the Appendix for brevity. In all experiments we exclude from the testing data the observations that have unknown failure times or event types. We compare the proposed LDR survival model, the cause-specific Cox proportional hazards model (Cox) [19, 23], the Fine–Gray proportional subdistribution hazards model (FG) [22] and its boosting algorithm (BST), which is more stable for high-dimensional covariates [46], and random survival forests (RF) [24], which are all designed for survival analysis with competing risks. We show that LDR performs uniformly well regardless of whether the covariate effects are monotonic or not. Moreover, LDR is able to infer the missing cause of death and/or survival time of an observation, both of which in general cannot be handled by these benchmark methods. The model fits of LDR by Bayesian inference via Gibbs sampling and MAP inference via stochastic gradient descent (SGD) are comparable.
We will report the results of Gibbs sampling, as it provides an explicit criterion to prune unneeded model capacity (Steps 1 and 8 of Appendix B), avoiding the need for model selection and parameter tuning. For large-scale data, performing MAP inference via SGD is recommended if Gibbs sampling takes too long to run a sufficiently large number of iterations. We quantify model performance by the cause-specific concordance index (C-index) [23], where the C-index of risk j at time τ in this paper is computed as

C_j(τ) = P(Score_j(x_i, τ) > Score_j(x_{i′}, τ) | y_i = j and [t_i < t_{i′} or y_{i′} ≠ j]),

where i ≠ i′ and Score_j(x_i, τ) is a prognostic score at time τ depending on x_i such that a higher value reflects a higher risk of cause j. Intuitively, for cause j, if patient i died of this cause (i.e., y_i = j), and patient i′ either died of another cause (i.e., y_{i′} ≠ j) or died of this cause but lived longer than patient i (i.e., t_i < t_{i′}), then it is likely that Score_j(x_i, τ) for patient i is higher than Score_j(x_{i′}, τ) for patient i′, and the ranking of risks for this pair of patients is concordant. The C-index measures such concordance, and a higher value indicates better model performance. Wolbers et al. [23] write the C-index as a weighted average of time-dependent AUCs that are related to sensitivity, specificity, and ROC curves for competing risks [47], so a C-index around 0.5 implies a model failure. A good choice of the prognostic score is the cumulative incidence function (CIF), i.e., Score_j(x_i, τ) = CIF_j(i, τ) = P(t_i ≤ τ, y_i = j) [18, 22, 28]. Distinct from a survival function that measures the probability of surviving beyond some time, the CIF estimates the probability that an event occurs by a specific time in the presence of competing risks.
For LDR, given {r_jk} and {β_jk},

CIF_j(i, τ) = P(t_i ≤ τ, y_i = j) = E[ (∑_k λ_ijk / ∑_{j′,k} λ_ij′k) (1 − e^{−τ ∑_{j′,k} λ_ij′k}) ],

where the expectation is taken over {λ_ijk}_{j,k}, with λ_ijk ∼ Gamma(r_jk, e^{x_i′β_jk}). The expectation can be evaluated by Monte Carlo estimation if we have a point estimate or a collection of post-burn-in MCMC samples of r_jk and β_jk.

Figure 2: Cause-specific C-indices for DLBCL data: (a) C-index of ABC; (b) C-index of GCB; (c) C-index of T3.

5.1 Synthetic data analysis

We first simulate two datasets following Table 1, where x_i ∼ N(0, I_3), to illustrate the unique nonlinear modeling capability of LDR. In Table 1, t_ij denotes the latent survival time under risk j, j = 1, 2, and t_i is the observed time to event. The event type y_i = argmin_j t_ij if t_i < T_{r.c.}, with y_i = 0 indicating right censoring if t_i = T_{r.c.}, where the censoring time T_{r.c.} = 3.5 for data 1 and 6.5 for data 2. We simulate 1,000 random observations, and use 800 for training and the remaining 200 for testing. We randomly take 20 training/testing partitions, on each of which we evaluate the testing cause-specific C-index at times 0.5, 1, 1.5, · · · , 3 for data 1 and at times 1, 2, · · · , 6 for data 2. The sample mean ± standard deviation of the estimated cause-specific C-indices of risk 1, and the estimated r_jk's of both risks by LDR (from one random training/testing partition but without loss of generality) for data 1 are displayed in panels (a) and (b) of Figure 1, respectively.
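The Monte Carlo evaluation of CIF_j(i, τ) can be sketched as below, drawing λ_ijk ∼ Gamma(r_jk, e^{x_i′β_jk}) under assumed (not paper-fitted) values of r_jk and β_jk.

```python
import numpy as np

def cif_mc(x, r, beta, tau, n_mc, rng):
    """Monte Carlo CIF_j(tau) = E[(sum_k lam_jk / sum_all)(1 - exp(-tau * sum_all))].

    r[j, k] and beta[j, k, :] stand in for point estimates (or one MCMC sample);
    lam_jk ~ Gamma(r_jk, scale e^{x'beta_jk}). Returns a length-J vector of CIFs.
    """
    lam = rng.gamma(shape=r, scale=np.exp(beta @ x), size=(n_mc,) + r.shape)
    tot = lam.sum(axis=(1, 2))                      # sum over all (j', k) per draw
    cif = lam.sum(axis=2) / tot[:, None] * (1 - np.exp(-tau * tot))[:, None]
    return cif.mean(axis=0)

rng = np.random.default_rng(4)
J, K, V = 2, 3, 4                                   # hypothetical sizes
r = rng.gamma(1.0, size=(J, K))
beta = rng.normal(scale=0.3, size=(J, K, V))
x = np.ones(V)
cifs = cif_mc(x, r, beta, tau=2.0, n_mc=20_000, rng=rng)
print(cifs, cifs.sum())  # each CIF lies in [0, 1]; their sum is at most 1
```

Averaging `cif_mc` over post-burn-in MCMC samples of (r, β), rather than a single point estimate, gives the fully Bayesian estimate described above.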
Analogous plots for data 2 are shown in panels (c) and (d). The testing C-indices of risk 2 are analogous to those of risk 1 for both datasets, and are thus shown in Figure 5 in the Appendix for brevity.

For data 1, where the survival times under both risks depend on the covariates monotonically, LDR has comparable performance with Cox, FG, and BST, and all four models slightly outperform RF in terms of the mean values of C-indices. The underperformance of RF in the case of monotonic covariate effects has also been observed in its original paper [24]. For data 2, where the survival time and covariates are not monotonically related, LDR and RF at every time evaluated significantly outperform the other three approaches, all of which fail on this dataset, as their C-indices are around 0.5 for both risks. Panels (b) and (d) of Figure 1 show r_jk inferred on data 1 and 2, respectively. More specifically, both risks consist of only one sub-risk for data 1. By contrast, two sub-risks of the two respective risks can approximate the complex data generating process of data 2.

5.2 Real data analysis

We analyze a microarray gene-expression profile [48] to assess our model performance on real data. The dataset contains a total of 240 patients with diffuse large B-cell lymphoma (DLBCL). Multiple unsuccessful treatments to increase the survival rate suggest that there exist several subtypes of DLBCL that differ in responsiveness to chemotherapy. In the DLBCL dataset, Rosenwald et al. [48] identify three gene-expression subgroups, including activated B-cell-like (ABC), germinal-center B-cell-like (GCB), and type 3 (T3) DLBCL, which may be related to three different diseases as a result of distinct mechanisms of malignant transformation. They also suspect that T3 may be associated with more than one such mechanism.
In our analysis, we treat the three subgroups and their potential malignant transformation mechanisms as competing risks from which the patients suffer. As the total number of patients is small, which is often the case for survival data, we consider the 434 genes that have no missing values across all the patients. Seven of the 434 genes have been reported to be related to clinical phenotypes, and four of the seven to have non-monotonic effects on the risk of death [7]. Since some gene expressions may be highly correlated, we follow the same selection procedure as Li and Luan [7] and include as covariates the seven genes, together with another 33 genes having the highest Cox partial score statistics, so that neither the Cox proportional hazards model nor the FG subdistribution model for competing risks collapses due to computational singularity. We use 200 observations for training and the remaining 40 for testing. We take 20 random training/testing partitions and report in Figure 2 boxplots of the testing C-indices evaluated at years 1, 2, ..., 6 by the same five approaches used in the analysis of the synthetic datasets.

[Figure 2 plots omitted: boxplots of C-index (%) versus time (year) for ABC, GCB, and T3, comparing BST, Cox, FG, LDR, and RF.]

(a) BC. (b) OC. (c) BC, with unknown event types. (d) OC, with unknown event types.
Figure 3: C-indices for SEER breast cancer data.

The boxplots of BST and LDR are roughly comparable for ABC, but the medians of LDR are slightly higher than those of BST until year 2 and slightly lower thereafter. For GCB and T3, LDR yields higher median C-indices than all the other benchmarks at every evaluated time, indicating that LDR provides a substantial improvement in predicting lymphoma CIFs. Interestingly, RF performs poorly for both ABC and GCB, but outperforms Cox, FG, and BST and is comparable to LDR for T3.
This implies that the gene expressions may have monotonic effects on survival under ABC or GCB, but not under T3, which is corroborated by the fact that LDR learns one sub-risk each for ABC and GCB, and two sub-risks for T3. To better show the improvements of LDR over existing approaches, we calculate the difference in C-indices between LDR and each of the other four benchmarks within each training/testing partition, and report the sample means and standard deviations across partitions in Table 11 in the Appendix. On average, the improvements of LDR over Cox, FG, and BST are larger for T3 than for ABC or GCB, whereas LDR outperforms RF by a larger margin for ABC and GCB than for T3. This shows another advantage of LDR: it fits consistently well regardless of whether the covariate effects are monotonic.
We further analyze a publicly accessible dataset from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute [49]. The SEER dataset we use contains two risks: one is breast cancer and the other is "other causes," which we denote by BC and OC, respectively. It also contains some incomplete observations, each with an unknown cause of death but an observed, uncensored time to death; LDR can handle such observations. The individual covariates include the patients' personal information, such as age, gender, race, and diagnostic and therapy information. More details are deferred to the Appendix.
We first eliminate all observations with unknown causes of death, so that we can compare LDR with Cox, FG, BST, and RF. We take 20 random training/testing partitions of the dataset, in each of which 80% of the observations are used for training and the remaining 20% for testing.
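For reference, the cause-specific C-indices reported throughout can be computed, in simplified unweighted form, as below, in the spirit of the truncated concordance of Wolbers et al. [23]. The experiments may use a weighted estimator; the function name and the pair definition here are illustrative assumptions, not the exact estimator used in the paper.

```python
def cause_specific_cindex(time, event, score, cause, tau):
    """Simplified, unweighted cause-specific C-index at horizon tau.

    time  : observed times; event : event types (0 = censored)
    score : predicted risk for `cause` (higher means event expected sooner),
            e.g., the predicted CIF of `cause` at tau.
    A pair (i, l) is comparable if subject i has a `cause` event at
    time[i] <= tau and subject l is still under observation at time[i].
    """
    concordant = comparable = 0
    n = len(time)
    for i in range(n):
        if event[i] != cause or time[i] > tau:
            continue
        for l in range(n):
            if time[l] > time[i]:        # l outlived i's event time
                comparable += 1
                concordant += score[i] > score[l]
    return concordant / comparable if comparable else float("nan")
```

A value of 0.5 corresponds to random ordering, which is why the failing benchmarks on synthetic data 2 hover around 0.5.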
In Figure 3, panels (a) and (b) show the boxplots of C-indices for BC and OC, respectively, obtained from the 20 testing sets by the five models at months 10, 50, 100, ..., 300. For BC, the C-indices of LDR are comparable to those of the other four approaches until month 150 and slightly higher afterwards. For OC, the C-indices of LDR are slightly lower than those of Cox, FG, and BST, but become similar after month 100. Also note that RF underperforms the other four approaches from month 100 onwards for BC and from month 50 onwards for OC. The comparable C-indices from LDR, Cox, FG, and BST imply monotonic impacts of the covariates on survival times under both risks. In fact, for either risk we learn a sub-risk that dominates the others in terms of weights. Furthermore, we analyze the SEER dataset with LDR using the same training/testing partitions, but additionally include the observations with missing causes of death in the 20 training sets, and show the testing C-indices in panels (c) and (d) of Figure 3. The testing C-indices are very similar to those in panels (a) and (b). More importantly, LDR provides probabilistic inference on missing times to event or missing causes of death during model training.
In Appendix E we further provide the Brier scores [50, 51] of each risk in all datasets over time. The Brier score quantifies the deviation of the predicted CIFs from the actual outcomes, and a smaller value implies better model performance [52]. Tables 2-10 in Appendix E show the Brier scores of the compared models on the four datasets, indicating that out-of-sample prediction performance is largely consistent with that quantified by the C-indices.
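A minimal, unweighted sketch of the cause-specific Brier score at horizon τ follows. Standard implementations, such as the pec R package [55], additionally apply inverse-probability-of-censoring weights, which are omitted here for brevity; the function name is an assumption.

```python
import numpy as np

def brier_score(time, event, cif_pred, cause, tau):
    """Unweighted Brier score for `cause` at horizon tau.

    time     : observed times; event : event types (0 = censored)
    cif_pred : predicted CIF_cause(i, tau) for each subject i.
    Squared deviation of the predicted CIF from the binary outcome
    1{t_i <= tau, y_i = cause}, averaged over subjects.
    """
    time, event = np.asarray(time), np.asarray(event)
    outcome = ((time <= tau) & (event == cause)).astype(float)
    return float(np.mean((outcome - np.asarray(cif_pred)) ** 2))
```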
Specifically, for synthetic data 1, SEER, and both ABC and GCB of DLBCL, where the C-indices imply linear covariate effects, the Brier scores are comparable for Cox, FG, BST, and LDR, and slightly smaller than those of RF. For synthetic data 2 and T3 of DLBCL, where the C-indices imply nonlinear covariate effects, the Brier scores of LDR and RF are smaller than those of Cox, FG, and BST. Moreover, the Brier scores of LDR are slightly larger than those of RF for synthetic data 2 but smaller for T3 of DLBCL.

[Figure 3 plots omitted: boxplots of C-index (%) versus time (month) for breast cancer and other causes, comparing BST, Cox, FG, LDR, and RF, and for LDR trained with unknown event types included.]

(a) DLBCL. (b) SEER.
Figure 4: Isomap visualization of the observations and inferred sub-risk representatives.

To show the interpretability of LDR, we visualize representative individuals, each of whom suffers from an inferred sub-risk. Specifically, for each inferred sub-risk $k$ under risk $j$, we find the representative by evaluating a weighted average of all uncensored observations as
$$\sum_i w_{ijk} x_i \Big/ \sum_i w_{ijk}, \quad \text{where } w_{ijk} = \mathbb{E}\bigg[\frac{\lambda_{ijk}}{\sum_{j'} \sum_{k'} \lambda_{ij'k'}}\bigg], \quad \lambda_{ijk} \sim \mathrm{Gamma}\big(\hat{r}_{jk}, e^{x_i' \hat{\beta}_{jk}}\big),$$
and $\hat{r}_{jk}$ and $\hat{\beta}_{jk}$ are the estimated values of $r_{jk}$ and $\beta_{jk}$, respectively. The weight $w_{ijk}$ extracts the component of $x_i$ that is likely to make the event of sub-risk $k$ under risk $j$ occur first. We then apply the Isomap algorithm [53] and visualize in Figure 4 the representatives along with the uncensored observations in both DLBCL and SEER.
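The sub-risk representative computation just described can be sketched as follows. This is a minimal Monte Carlo sketch assuming NumPy's shape/scale gamma parameterization with scale $= e^{x_i'\hat{\beta}_{jk}}$; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def subrisk_representatives(X, r_hat, beta_hat, n_mc=2000, rng=0):
    """Weighted-average representative for each sub-risk (j, k).

    X        : (n, P) covariates of the uncensored observations
    r_hat    : (J, K) estimated shapes; beta_hat : (J, K, P) coefficients
    Assumes lambda_ijk ~ Gamma(shape=r_hat_jk, scale=exp(x_i' beta_hat_jk)).
    Returns a (J, K, P) array whose (j, k) slice is
    sum_i w_ijk x_i / sum_i w_ijk.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    J, K, _ = beta_hat.shape
    scale = np.exp(np.einsum('jkp,np->njk', beta_hat, X))         # (n, J, K)
    lam = rng.gamma(r_hat, scale, size=(n_mc, n, J, K))
    # w_ijk = E[lambda_ijk / sum_{j',k'} lambda_ij'k'], estimated by MC average
    w = (lam / lam.sum(axis=(2, 3), keepdims=True)).mean(axis=0)  # (n, J, K)
    return np.einsum('njk,np->jkp', w, X) / w.sum(axis=0)[..., None]
```

When a risk has a single sub-risk (J = K = 1), the weights are uniform and the representative reduces to the plain covariate mean.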
Details are provided in the Appendix.
In Figure 4, small symbols denote uncensored observations and large ones the representatives. Panels (a) and (b) show the representatives suffering from sub-risks in the DLBCL and SEER datasets, respectively. In panel (a), we use green for ABC, pink for GCB, and black for T3. The only representative suffering from ABC (GCB) is surrounded by small green (pink) symbols, indicating that it signifies a typical gene expression profile that may result in the corresponding malignant transformation. There are two representatives suffering from the two sub-risks of T3, denoted by a large triangle and a large diamond, respectively. They lie approximately at the centers of the respective clusters of small triangles and diamonds, which denote patients suffering from the corresponding sub-risks of T3 with an estimated probability greater than 0.5. The two sub-risks of T3 and their representatives verify the heterogeneity of gene expressions under this risk, and strengthen the belief that T3 consists of more than one type of DLBCL [48]. For the SEER data, we randomly select 100 of the 2088 uncensored observations with known event types for visualization. In panel (b), we use green for BC and pink for OC. LDR learns only one sub-risk for each of these two risks, and places for each risk a representative approximately at the center of the cluster of patients who died of that risk.

6 Conclusion

We propose Lomax delegate racing (LDR) for survival analysis with competing risks. LDR models the survival times under risks as a two-phase race of sub-risks, which not only intuitively explains the mechanism of surviving under competing risks, but also helps model non-monotonic covariate effects.
We use the gamma process to support a potentially countably infinite number of sub-risks for each risk, and rely on its inherent shrinkage mechanism to remove unneeded model capacity, making LDR capable of detecting unknown event subtypes without pre-specifying their number. LDR admits a hierarchical representation that facilitates the derivation of Gibbs sampling under data augmentation, which can be adapted to various practical situations such as missing event times or types. A more scalable (stochastic) gradient descent based maximum a posteriori inference algorithm is also developed for big data applications. Experimental results show that, with strong interpretability and outstanding performance, the proposed LDR survival model is an attractive alternative to existing ones for various tasks in survival analysis with competing risks.

Acknowledgments

The authors acknowledge the support of Award IIS-1812699 from the U.S. National Science Foundation, and the computational support of the Texas Advanced Computing Center.

References

[1] E. L. Kaplan and P. Meier, "Nonparametric estimation from incomplete observations," Journal of the American Statistical Association, vol. 53, no. 282, pp. 457–481, 1958.

[2] Y. Wang, B. Xie, N. Du, and L. Song, "Isotonic Hawkes processes," in International Conference on Machine Learning, pp. 2226–2234, 2016.

[3] H. Mei and J. M. Eisner, "The neural Hawkes process: A neurally self-modulating multivariate point process," in NIPS, pp. 6754–6764, 2017.

[4] D. R. Cox, "Regression models and life-tables," in Breakthroughs in Statistics, pp.
527\u2013541, Springer,\n\n1992.\n\n[5] F. Ballester, D. Corella, S. P\u00e9rez-Hoyos, M. S\u00e1ez, and A. Herv\u00e1s, \u201cMortality as a function of temperature.\nA study in Valencia, Spain, 1991-1993.,\u201d International journal of epidemiology, vol. 26, no. 3, pp. 551\u2013561,\n1997.\n\n[6] K. M. Flegal, B. I. Graubard, D. F. Williamson, and M. H. Gail, \u201cCause-speci\ufb01c excess deaths associated\n\nwith underweight, overweight, and obesity,\u201d Jama, vol. 298, no. 17, pp. 2028\u20132037, 2007.\n\n[7] H. Li and Y. Luan, \u201cBoosting proportional hazards models using smoothing splines, with applications to\n\nhigh-dimensional microarray data,\u201d Bioinformatics, vol. 21, no. 10, pp. 2403\u20132409, 2005.\n\n[8] W. Lu and L. Li, \u201cBoosting method for nonlinear transformation models with censored survival data,\u201d\n\nBiostatistics, vol. 9, no. 4, pp. 658\u2013667, 2008.\n\n[9] R. Ranganath, A. Perotte, N. Elhadad, and D. Blei, \u201cDeep survival analysis,\u201d in Machine Learning for\n\nHealthcare Conference, pp. 101\u2013114, 2016.\n\n[10] J. L. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, and Y. Kluger, \u201cDeepSurv: Personalized\ntreatment recommender system using a Cox proportional hazards deep neural network,\u201d BMC medical\nresearch methodology, vol. 18, no. 1, p. 24, 2018.\n\n[11] X. Zhu, J. Yao, and J. Huang, \u201cDeep convolutional neural network for survival analysis with pathological\nimages,\u201d in Bioinformatics and Biomedicine (BIBM), 2016 IEEE International Conference on, pp. 544\u2013547,\nIEEE, 2016.\n\n[12] P. Chapfuwa, C. Tao, C. Li, C. Page, B. Goldstein, L. Carin, and R. Henao, \u201cAdversarial time-to-event\n\nmodeling,\u201d in ICML, 2018.\n\n[13] T. Fern\u00e1ndez, N. Rivera, and Y. W. Teh, \u201cGaussian processes for survival analysis,\u201d in NIPS, pp. 5021\u20135029,\n\n2016.\n\n[14] M. Luck, T. Sylvain, H. Cardinal, A. Lodi, and Y. 
Bengio, \u201cDeep learning for patient-speci\ufb01c kidney graft\n\nsurvival analysis,\u201d arXiv preprint arXiv:1705.10245, 2017.\n\n[15] Y. Li, J. Wang, J. Ye, and C. K. Reddy, \u201cA multi-task learning formulation for survival analysis,\u201d in\nProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data\nMining, pp. 1715\u20131724, ACM, 2016.\n\n[16] C.-N. Yu, R. Greiner, H.-C. Lin, and V. Baracos, \u201cLearning patient-speci\ufb01c cancer survival distributions as\n\na sequence of dependent regressors,\u201d in NIPS, pp. 1845\u20131853, 2011.\n\n[17] X. Miscouridou, A. Perotte, N. Elhadad, and R. Ranganath, \u201cDeep survival analysis: Nonparametrics and\n\nmissingness,\u201d in Machine Learning for Healthcare Conference, 2018.\n\n[18] J. D. Kalb\ufb02eisch and R. L. Prentice, The statistical analysis of failure time data, vol. 360. John Wiley &\n\nSons, 2011.\n\n[19] P. C. Austin, D. S. Lee, and J. P. Fine, \u201cIntroduction to the analysis of survival data in the presence of\n\ncompeting risks,\u201d Circulation, vol. 133, no. 6, pp. 601\u2013609, 2016.\n\n[20] H. Putter, M. Fiocco, and R. B. Geskus, \u201cTutorial in biostatistics: Competing risks and multi-state models,\u201d\n\nStatistics in medicine, vol. 26, no. 11, pp. 2389\u20132430, 2007.\n\n[21] B. Lau, S. R. Cole, and S. J. Gange, \u201cCompeting risk regression models for epidemiologic data,\u201d American\n\njournal of epidemiology, vol. 170, no. 2, pp. 244\u2013256, 2009.\n\n[22] J. P. Fine and R. J. Gray, \u201cA proportional hazards model for the subdistribution of a competing risk,\u201d\n\nJournal of the American statistical association, vol. 94, no. 446, pp. 496\u2013509, 1999.\n\n10\n\n\f[23] M. Wolbers, P. Blanche, M. T. Koller, J. C. Witteman, and T. A. Gerds, \u201cConcordance for prognostic\n\nmodels with competing risks,\u201d Biostatistics, vol. 15, no. 3, pp. 526\u2013539, 2014.\n\n[24] H. Ishwaran, T. A. Gerds, U. B. Kogalur, R. D. Moore, S. 
J. Gange, and B. M. Lau, \u201cRandom survival\n\nforests for competing risks,\u201d Biostatistics, vol. 15, no. 4, pp. 757\u2013773, 2014.\n\n[25] J. E. Barrett and A. C. Coolen, \u201cGaussian process regression for survival data with competing risks,\u201d arXiv\n\npreprint arXiv:1312.1591, 2013.\n\n[26] A. M. Alaa and M. v. d. Schaar, \u201cDeep multi-task gaussian processes for survival analysis with competing\n\nrisks,\u201d in NIPS, 2017.\n\n[27] C. Lee, W. R. Zame, J. Yoon, and M. van der Schaar, \u201cDeepHit: A deep learning approach to survival\n\nanalysis with competing risks,\u201d AAAI, 2018.\n\n[28] M. J. Crowder, Classical competing risks. CRC Press, 2001.\n[29] S. M. Ross, Introduction to Probability Models. Academic Press, 10th ed., 2006.\n[30] F. Caron and Y. W. Teh, \u201cBayesian nonparametric models for ranked data,\u201d in NIPS, pp. 1520\u20131528, 2012.\n[31] K. Lomax, \u201cBusiness failures: Another example of the analysis of failure data,\u201d Journal of the American\n\nStatistical Association, vol. 49, no. 268, pp. 847\u2013852, 1954.\n\n[32] J. Myhre and S. Saunders, \u201cScreen testing and conditional probability of survival,\u201d Lecture Notes-\n\nMonograph Series, pp. 166\u2013178, 1982.\n\n[33] H. A. Howlader and A. M. Hossain, \u201cBayesian survival estimation of Pareto distribution of the second\nkind based on failure-censored data,\u201d Computational statistics & data analysis, vol. 38, no. 3, pp. 301\u2013314,\n2002.\n\n[34] E. Cramer and A. B. Schmiedt, \u201cProgressively Type-II censored competing risks data from Lomax\n\ndistributions,\u201d Computational Statistics & Data Analysis, vol. 55, no. 3, pp. 1285\u20131303, 2011.\n\n[35] F. Hemmati and E. Khorram, \u201cOn adaptive progressively Type-II censored competing risks data,\u201d Commu-\n\nnications in Statistics-Simulation and Computation, pp. 1\u201323, 2017.\n\n[36] S. Al-Awadhi and M. 
Ghitany, \u201cStatistical properties of Poisson-Lomax distribution and its application to\n\nrepeated accidents data,\u201d Journal of Applied Statistical Science, vol. 10, no. 4, pp. 365\u2013372, 2001.\n\n[37] A. Childs, N. Balakrishnan, and M. Moshref, \u201cOrder statistics from non-identical right-truncated Lomax\n\nrandom variables with applications,\u201d Statistical Papers, vol. 42, no. 2, pp. 187\u2013206, 2001.\n\n[38] D. E. Giles, H. Feng, and R. T. Godwin, \u201cOn the bias of the maximum likelihood estimator for the\ntwo-parameter Lomax distribution,\u201d Communications in Statistics-Theory and Methods, vol. 42, no. 11,\npp. 1934\u20131950, 2013.\n\n[39] R. Varma, N. M. Bressler, Q. V. Doan, M. Gleeson, M. Danese, J. K. Bower, E. Selvin, C. Dolan, J. Fine,\nS. Colman, et al., \u201cPrevalence of and risk factors for diabetic macular edema in the United States,\u201d JAMA\nophthalmology, vol. 132, no. 11, pp. 1334\u20131340, 2014.\n\n[40] M. Zhou, Y. Cong, and B. Chen, \u201cAugmentable gamma belief networks,\u201d Journal of Machine Learning\n\nResearch, vol. 17, no. 163, pp. 1\u201344, 2016.\n\n[41] M. Zhou, \u201cSoftplus regressions and convex polytopes,\u201d arXiv:1608.06383, 2016.\n[42] T. S. Ferguson, \u201cA Bayesian analysis of some nonparametric problems,\u201d Ann. Statist., vol. 1, no. 2,\n\npp. 209\u2013230, 1973.\n\n[43] M. Zhou and L. Carin, \u201cNegative binomial process count and mixture modeling,\u201d IEEE Trans. Pattern\n\nAnal. Mach. Intell., vol. 37, no. 2, pp. 307\u2013320, 2015.\n\n[44] M. Zhou, L. Li, D. Dunson, and L. Carin, \u201cLognormal and gamma mixed negative binomial regression,\u201d in\n\nICML, pp. 1343\u20131350, 2012.\n\n[45] N. G. Polson, J. G. Scott, and J. Windle, \u201cBayesian inference for logistic models using P\u00f3lya\u2013Gamma\n\nlatent variables,\u201d J. Amer. Statist. Assoc., vol. 108, no. 504, pp. 1339\u20131349, 2013.\n\n[46] H. Binder, A. Allignol, M. Schumacher, and J. 
Beyersmann, \u201cBoosting for high-dimensional time-to-event\n\ndata with competing risks,\u201d Bioinformatics, vol. 25, no. 7, pp. 890\u2013896, 2009.\n\n[47] P. Saha and P. Heagerty, \u201cTime-dependent predictive accuracy in the presence of competing risks,\u201d Biomet-\n\nrics, vol. 66, no. 4, pp. 999\u20131011, 2010.\n\n[48] A. Rosenwald, G. Wright, W. C. Chan, J. M. Connors, E. Campo, R. I. Fisher, R. D. Gascoyne, H. K.\nMuller-Hermelink, E. B. Smeland, J. M. Giltnane, et al., \u201cThe use of molecular pro\ufb01ling to predict survival\nafter chemotherapy for diffuse large-B-cell lymphoma,\u201d New England Journal of Medicine, vol. 346,\nno. 25, pp. 1937\u20131947, 2002.\n\n11\n\n\f[49] S. R. P. S. S. B. National Cancer Institute, DCCPS, Surveillance, Epidemiology, and End Results (SEER)\nProgram Research Data (1973-2014), Released April 2017, based on the November 2016 submission.\nReleased April 2017, based on the November 2016 submission.\n\n[50] T. A. Gerds, T. Cai, and M. Schumacher, \u201cThe performance of risk prediction models,\u201d Biometrical Journal:\n\nJournal of Mathematical Methods in Biosciences, vol. 50, no. 4, pp. 457\u2013479, 2008.\n\n[51] E. W. Steyerberg, A. J. Vickers, N. R. Cook, T. Gerds, M. Gonen, N. Obuchowski, M. J. Pencina, and\nM. W. Kattan, \u201cAssessing the performance of prediction models: A framework for some traditional and\nnovel measures,\u201d Epidemiology (Cambridge, Mass.), vol. 21, no. 1, p. 128, 2010.\n\n[52] H. Van Houwelingen and H. Putter, Dynamic prediction in clinical survival analysis. CRC Press, 2011.\n[53] J. B. Tenenbaum, V. De Silva, and J. C. Langford, \u201cA global geometric framework for nonlinear dimen-\n\nsionality reduction,\u201d Science, vol. 290, no. 5500, pp. 2319\u20132323, 2000.\n\n[54] P. G. Moschopoulos, \u201cThe distribution of the sum of independent gamma random variables,\u201d Annals of the\n\nInstitute of Statistical Mathematics, vol. 37, no. 1, pp. 
541–544, 1985.

[55] T. A. Gerds, pec: Prediction Error Curves for Risk Prediction Models in Survival Analysis, 2017. R package version 2.5.4.

[56] T. A. Gerds and T. H. Scheike, riskRegression: Risk Regression Models for Survival Analysis with Competing Risks, 2015. R package version 1.1.7.

[57] B. Gray, cmprsk: Subdistribution Analysis of Competing Risks, 2014. R package version 2.2-7.

[58] H. Binder, CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks, 2013. R package version 1.4.

[59] H. Ishwaran and U. Kogalur, Random Forests for Survival, Regression, and Classification (RF-SRC), 2018. R package version 2.6.0.

[60] T. H. Cormen, Introduction to Algorithms. MIT Press, 2009.

[61] I. Borg and P. J. Groenen, Modern Multidimensional Scaling: Theory and Applications. Springer Science & Business Media, 2005.