{"title": "Stochastic Nonparametric Event-Tensor Decomposition", "book": "Advances in Neural Information Processing Systems", "page_first": 6856, "page_last": 6866, "abstract": "Tensor decompositions are fundamental tools for multiway data analysis. Existing approaches, however, ignore the valuable temporal information along with data, or simply discretize them into time steps so that important temporal patterns are easily missed. Moreover, most methods are limited to multilinear decomposition forms, and hence are unable to capture intricate, nonlinear relationships in data. To address these issues, we formulate event-tensors, to preserve the complete temporal information for multiway data, and propose a novel Bayesian nonparametric decomposition model. Our model can (1) fully exploit the time stamps to capture the critical, causal/triggering effects between the interaction events, (2) flexibly estimate the complex relationships between the entities in tensor modes, and (3) uncover hidden structures from their temporal interactions. For scalable inference, we develop a doubly stochastic variational Expectation-Maximization algorithm to conduct an online decomposition. Evaluations on both synthetic and real-world datasets show that our model not only improves upon the predictive performance of existing methods, but also discovers interesting clusters underlying the data.", "full_text": "Stochastic Nonparametric Event-Tensor\n\nDecomposition\n\nShandian Zhe, Yishuai Du\n\nSchool of Computing, University of Utah\n\nzhe@cs.utah.edu, yishuai.du@utah.edu\n\nAbstract\n\nTensor decompositions are fundamental tools for multiway data analysis. Existing\napproaches, however, ignore the valuable temporal information along with data,\nor simply discretize them into time steps so that important temporal patterns are\neasily missed. Moreover, most methods are limited to multilinear decomposition\nforms, and hence are unable to capture intricate, nonlinear relationships in data. 
To\naddress these issues, we formulate event-tensors, to preserve the complete temporal\ninformation for multiway data, and propose a novel Bayesian nonparametric\ndecomposition model. Our model can (1) fully exploit the time stamps to capture\nthe critical, causal/triggering effects between the interaction events, (2) flexibly\nestimate the complex relationships between the entities in tensor modes, and (3)\nuncover hidden structures from their temporal interactions. For scalable inference,\nwe develop a doubly stochastic variational Expectation-Maximization algorithm to\nconduct an online decomposition. Evaluations on both synthetic and real-world\ndatasets show that our model not only improves upon the predictive performance\nof existing methods, but also discovers interesting clusters underlying the data.\n\n1 Introduction\n\nTensors represent the high-order interactions between entities in multiway data. Such interactions are\nubiquitous in real-world applications. For instance, in online shopping, users purchase commodities\nunder different web contexts \u2014 these interactions can be represented by a three-mode tensor (user,\ncommodity, web context). To analyze tensor data, we use decomposition approaches \u2014 where we\njointly estimate a set of latent factors for each entity, and the mapping between the latent factors\nand tensor entry values. 
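As a concrete illustration of this multilinear setting, a rank-R CP-style reconstruction of a three-mode (user, commodity, web context) tensor from per-entity factor matrices can be sketched in NumPy. This is a generic sketch, not the paper's model; the mode sizes, rank, and random factors are hypothetical.

```python
import numpy as np

# Hypothetical sizes: 4 users, 5 commodities, 3 web contexts; CP rank R = 2.
rng = np.random.default_rng(0)
R = 2
U_user, U_item, U_ctx = (rng.standard_normal((d, R)) for d in (4, 5, 3))

# CP form: m_{ijk} = sum_r U_user[i, r] * U_item[j, r] * U_ctx[k, r],
# i.e., each entry is a multilinear map of the participating entities' factors.
M = np.einsum('ir,jr,kr->ijk', U_user, U_item, U_ctx)
# M now has shape (4, 5, 3); missing entries could be predicted the same way.
```

Under this form, every entry is fixed once the factors are known, which is exactly the multilinear restriction the paper argues is too rigid for intricate, nonlinear relationships.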
The latent factors can reveal hidden structures of the entities, such as\nclusters/communities; the mapping characterizes the entities\u2019 relationships (in terms of their factor\nrepresentations), and can be used to predict missing entry values.\nDespite the wide success of existing tensor decomposition algorithms (Tucker, 1966; Harshman,\n1970; Kang et al., 2012; Choi and Vishwanathan, 2014), most methods assume a simple multilinear\ndecomposition form, which might be insufficient to estimate intricate, nonlinear relationships in data.\nMore importantly, most methods ignore the valuable temporal information that accompanies the data,\nor exploit it in a relatively coarse way. For instance, the time stamps of the interactions are usually abandoned\nand only their counts are used for count tensor decomposition (Chi and Kolda, 2012; Hansen et al.,\n2015; Hu et al., 2015b). More elegant approaches (Xiong et al., 2010; Schein et al., 2015, 2016)\ndiscretize the time stamps into steps, e.g., weeks/months, and use a set of time factors to represent\neach step. The tensor is hence augmented with a time mode. The decomposition may further use\nMarkov assumptions to encourage smooth transitions between the time factors (Xiong et al., 2010).\nHowever, in each time step, the occurrences of the interactions are treated independently. Hence,\nimportant temporal patterns, such as causal/triggering effects in adjacent interactions, cannot be well\nmodeled or captured.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nTo address these issues, we first formulate a new data abstraction, event-tensor, to preserve all the time\nstamps for multiway data. In an event-tensor, each entry comprises a sequence of interaction events\nrather than a numerical value. Second, we propose a powerful Bayesian nonparametric model to\ndecompose event-tensors (Section 3). 
We hybridize latent Gaussian processes and Hawkes processes\nto capture various excitation effects among the observed interaction events, and the underlying\ncomplex relationships between the entities that participated in the events. Furthermore, we design\na novel triggering function that enables discovering clusters of entities (or latent factors) in terms\nof excitation strengths. In addition, the triggering function allows us to flexibly specify the triggering\nrange (say, via domain knowledge) to better capture local excitations and to control the trade-off\nwith the computational cost. Finally, to handle data with many tensor entries and interaction events,\nwe derive a fully decomposed variational model evidence lower bound by using the Poisson\nprocess superposition theorem and the variational sparse Gaussian process framework (Titsias, 2009).\nBased on the bound, we develop a doubly stochastic variational Expectation-Maximization algorithm\nto carry out a scalable, online decomposition (Section 4).\nFor evaluation, we examined our model in terms of both predictive performance and structure discovery.\nOn three real-world datasets, our model often largely improves upon the prediction accuracy of the\nexisting methods that use Poisson processes and/or time factors to incorporate temporal information.\nA simulation study shows that the latent factors estimated by our model clearly reflect the ground-truth\nclusters, while those estimated by the competing methods do not. We further examined the structures\ndiscovered by our model on the real-world datasets and found many interesting patterns, such as groups of 911 accidents\nwith strong associations, locations of townships that are apt to have consecutive accidents, and UFO\nshapes that are more likely to be sighted together (Section 6).\n\n2 Background\nTensor Decomposition. We denote a K-mode tensor by M \u2208 Rd1\u00d7...\u00d7dK , where dk is the dimension\nof the k-th mode, corresponding to dk entities (e.g., users or items). The entry value at location\ni = (i1, . . . , iK) is denoted by mi. Given a tensor W \u2208 Rr1\u00d7...\u00d7rK , and a matrix U \u2208 Rs\u00d7t, we\ncan multiply W by U at mode k when rk = t. The result is a new tensor of size\nr1 \u00d7 . . . \u00d7 rk\u22121 \u00d7 s \u00d7 rk+1 \u00d7 . . . \u00d7 rK. Each entry is computed by\n(W \u00d7k U)_{i1...ik\u22121 j ik+1...iK} = \u2211_{ik=1}^{rk} w_{i1...iK} u_{j ik}.\nFor decomposition, we introduce K latent factor matrices, U = {U(1), . . . , U(K)}, to represent the\nentities in each tensor mode \u2014 each row U(k)(j, :) gives the latent factors of the j-th entity in mode k.\nThe classical Tucker decomposition (Tucker, 1966) incorporates a small core tensor W \u2208 Rr1\u00d7...\u00d7rK ,\nand assumes M = W \u00d71 U(1) \u00d72 . . . \u00d7K U(K). We can simplify Tucker decomposition by\nrestricting r1 = . . . = rK and W to be diagonal. Then it reduces to the CANDECOMP/PARAFAC (CP)\ndecomposition (Harshman, 1970). While many other decomposition methods have been proposed,\ne.g., (Chu and Ghahramani, 2009; Kang et al., 2012; Choi and Vishwanathan, 2014), most of them\nare still based on the Tucker/CP forms. However, the multilinear assumptions might be insufficient to\ncapture intricate, highly nonlinear relationships in data.\nRecently, several Bayesian nonparametric tensor decomposition models (Xu et al., 2012; Zhe et al.,\n2016b) have been proposed, which can flexibly capture various nonlinear relationships in data. For\nexample, Zhe et al. (2016b) considered each entry value mi as a function of the corresponding latent\nfactors, i.e., mi = f ([U(1)(i1, :), . . . , U(K)(iK, :)]), and placed a Gaussian process (GP) (Rasmussen\nand Williams, 2006) prior over f (\u00b7), to automatically infer the (possible) nonlinearity of f (\u00b7). These\nmethods often improve upon the CP/Tucker decompositions by a large margin in missing value prediction.\nDecomposition with Temporal Information. 
Practical tensors often come with temporal information,\nnamely the time stamps of those interactions. For example, from a file access log, we can extract\nnot only a three-mode (user, action, file) tensor, but also the time stamps for each user taking the\naction to access a file. To exploit the temporal information in the decomposition, many methods discard\nthe time stamps and use a Poisson (process) likelihood to model the interaction frequency mi in each\nentry i, p(mi) \u221d e^{\u2212\u03bbi T} \u03bbi^{mi} (Chi and Kolda, 2012; Hu et al., 2015b), and perform the Tucker/CP\ndecomposition over {\u03bbi} or {log(\u03bbi)}. More refined approaches (Xiong et al., 2010; Schein et al.,\n2015, 2016) first discretize the time stamps into several steps, such as months/weeks, and augment\nthe original tensor with a time mode. Then a time factor matrix T is estimated in the decomposition.\nWhile the interactions from different time steps are modeled with distinct time factors, the ones in\nthe same interval are considered independently (given the latent factors), say, being modeled by\nPoisson likelihoods (Schein et al., 2015, 2016). A Markov assumption might be used to encourage\nthe smoothness between the time factors. For example, Xiong et al. (2010) assigned a conditional\nGaussian prior over each T(k, :), p(T(k, :)|T(k \u2212 1, :)) = N(T(k, :)|T(k \u2212 1, :), \u03c32 I).\n\n3 Model\nDespite the success of the existing approaches in exploiting temporal information, they entirely drop the\nexact time stamps and hence are unable to capture the important triggering or causal effects between\nthe interactions. The triggering effects are common in real-world applications. For example, the event\nthat user A purchased commodity B may excite A\u2019s friend C to purchase B as well. 
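The count-based baselines reviewed above can be sketched as follows. Everything here is hypothetical (mode sizes, factors, counts, and the choice of putting the CP form on the log-rates, which is one common way to keep each rate positive); it is not the cited methods' exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(1)
R, T = 2, 10.0                      # hypothetical CP rank and observation window
A = rng.standard_normal((3, R))     # factors for mode 1 (3 entities)
B = rng.standard_normal((4, R))     # factors for mode 2 (4 entities)

# CP decomposition over the log-rates: log(lambda_i) = sum_r A[i1, r] * B[i2, r].
lam = np.exp(np.einsum('ir,jr->ij', A, B))

# Hypothetical observed interaction counts, one per tensor entry.
m = rng.poisson(lam * T)

# Poisson log-likelihood per entry (up to the constant -log m_i!):
# log p(m_i) = m_i * log(lambda_i * T) - lambda_i * T.
loglik = m * np.log(lam * T) - lam * T
```

Note that the counts alone carry no information about when the interactions happened, which is precisely the limitation the model section targets.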
The triggering\neffects are usually local and decay fast with time; dropping the time stamps and considering the event\noccurrences independently makes us unable to model/capture these effects.\nTo address these issues, and hence to further capture the complex relationships and important\nstructures underlying the interaction events, we formulate a new data abstraction, event-tensor,\nto preserve all the time stamps. We then propose a powerful Bayesian nonparametric model to\ndecompose the event-tensors, discussed as follows.\n\n3.1 Event-Tensor Formulation\n\nFirst, let us look at the definition of event-tensors. To preserve the complete temporal information in\ndecomposition, we relax the definition that tensors must be multidimensional arrays of numerical\nvalues. Instead, we define each entry to be a sequence of events, i.e., mi = {s_i^1, . . . , s_i^{ni}},\nwhere each s_i^k (1 \u2264 k \u2264 ni) is a time stamp when the interaction happened, and ni is the count of the events.\nNote that different entries correspond to distinct types of interaction events, since the involved entities (or\nlatent factors) are different. We call this tensor an event-tensor. Given the observed entries {mi},\nwe can flatten their event sequences to obtain a single sequence S = [(s1, i1), . . . , (sN , iN )] where\ns1 \u2264 . . . \u2264 sN are all the time stamps, and each ik is the entry index of the event sk (1 \u2264 k \u2264 N).\n\n3.2 Nonparametric Event-Tensor Decomposition\n\nNow, we consider a probabilistic model for event-tensor decomposition. While Poisson processes\n(PPs) have many nice properties and are often good choices for modeling events (Schein et al., 2015),\nthey assume event occurrences are independent (i.e., independent increments), and hence are unable\nto capture the influences of the events on each other. 
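The flattening step in Section 3.1 (merging the per-entry event sequences {mi} into one chronologically ordered sequence S) can be sketched as below; the entry indices and time stamps are hypothetical.

```python
# Hypothetical event-tensor: entry index (tuple of entity ids) -> time stamps m_i.
entries = {
    (0, 1, 2): [0.5, 2.1, 3.3],   # m_i = {s_i^1, ..., s_i^{n_i}}
    (1, 0, 2): [1.7],
    (0, 0, 1): [0.9, 4.0],
}

# Flatten to a single sequence S = [(s1, i1), ..., (sN, iN)] with s1 <= ... <= sN;
# each event keeps the index of the entry it came from.
S = sorted((s, i) for i, stamps in entries.items() for s in stamps)
# S starts [(0.5, (0, 1, 2)), (0.9, (0, 0, 1)), ...] in time order.
```

This merged sequence is what a point-process likelihood over all entries would be evaluated on, since events in different entries can still influence one another in time.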
To overcome this limit, we use a much more expressive point process, the Hawkes process (Hawkes, 1971),\nfor event modeling in tensor entries. Given an event sequence {t1, . . . , tn}, the Hawkes process\ndefines the event rate \u03bb as a function of time t, \u03bb(t) = \u03bb0 + \u2211_{ti < t}