{"title": "Call-Based Fraud Detection in Mobile Communication Networks Using a Hierarchical Regime-Switching Model", "book": "Advances in Neural Information Processing Systems", "page_first": 889, "page_last": 895, "abstract": null, "full_text": "Call-based Fraud Detection in Mobile Communication Networks using a Hierarchical Regime-Switching Model \n\nJaakko Hollmen \n\nHelsinki University of Technology \n\nLab. of Computer and Information Science \n\nP.O. Box 5400, 02015 HUT, Finland \n\nJaakko.Hollmen@hut.fi \n\nVolker Tresp \n\nSiemens AG, Corporate Technology \n\nDept. Information and Communications \n\n81730 Munich, Germany \n\nVolker.Tresp@mchp.siemens.de \n\nAbstract \n\nFraud causes substantial losses to telecommunication carriers. Detection \nsystems which automatically detect illegal use of the network can be \nused to alleviate the problem. Previous approaches worked on features \nderived from the call patterns of individual users. In this paper we present \na call-based detection system based on a hierarchical regime-switching \nmodel. The detection problem is formulated as an inference problem on \nthe regime probabilities. Inference is implemented by applying the junction \ntree algorithm to the underlying graphical model. The dynamics are \nlearned from data using the EM algorithm and subsequent discriminative \ntraining. The methods are assessed using fraud data from a real mobile \ncommunication network. \n\n1 INTRODUCTION \n\nFraud is costly to a network carrier both in terms of lost income and wasted capacity. It has \nbeen estimated that the telecommunication industry loses approximately 2-5% of its total \nrevenue to fraud. The true losses are expected to be even higher since telecommunication \ncompanies are reluctant to admit fraud in their systems. A fraudulent attack causes considerable \ninconvenience to the victimized subscriber, which might motivate the subscriber to switch \nto a competing carrier. 
Furthermore, potential new customers would be very reluctant to \nswitch to a carrier which is troubled with fraud. \n\nMobile communication networks, which are the focus of this work, are particularly \nappealing to fraudsters as calling from a mobile terminal is not bound to a physical \nplace and a subscription is easy to get. This provides means for an illegal high-profit \nbusiness requiring minimal investment and relatively low risk of getting caught. Fraud is \nusually initiated by a mobile phone theft, by cloning the mobile phone card or by acquiring \na subscription with false identification. After intrusion the subscription can be used for \ngaining free services either for the intruder himself or for his illegal customers in the form of \ncall selling. In the latter case, the fraudster sells calls to customers at reduced rates. \n\nThe earliest means of detecting fraud was to register overlapping calls originating from \none subscription, which is evidence of card cloning. While this procedure efficiently detects cloning, \nit misses a large share of other fraud cases. A more advanced system is a velocity trap which \ndetects card cloning by using an upper limit on the speed at which a mobile phone user can travel. \nCalls made in quick succession from distant places provide evidence for card cloning. Although a velocity \ntrap is a powerful method of detecting card cloning, it is ineffective against other types of \nfraud. Therefore there is great interest in detection systems which detect fraud based on \nan analysis of behavioral patterns (Barson et al., 1996, Burge et al., 1997, Fawcett and \nProvost, 1997, Taniguchi et al., 1998). \n\nIn an absolute analysis, a user is classified as a fraudster based on features derived from \ndaily statistics summarizing the call pattern such as the average number of calls. 
In a differential \nanalysis, the detection is based on measures describing the changes in those features \ncapturing the transition from normal use to fraud. Both approaches have the problem \nof finding efficient feature representations describing normal and fraudulent behavior. As \nthey usually derive features as summary statistics over one day, they suffer from a \nlatency of up to a day in detecting fraudulent behavior. The resulting delay in detection \ncan already lead to unacceptable losses and can be exploited by the fraudster. For these \nreasons real-time fraud detection is seen to be the most important development in fraud \ndetection (Pequeno, 1997). \n\nIn this paper we present a real-time fraud detection system which is based on a stochastic \ngenerative model. In the generative model we assume a variable victimized which indicates \nif the account has been victimized by a fraudster and a second variable fraud which indicates \nif the fraudster is currently performing fraud. Both variables are hidden. Furthermore, \nwe have an observed variable call which indicates if a call is being performed or not. The \ntransition probabilities from no-call to call and from call to no-call are dependent on the \nstate of the variable fraud. Overall, we obtain a regime-switching time-series model as described \nby Hamilton (1994), with the modifications that first, the variables in the time series \nare not continuous but binary and second, the switching variable has a hierarchical structure. \nThe benefit of the hierarchical structure is that it allows us to model the time series \nat different time scales. At the lowest hierarchical level, we model the dynamical behavior \nof the individual calls, at the next level the transition from normal behavior to fraudulent \nbehavior and at the highest level the transition to being victimized. 
The ability to model a \ntime series at different temporal resolutions was also the reason for introducing a hierarchy \ninto a hidden Markov model in Jordan, Ghahramani and Saul (1997). Fortunately, our \nhidden variables have only a small number of states such that we do not have to work with \nthe approximation techniques those authors have introduced. \n\nSection 2 introduces our hierarchical regime-switching fraud model. The detection problem \nis formulated as an inference problem on the regime probabilities based on subscriber \ndata. We derive iterative algorithms for estimating the hidden variables fraud and victimized \nbased on past and present data (filtering) or based on the complete set of observed \ndata (smoothing). We present EM learning rules for learning the parameters in the model \nusing observed data. We develop a gradient-based approach for fine-tuning the emission \nprobabilities in the non-fraud state to enhance the discrimination capability of the model. \nIn Section 3 we present experimental results. We show that a system which is fine-tuned on \nreal data can be used for detecting fraudulent behavior on-line based on the call patterns. \nIn Section 4 we present conclusions and discuss further applications and extensions of our \nfraud model. \n\n2 THE HIERARCHICAL REGIME-SWITCHING FRAUD MODEL \n\n2.1 THE GENERATIVE MODEL \n\nThe hierarchical regime-switching model consists of three variables which evolve in time \nstochastically according to first-order Markov chains. The first binary variable $v_t$ (victimized) \nis equal to one if the account is currently being victimized by a fraudster and zero \notherwise. The states of this variable evolve according to the state transition probabilities \n$p^v_{ij} = P(v_t = i \\mid v_{t-1} = j)$; $i, j = 0, 1$. 
The second binary variable $s_t$ (fraud) is \nequal to one if the fraudster currently performs fraud and is equal to zero if the fraudster \nis inactive. The change between actively performing fraud and intermittent silence is typical \nfor a victimized account as is apparent from Figure 3. Note that this transient bursty \nbehavior of a victimized account would be difficult to capture with a pure feature based \napproach. The states of this variable evolve following the state transition probabilities \n$p^s_{ijk} = P(s_t = i \\mid v_t = j, s_{t-1} = k)$; $i, j, k = 0, 1$. Finally, the binary variable $y_t$ (call) \nis equal to one if the mobile phone is being used and zero otherwise with state transition \nmatrix $p^y_{ijk} = P(y_t = i \\mid s_t = j, y_{t-1} = k)$; $i, j, k = 0, 1$. Note that this corresponds to the \nassumption of exponentially distributed call duration. Although not quite realistic, this is \nthe general assumption in telecommunications. Typically, both the frequency of calls and \nthe lengths of the calls are increased when fraud is executed. The joint probability of the \ntime series up to time $T$ is then \n\n$$P(V_T, S_T, Y_T) = P(v_0, s_0, y_0) \\prod_{t=1}^{T} P(v_t \\mid v_{t-1}) \\prod_{t=1}^{T} P(s_t \\mid v_t, s_{t-1}) \\prod_{t=1}^{T} P(y_t \\mid s_t, y_{t-1}) \\qquad (1)$$ \n\nwhere in the experiments we used a sampling time of one minute. Furthermore, $V_T = \\{v_0, \\ldots, v_T\\}$, \n$S_T = \\{s_0, \\ldots, s_T\\}$, $Y_T = \\{y_0, \\ldots, y_T\\}$ and $P(v_0, s_0, y_0)$ is the prior distribution \nof the initial states. \n\nFigure 1: Dependency graph of the hierarchical regime-switching fraud model. The square \nboxes denote hidden variables and the circles observed variables. The hidden variable $v_t$ \non the top describes whether the subscriber account is victimized by fraud. The hidden \nvariable $s_t$ indicates if fraud is currently being executed. The state of $s_t$ determines the \nstatistics of the variable call $y_t$. 
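\n\nAs a brief worked illustration of the call-duration remark in this section (a sketch under stated assumptions, not part of the original derivation): write $q_j = P(y_t = 1 \\mid s_t = j, y_{t-1} = 1)$ for the probability that an ongoing call continues for another sampling interval while the regime stays at $s_t = j$; the symbols $q_j$ and $D$ are introduced here for illustration only. With the regime held fixed, the call length $D$ measured in sampling intervals is geometrically distributed, which is the discrete-time counterpart of the exponential distribution: \n\n$$P(D = d) = q_j^{\\,d-1} (1 - q_j), \\qquad E[D] = \\frac{1}{1 - q_j}, \\qquad d = 1, 2, \\ldots$$ \n\nAssuming that each row of the call transition matrices in the Appendix corresponds to the previous state $y_{t-1}$ and each column to the new state $y_t$ (the rows sum to one), the fitted values $q_0 = 0.6467$ and $q_1 = 0.9430$ would give mean call lengths of about $1/0.3533 \\approx 2.8$ minutes in the normal regime and $1/0.0570 \\approx 17.5$ minutes in the fraud regime at the one-minute sampling time. 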
\n\n2.2 INFERENCE: FILTERING AND SMOOTHING \n\nWhen using the fraud detection system, we are interested in estimating the probability that \nan account is victimized or that fraud is currently occurring based on the call patterns up to \nthe current point in time (filtering). We can calculate the probabilities of the states of the \nhidden variables by applying the following equations recursively with $t = 1, \\ldots, T$. \n\n$$P(v_t = i, s_{t-1} = k \\mid Y_{t-1}) = \\sum_{l} p^v_{il} \\, P(v_{t-1} = l, s_{t-1} = k \\mid Y_{t-1})$$ \n\n$$P(v_t = i, s_t = j \\mid Y_{t-1}) = \\sum_{k} p^s_{jik} \\, P(v_t = i, s_{t-1} = k \\mid Y_{t-1})$$ \n\n$$P(v_t = i, s_t = j \\mid Y_t) = c \\, P(y_t \\mid s_t = j, y_{t-1}) \\, P(v_t = i, s_t = j \\mid Y_{t-1})$$ \n\nwhere $c$ is a scaling factor. These equations can be derived from the junction tree algorithm \nfor Bayesian networks (Jensen, 1996). We obtain the probability of victimization and \nfraud by simple marginalization \n\n$$P(v_t = i \\mid Y_t) = \\sum_{j} P(v_t = i, s_t = j \\mid Y_t); \\qquad P(s_t = j \\mid Y_t) = \\sum_{i} P(v_t = i, s_t = j \\mid Y_t).$$ \n\nIn some cases, in particular for the EM learning rules in the next section, we might \nbe interested in estimating the probabilities of the hidden states at some time in the past \n(smoothing). In this case we can use a variation of the smoothing equations described in \nHamilton (1994) and Kim (1994). After performing the forward recursion, we can calculate \nthe probability of the hidden states at time $t'$ given data up to time $T > t'$ by iterating the \nfollowing equations with $t = T, T - 1, \\ldots, 1$. \n\n$$P(v_{t+1} = k, s_t = j \\mid Y_T) = \\sum_{l} \\frac{P(v_{t+1} = k, s_{t+1} = l \\mid Y_T)}{P(v_{t+1} = k, s_{t+1} = l \\mid Y_t)} \\, P(v_{t+1} = k, s_t = j \\mid Y_t) \\, p^s_{lkj}$$ \n\n$$P(v_t = i, s_t = j \\mid Y_T) = \\sum_{k} \\frac{P(v_{t+1} = k, s_t = j \\mid Y_T)}{P(v_{t+1} = k, s_t = j \\mid Y_t)} \\, P(v_t = i, s_t = j \\mid Y_t) \\, p^v_{ki}$$ \n\n2.3 EM LEARNING RULES \n\nParameter estimation in the regime-switching model is conveniently formulated as an incomplete \ndata problem, which can be solved using the EM algorithm (Hamilton, 1994). 
\nEach iteration of the EM algorithm is guaranteed to increase the value of the marginal log-likelihood \nfunction until a fixed point is reached. This fixed point is a local optimum of the \nmarginal log-likelihood function. \n\nIn the M-step the model parameters are optimized using the estimates of the hidden states \nobtained with the current parameter estimates. Let $\\theta = \\{p^v_{ij}, p^s_{ijk}, p^y_{ikj}\\}$ denote the current parameter \nestimates. The new estimates are obtained using \n\n$$p^v_{ij} = \\frac{\\sum_{t=1}^{T} P(v_t = i, v_{t-1} = j \\mid Y_T; \\theta)}{\\sum_{t=1}^{T} P(v_{t-1} = j \\mid Y_T; \\theta)}$$ \n\n$$p^s_{ijk} = \\frac{\\sum_{t=1}^{T} P(s_t = i, v_t = j, s_{t-1} = k \\mid Y_T; \\theta)}{\\sum_{t=1}^{T} P(v_t = j, s_{t-1} = k \\mid Y_T; \\theta)}$$ \n\n$$p^y_{ikj} = \\frac{\\sum_{t=1,\\; y_t = i,\\; y_{t-1} = j}^{T} P(s_{t-1} = k \\mid Y_T; \\theta)}{\\sum_{t=1,\\; y_{t-1} = j}^{T} P(s_{t-1} = k \\mid Y_T; \\theta)}$$ \n\nThe E-step determines the probabilities on the right sides of the equations using the current \nparameter estimates. These can be determined using the smoothing equations from the \nprevious section directly by marginalizing \n\n$$P(v_t = k, s_t = l, v_{t+1} = i, s_{t+1} = j \\mid Y_T)$$ \n\nwhere the terms on the right side are obtained from the equations in the last section. \n\n2.4 DISCRIMINATIVE TRAINING \n\nIn our data setting, it is not known when the fraudulent accounts were victimized by fraud. \nThis is why we use the EM algorithm to learn the two regimes from data in an incomplete \ndata setting. We know, however, which accounts were victimized by fraud. After EM \nlearning the discrimination ability of the model was not satisfactory. We therefore used \nthe labeled sequences to improve the model. The poor performance was \nattributed to unsuitable call emission probabilities in the normal state. We therefore minimize \nthe error function $E = \\sum_i \\left( \\max_t P(v_t^{(i)} \\mid Y_t^{(i)}) - t^{(i)} \\right)^2$ with regard to the parameter \n$p^y_{i=0,j=0,k=0}$, where $t^{(i)} \\in \\{0, 1\\}$ is the label for the sequence $i$. 
The error function \nwas minimized with a quasi-Newton procedure with numerical differentiation. \n\n3 EXPERIMENTS \n\nTo test our approach we used a data set consisting of 600 accounts which were not affected \nby fraud and 304 accounts which were affected by fraud. The observation periods for non-fraud and \nfraud accounts were 49 and 92 days, respectively. We divided the data equally into training \ndata and test data. From the non-fraud data we estimated the parameters describing the \nnormal calling behavior, i.e. $p^y_{i,j=0,k}$. Next, we fixed the probability that an account is \nvictimized from one time step to the next to $p^v_{i=1,j=0} = 10^{-5}$ and the probability that \na victimized account becomes de-victimized to $p^v_{i=0,j=1} = 5 \\times 10^{-4}$. Leaving those \nparameters fixed, the remaining parameters were trained using the fraudulent accounts and \nthe EM algorithm described in Section 2. We had to do unsupervised training since it was \nknown by velocity check that the accounts were affected but it was not clear when the \nintrusion occurred. After unsupervised training, we further enhanced the discrimination \ncapability of the system, which helped us reduce the number of false alarms. The final \nmodel parameters can be found in the Appendix. \n\nAfter training, the system was tested using the test data. Unfortunately, it is not known \nwhen the accounts were attacked by fraud, but only on a per-account basis whether an account was \nat some point a victim of fraud. Therefore, we declare an account to be victimized if the \nvictimized variable at some point exceeds the threshold. Also, it is interesting to study the \nresults shown in Figure 3. We show data and posterior time-evolving probabilities for an \naccount which is known to be victimized. From the call pattern it is obvious that there are \nperiods of suspiciously high traffic at which the probability of victimization is recognized \nto be very high. 
We also see that the variable fraud $s_t$ follows the bursty pattern of \nthe fraudulent behavior correctly. Note that with smoothing, which is important both for \na retrospective analysis of call data and for learning, we achieve smoother curves for the \nvictimized variable. \n\nFigure 2: The Receiver Operating Characteristic (ROC) curves are shown for on-line detection \n(left figure) and for retrospective classification (right figure). In the figures, detection \nprobability is plotted against the false alarm probability. The dash-dotted lines are results \nbefore, the solid lines after discriminative training. We can see that the discriminative \ntraining improves the model considerably. \n\nAfter EM training and discriminative training, we tested the model both in on-line detection \nmode (filtering) and in retrospective classification (smoothing) with smoothed probabilities. \nThe detection results are shown in Figure 2. With a fixed false alarm probability \nof 0.003, the detection probabilities for the training set were found to be 0.974 and 0.934 \nusing on-line detection mode and with smoothed probabilities, respectively. With the test \nset and a fixed false alarm probability of 0.020, we obtain the detection probabilities of \n0.928 and 0.921, for the on-line detection and for retrospective classification, respectively. \n\n4 CONCLUSIONS \n\nWe presented a call-based on-line fraud detection system which is based on a hierarchical \nregime-switching generative model. The inference rules are obtained from the junction \ntree algorithm for the underlying graphical model. 
The model is trained using the EM algorithm \nin an incomplete data setting and is further refined with gradient-based discriminative \ntraining, which considerably improves the results. \n\nA few extensions are in the process of being implemented. First of all, it makes sense to \nuse more than one fraud model for the different fraud scenarios and several user models \nto account for different user profiles. For these more complex models we might have to \nrely on approximation techniques such as the ones introduced by Jordan, Ghahramani and \nSaul (1997). \n\nAppendix \n\nThe model parameters after EM training and discriminative training. Note that entering the \nfraud state without first entering the victimized state is impossible. \n\n$$p^y_{i,j=0,k} = \\begin{pmatrix} 0.9559 & 0.0441 \\\\ 0.3533 & 0.6467 \\end{pmatrix}, \\qquad p^y_{i,j=1,k} = \\begin{pmatrix} 0.9292 & 0.0708 \\\\ 0.0570 & 0.9430 \\end{pmatrix}$$ \n\n$$p^s_{i,j=0,k} = \\begin{pmatrix} 1.0000 & 0.0000 \\\\ 0.0000 & 1.0000 \\end{pmatrix}, \\qquad p^s_{i,j=1,k} = \\begin{pmatrix} 0.9979 & 0.0021 \\\\ 0.0086 & 0.9914 \\end{pmatrix}$$ \n\nFigure 3: The first line shows the calling data $y_t$ from a victimized account. The second \nand third lines show the states of the victimized and fraud variables, respectively. Both are \ncalculated with the filtering equations. The fourth and fifth lines show the same variables \nusing the smoothing equations. The displayed time window period is seventeen days. \n\nReferences \n\nBarson, P., Field, S., Davey, N., McAskie, G., and Frank, R. (1996). The Detection of Fraud \nin Mobile Phone Networks. Neural Network World, Vol. 6, No. 4. \nBengio, Y. (1996). Markovian Models for Sequential Data. Technical Report # 1049, \nUniversite de Montreal. \nBurge, P., Shawe-Taylor, J., Moreau, Y., Verrelst, H., Stormann, C. 
and Gosset, P. (1997). \nBRUTUS - A Hybrid Detection Tool. Proc. of ACTS Mobile Telecommunications Summit, \nAalborg, Denmark. \nFawcett, T. and Provost, F. (1997). Adaptive Fraud Detection. Journal of Data Mining and \nKnowledge Discovery, Vol. 1, No. 3, pp. 1-28. \nHamilton, J. D. (1994). Time Series Analysis. Princeton University Press. \nJensen, F. V. (1996). Introduction to Bayesian Networks. UCL Press. \nJordan, M. I., Ghahramani, Z. and Saul, L. K. (1997). Hidden Markov Decision Trees, in \nAdvances in Neural Information Processing Systems: Proceedings of the 1996 Conference \n(NIPS'9), MIT-Press, pp. 501-507. \nKim, C.-J. (1994). Dynamical linear models with Markov-switching. Journal of Econometrics, \nVol. 60, pp. 1-22. \nPequeno, K. A. (1997). Real-Time fraud detection: Telecom's next big step. Telecommunications \n(America Edition), Vol. 31, No. 5, pp. 59-60. \nTaniguchi, M., Haft, M., Hollmen, J. and Tresp, V. (1998). Fraud detection in communications \nnetworks using neural and probabilistic methods. Proceedings of the 1998 IEEE Int. \nConf. in Acoustics, Speech and Signal Processing (ICASSP'98). Vol. 2. pp. 1241-1244. \n", "award": [], "sourceid": 1505, "authors": [{"given_name": "Jaakko", "family_name": "Hollm\u00e9n", "institution": null}, {"given_name": "Volker", "family_name": "Tresp", "institution": null}]}