{"title": "Dynamic Social Network Analysis using Latent Space Models", "book": "Advances in Neural Information Processing Systems", "page_first": 1145, "page_last": 1152, "abstract": "", "full_text": "Dynamic Social Network Analysis using Latent\n\nSpace Models\n\nPurnamrita Sarkar, Andrew W. Moore\n\nCenter for Automated Learning and Discovery\n\n(psarkar,awm)@cs.cmu.edu\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\nAbstract\n\nThis paper explores two aspects of social network modeling. First,\nwe generalize a successful static model of relationships into a dynamic\nmodel that accounts for friendships drifting over time. Second, we show\nhow to make it tractable to learn such models from data, even as the\nnumber of entities n gets large. The generalized model associates each\nentity with a point in p-dimensional Euclidian latent space. The points\ncan move as time progresses but large moves in latent space are improb-\nable. Observed links between entities are more likely if the entities are\nclose in latent space. We show how to make such a model tractable (sub-\nquadratic in the number of entities) by the use of appropriate kernel func-\ntions for similarity in latent space; the use of low dimensional kd-trees; a\nnew ef(cid:2)cient dynamic adaptation of multidimensional scaling for a (cid:2)rst\npass of approximate projection of entities into latent space; and an ef(cid:2)-\ncient conjugate gradient update rule for non-linear local optimization in\nwhich amortized time per entity during an update is O(log n). We use\nboth synthetic and real-world data on upto 11,000 entities which indicate\nlinear scaling in computation time and improved performance over four\nalternative approaches. We also illustrate the system operating on twelve\nyears of NIPS co-publication data. 
We present a detailed version of this work in [1].\n\n1 Introduction\n\nSocial network analysis is becoming increasingly important in many fields besides sociology, including intelligence analysis [2], marketing [3] and recommender systems [4]. Here we consider learning in systems in which relationships drift over time.\nConsider a friendship graph in which the nodes are entities and two entities are linked if and only if they have been observed to collaborate in some way. In 2002, Raftery et al. [5] introduced a model similar to multidimensional scaling in which entities are associated with locations in p-dimensional space, and links are more likely if the entities are close in latent space. In this paper we suppose that each observed link is associated with a discrete timestep, so each timestep produces its own graph of observed links, and information is preserved between timesteps by two assumptions. First, we assume entities can move in latent space between timesteps, but large moves are improbable. Second, we make a standard Markov assumption: latent locations at time t + 1 are conditionally independent of all previous locations given the latent locations at time t, and the observed graph at time t is conditionally independent of all other positions and graphs, given the locations at time t (see Figure 1).\nLet Gt be the graph of observed pairwise links at time t. Assuming n entities and a p-dimensional latent space, let Xt be an n × p matrix in which the ith row, called xi, corresponds to the latent position of entity i at time t. Our conditional independence structure, familiar in HMMs and Kalman filters, is shown in Figure 1. For most of this paper we treat the problem as a tracking problem in which we estimate Xt at each timestep as a function of the current observed graph Gt and the previously estimated positions Xt−1. 
We want\n\nXt = arg max_X P(X | Gt, Xt−1) = arg max_X P(Gt | X) P(X | Xt−1)   (1)\n\nIn Section 2 we design models of P(Gt | Xt) and P(Xt | Xt−1) that meet our modeling needs and which have learning times that are tractable as n gets large. In Sections 3 and 4 we introduce a two-stage procedure for locally optimizing equation (1). The first stage generalizes linear multidimensional scaling algorithms to the dynamic case while carefully maintaining the ability to computationally exploit sparsity in the graph. This gives an approximate estimate of Xt. The second stage refines this estimate using an augmented conjugate gradient approach in which gradient updates can use kd-trees over latent space to allow O(n log n) computation per step.\n\nFigure 1: Model through time (X0 → X1 → ... → XT, with each Gt generated from Xt)\n\n2 The DSNL (Dynamic Social Network in Latent space) Model\nLet dij = |xi − xj| be the Euclidean distance between entities i and j in latent space at time t. For clarity we will not use a t subscript on these variables except where it is needed. We denote linkage at time t by i ~ j, and absence of a link by i ≁ j. p(i ~ j) denotes the probability of observing the link. We use p(i ~ j) and pij interchangeably.\n2.1 Observation Model\nThe likelihood score function P(Gt | Xt) intuitively measures how well the model explains pairs of entities which are actually connected in the training graph as well as those that are not. Thus it is simply\n\nP(Gt | Xt) = ∏_{i~j} pij ∏_{i≁j} (1 − pij)   (2)\n\nFollowing [5], the link probability is a logistic function of dij and is denoted pL_ij, i.e.\n\npL_ij = 1 / (1 + e^(dij − α))   (3)\n\nwhere α is a constant whose significance is explained shortly. So far this model is similar to [5]. 
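As a concrete illustration, the static observation model of equations (2) and (3) can be sketched in a few lines of numpy. This is a minimal sketch under our own function names and a dense-matrix representation, not the authors' code:

```python
import numpy as np

def logistic_link_prob(d, alpha):
    """Static link probability pL_ij = 1 / (1 + exp(d_ij - alpha)), equation (3)."""
    return 1.0 / (1.0 + np.exp(d - alpha))

def log_likelihood(D, A, alpha):
    """log P(G | X) from equation (2). D is the n x n pairwise-distance matrix,
    A the symmetric 0/1 adjacency matrix of observed links."""
    P = logistic_link_prob(D, alpha)
    iu = np.triu_indices_from(A, k=1)  # count each unordered pair once
    p, a = P[iu], A[iu]
    return np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))
```

The dense sum over all pairs is exactly the O(n^2) cost that Sections 3 and 4 work to avoid.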
To extend this model to the dynamic case, we now make two important alterations. First, we allow entities to vary their sociability. Some entities participate in many links while others are in few. We give each entity a radius, which will be used as a sphere of interaction within latent space. We denote entity i's radius as ri. We introduce the term rij to replace α in equation (3); rij is the maximum of the radii of i and j. Intuitively, an entity with higher degree will have a larger radius. Thus we define the radius of entity i with degree δi as c(δi + 1), so that rij = c × (max(δi, δj) + 1), and c will be estimated from the data. In practice, we estimate the constant c by a simple line-search on the score function. The constant 1 ensures a nonzero radius.\n\nFigure 2: A. The actual logistic function, and our kernelized version with ε = 0.1. B. The actual (flat, with one minimum) and the modified (steep, with two minima) constraint functions, for two dimensions, with Xt varying over a 2-d grid from (−2, −2) to (2, 2), and Xt−1 = (1, 1).\n\nThe second alteration is to weigh the link probabilities by a kernel function. We alter the simple logistic link probability pL_ij such that two entities have a high probability of linkage only if their latent coordinates are within distance rij of one another. Beyond this range there is a constant noise probability ε of linkage. Later we will need the kernelized function to be continuous and differentiable at rij. 
Thus we pick the biquadratic kernel:\n\nK(dij) = (1 − (dij/rij)^2)^2 when dij ≤ rij; K(dij) = 0 otherwise.   (4)\n\nUsing this function we redefine our link probability pij as pL_ij K(dij) + ε(1 − K(dij)). This is equivalent to having\n\npij = [1 / (1 + e^(dij − rij))] K(dij) + ε(1 − K(dij)) when dij ≤ rij; pij = ε otherwise.   (5)\n\nWe plot this function in Figure 2A.\n2.2 Transition Model\nThe second part of the score penalizes large displacements from the previous time step. We use the most obvious Gaussian model: each coordinate of each latent position is independently subjected to a Gaussian perturbation with mean 0 and variance σ^2. Thus\n\nlog P(Xt | Xt−1) = −∑_{i=1}^{n} |Xi,t − Xi,t−1|^2 / 2σ^2 + const   (6)\n\n3 Learning Stage One: Linear Approximation\nWe generalize classical multidimensional scaling (MDS) [6] to get an initial estimate of the positions in the latent space. We begin by recapping what MDS does. It takes as input an n × n matrix of non-negative distances D, where Dij denotes the target distance between entity i and entity j. It produces an n × p matrix X where the ith row is the position of entity i in p-dimensional latent space. MDS finds arg min_X |D̃ − XX^T|_F, where |·|_F denotes the Frobenius norm [7]. D̃ is the similarity matrix obtained from D using standard linear algebra operations. Let Γ be the matrix of the eigenvectors of D̃, and Λ be a diagonal matrix with the corresponding eigenvalues. Denote the matrix of the p positive eigenvalues by Λp and the corresponding columns of Γ by Γp. From this follows the expression of classical MDS, i.e. X = Γp Λp^(1/2).\nTwo questions remain. Firstly, what should be our target distance matrix D? Secondly, how should this be extended to account for time? 
The first answer follows from [5] and defines Dij as the length of the shortest path from i to j in graph G. We restrict this length to a maximum of three hops in order to avoid the full n^2 computation of all-pairs shortest paths. D thus has a dense, mostly constant structure.\nWhen accounting for time, we do not want the positions of entities to change drastically from one time step to another. Hence we try to minimize |Xt − Xt−1|_F along with the main objective of MDS. Let D̃t denote the D̃ matrix derived from Gt. We formulate the above problem as minimization of |D̃t − Xt Xt^T|_F + λ|Xt − Xt−1|_F, where λ is a parameter which controls the importance of the two parts of the objective function. The above does not have a closed-form solution. However, by constraining the objective function further, we can obtain a closed-form solution for a closely related problem. The idea is to work with the distances and not the positions themselves. Since we are learning the positions from distances, we change our constraint (during this linear stage of learning) to encourage the pairwise distance between all pairs of entities to change little between each time step, instead of encouraging the individual coordinates to change little. Hence we try to minimize\n\n|D̃t − Xt Xt^T|_F + λ|Xt Xt^T − Xt−1 Xt−1^T|_F   (7)\n\nwhich is equivalent to minimizing the trace of (D̃t − Xt Xt^T)^T (D̃t − Xt Xt^T) + λ(Xt Xt^T − Xt−1 Xt−1^T)^T (Xt Xt^T − Xt−1 Xt−1^T). The above expression has an analytical solution: an affine combination of the current information from the graph and the coordinates at the last timestep. Namely, the new solution satisfies\n\nXt Xt^T = [1/(1 + λ)] D̃t + [λ/(1 + λ)] Xt−1 Xt−1^T   (8)\n\nWe plot the two constraint functions in Figure 2B. 
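For small n, the stage-one update of equation (8) can be sketched directly with a dense eigendecomposition. This is a toy illustration under our own naming; the dense O(n^3) eigensolve stands in for the power-method implementation the paper actually uses for scalability:

```python
import numpy as np

def stage_one_update(D_sim, X_prev, lam, p):
    """One dynamic-MDS step: blend the similarity matrix D~_t with
    X_{t-1} X_{t-1}^T as in equation (8), then recover X_t from the top-p
    eigenpairs, as in classical MDS (X = Gamma_p Lambda_p^(1/2))."""
    B = D_sim / (1.0 + lam) + (lam / (1.0 + lam)) * (X_prev @ X_prev.T)
    B = (B + B.T) / 2.0                   # guard against numerical asymmetry
    evals, evecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:p]     # take the p largest
    top = np.clip(evals[idx], 0.0, None)  # keep only the positive part
    return evecs[:, idx] * np.sqrt(top)
```

With lam = 0 the update reduces to classical MDS on the current graph; as lam grows it reproduces the previous configuration (up to rotation/reflection, which is why the Procrustes step described next is needed).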
When λ is zero, Xt Xt^T equals D̃t, and when λ → ∞, it is equal to Xt−1 Xt−1^T. As in MDS, eigendecomposition of the right-hand side of equation (8) yields the solution Xt which minimizes the objective function in equation (7).\nWe now have a method which finds latent coordinates for time t that are consistent with Gt and have pairwise distances similar to those of Xt−1. But although all pairwise distances may be similar, the coordinates may be very different. Indeed, even if λ is very large and we only care about preserving distances, the resulting X may be any reflection, rotation or translation of the original Xt−1. We solve this by applying the Procrustes transform to the solution Xt of equation (8). This transform finds the linear area-preserving transformation of Xt that brings it closest to the previous configuration Xt−1. The solution is unique if Xt^T Xt−1 is nonsingular [8], and for zero-centered Xt and Xt−1 it is given by Xt* = Xt U V^T, where Xt^T Xt−1 = U S V^T by Singular Value Decomposition (SVD).\nBefore moving on to Stage Two's nonlinear optimization we must address the scalability of Stage One. The naive implementation (SVD of the matrix from equation (8)) has a cost of O(n^3) for n nodes, since both D̃t and Xt Xt^T are dense n × n matrices. However, in [1] we show how to use the power method [9] to exploit the dense, mostly constant structure of D̃t and the fact that Xt Xt^T is just an outer product of two thin n × p matrices. The power method is an iterative eigendecomposition technique which only involves multiplying a matrix by a vector. 
Its net cost can be shown to be O(n^2 f + n + pn) per iteration, where f is the fraction of non-constant entries in D̃t.\n\n4 Stage Two: Nonlinear Search\nStage One places entities in reasonably consistent locations which fit our intuition, but it is not tied to the probabilistic model from Section 2. Stage Two uses these locations as initializations for applying nonlinear optimization directly to the model in equation (1). We use conjugate gradient (CG), which was the most effective of several alternatives attempted. The most important practical question is how to make these gradient computations tractable, especially when the model likelihood involves a double sum over all entities. We must compute the partial derivatives of log P(Gt | Xt) + log P(Xt | Xt−1) with respect to all values Xi,k,t for i ∈ 1...n and k ∈ 1...p. First consider the P(Gt | Xt) term. Writing ψi,j,k,t for ∂pij/∂Xi,k,t, we have\n\n∂ log P(Gt | Xt)/∂Xi,k,t = ∑_{j: i~j} ∂ log pij/∂Xi,k,t + ∑_{j: i≁j} ∂ log(1 − pij)/∂Xi,k,t = ∑_{j: i~j} ψi,j,k,t/pij − ∑_{j: i≁j} ψi,j,k,t/(1 − pij)   (9)\n\nψi,j,k,t = ∂(pL_ij K + ε(1 − K))/∂Xi,k,t = K ∂pL_ij/∂Xi,k,t + pL_ij ∂K/∂Xi,k,t − ε ∂K/∂Xi,k,t   (10)\n\nHowever K, the biquadratic kernel introduced in equation (4), evaluates to zero and has a zero derivative when dij > rij. Plugging this information into (10), we have\n\nψi,j,k,t = 0 when dij > rij.   (11)\n\nEquation (9) now becomes\n\n∂ log P(Gt | Xt)/∂Xi,k,t = ∑_{j: i~j, dij ≤ rij} ψi,j,k,t/pij − ∑_{j: i≁j, dij ≤ rij} ψi,j,k,t/(1 − pij)   (12)\n\nso each pair contributes only when dij ≤ rij, and zero otherwise. 
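The per-pair gradient implied by equations (5) and (10)-(12) can be sketched as follows, applying the chain rule through dij. The function name and argument conventions are ours; eps is the noise probability written as a Greek letter in the text:

```python
import numpy as np

def pair_gradient(xi, xj, rij, eps):
    """Gradient of p_ij (equation (5)) with respect to x_i.
    It vanishes when d_ij > r_ij, since K and its derivative are zero there."""
    diff = xi - xj
    d = np.linalg.norm(diff)
    if d > rij or d == 0.0:
        return np.zeros_like(diff)         # outside the radius: K = dK = 0
    pL = 1.0 / (1.0 + np.exp(d - rij))     # logistic factor, equation (3) with r_ij
    u = 1.0 - (d / rij) ** 2
    K = u ** 2                             # biquadratic kernel, equation (4)
    dK_dd = -4.0 * d / rij**2 * u          # d/dd of (1 - (d/r)^2)^2
    dpL_dd = -pL * (1.0 - pL)
    dp_dd = K * dpL_dd + (pL - eps) * dK_dd
    return dp_dd * diff / d                # chain rule: dd/dx_i = (x_i - x_j)/d
```

Summing this quantity over the in-radius neighbors of entity i, with the signs and denominators of equation (12), yields the observation-model part of the CG gradient.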
This simplification is very important because we can now use a spatial data structure such as a kd-tree in the low-dimensional latent space to retrieve all pairs of entities that lie within each other's radius in time O(rn + n log n), where r is the average number of in-radius neighbors of an entity [10, 11]. The computation of the gradient involves only those pairs. A slightly more sophisticated trick, omitted for space reasons, lets us compute log P(Gt | Xt) in O(rn + n log n) time. From equation (6) we have\n\n∂ log P(Xt | Xt−1)/∂Xi,k,t = −(Xi,k,t − Xi,k,t−1) / σ^2   (13)\n\nIn the early stages of conjugate gradient, there is a danger of a plateau in our score function in which the first derivative is insensitive to two entities that are connected but are not within each other's radius. To aid the early steps of CG, we add an additional term to the score function which penalizes all pairs of connected entities according to the square of their separation in latent space, i.e. ∑_{i~j} d^2_ij. Weighting this by a constant pConst, our final CG gradient becomes\n\n∂Scoret/∂Xi,k,t = ∂ log P(Gt | Xt)/∂Xi,k,t + ∂ log P(Xt | Xt−1)/∂Xi,k,t − pConst × 2 ∑_{j: i~j} (Xi,k,t − Xj,k,t)\n\n5 Results\nWe report experiments on synthetic data generated by a model described below and the NIPS co-publication data.¹ We investigate three things: the ability of the algorithm to reconstruct the latent space based only on link observations, anecdotal evaluation of what happens to the NIPS data, and scalability results on large datasets from Citeseer.\n\n5.1 Comparing with ground truth\nWe generate synthetic data for six consecutive timesteps. At each timestep the next set of two-dimensional latent coordinates is generated with the former positions as mean and a Gaussian noise of standard deviation σ = 0.01. 
Each entity is assigned a random radius. At each step, each entity is linked with a relatively higher probability to the ones falling within its radius, or containing it within their radii. There is a noise probability of 0.1 by which any two entities i and j outside the maximum pairwise radius rij are connected. We generate graphs of sizes 20 to 1280, doubling the size every time. Accuracy is measured by drawing a test set from the same model, and determining the ROC curve for predicting whether a pair of entities will be linked in the test set. We experiment with six approaches:\nA. The True model that was used to generate the data (this is an upper bound on the performance of any learning algorithm).\nB. The DSNL model learned using the above algorithms.\nC. A random model, guessing link probabilities randomly (this should have an AUC of 0.5).\nD. The Simple Counting model (control experiment). This ranks the likelihood of being linked in the test set according to the frequency of linkage in the training set. It can be considered the equivalent of the 1-nearest-neighbor method in classification: it does not generalize, but merely duplicates the training set.\nE. Time-varying MDS: the model that results from running Stage One only.\nF. MDS with no time: the model that results from ignoring time information and running independent MDS on each timestep.\n\n¹See http://www.cs.toronto.edu/~roweis/data.html\n\nFigure 3 shows the ROC curves for the third timestep on a test set of size 160. Table 1 shows the AUC scores of our approach and the five alternatives for three different sizes of the dataset over the first, third, and last time steps.\n\nTable 1: 
AUC scores on graphs of size n for six different models: (A) True model, (B) model learned by DSNL, (C) random model, (D) Simple Counting model (control), (E) MDS with time, and (F) MDS without time.\n\nn=80:   Time 1: A 0.94, B 0.85, C 0.48, D 0.76, E 0.77, F 0.67\n        Time 3: A 0.93, B 0.88, C 0.48, D 0.81, E 0.77, F 0.65\n        Time 6: A 0.93, B 0.82, C 0.50, D 0.76, E 0.77, F 0.67\nn=320:  Time 1: A 0.86, B 0.83, C 0.50, D 0.70, E 0.72, F 0.65\n        Time 3: A 0.86, B 0.79, C 0.51, D 0.70, E 0.72, F 0.62\n        Time 6: A 0.86, B 0.81, C 0.50, D 0.71, E 0.74, F 0.64\nn=1280: Time 1: A 0.81, B 0.79, C 0.50, D 0.68, E 0.61, F 0.70\n        Time 3: A 0.80, B 0.79, C 0.50, D 0.69, E 0.74, F 0.71\n        Time 6: A 0.81, B 0.78, C 0.50, D 0.68, E 0.70, F 0.70\n\nFigure 3: ROC curves of the six different models described earlier for a test set of size 160 at timestep 3, in simulated data.\n\nIn all the cases we see that the true model has the highest AUC score, followed by the model learned by DSNL. The simple counting model rightly guesses some of the links in the test graph from the training graph. However, it also predicts the noise as links, and ends up being beaten by the model we learn. The results show that it is not sufficient to only perform Stage One. When the number of links is small, MDS without time does poorly compared to our temporal version. However, as the number of links grows quadratically with the number of entities, regular MDS does almost as well as the temporal version: this is not a surprise, because the generalization benefit from the previous timestep becomes unnecessary with sufficient data on the current timestep. 
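For reference, the AUC figures above can be computed for any model that assigns a score to each test pair. A minimal rank-based (Mann-Whitney) sketch, with our own function name:

```python
import numpy as np

def link_auc(scores, labels):
    """Area under the ROC curve for link prediction. `scores` are predicted
    link probabilities for test pairs; `labels` are 1 (linked) / 0 (not).
    A random scorer gives ~0.5, matching column C of Table 1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):            # average ranks over ties
        m = scores == s
        ranks[m] = ranks[m].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

This rank formulation avoids building the ROC curve explicitly and handles tied scores (such as the constant predictions of the random model) by rank averaging.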
Further experiments we conducted [1] show that runs initialized with time-varying MDS converge almost twice as fast as those with random initialization, and also converge to a better log-likelihood.\n5.2 Visualizing the NIPS coauthorship data over time\nFor clarity we present a subset of the NIPS dataset, obtained by choosing a well-connected author and including all authors and links within a few hops. We dropped authors who appeared only once, and we merged the timesteps into three groups: 1987-1990 (Figure 4A), 1991-1994 (Figure 4B), and 1995-1998 (Figure 4C). In each picture we show the links for that timestep and a few well-connected people highlighted with their radii. These radii are learned from the model. Remember that the distance between two people is related to the radii: two people with very small radii are considered far apart in the model even if they are physically close. To give some intuition of the movement of the rest of the points, we divided the area in the first timestep into 4 parts, and colored and shaped the points in each differently. This coloring and shaping is preserved throughout all the timesteps.\nIn this paper we limit ourselves to anecdotal examination of the latent positions. For example, with BurgesC and VapnikV we see that they had very small radii in the first four years, and were far apart from one another, since there was no co-publication. However, in the second timestep they move closer, though there are no direct links. This is because they both had co-published with neighbors of one another. In the third timestep they make a connection, and are assigned almost identical coordinates, since they have a highly overlapping set of neighbors.\nWe end the discussion with entities HintonG, GhahramaniZ, and JordanM. In the first timestep they did not coauthor with one another, and were placed outside one another's radii. 
In the second timestep GhahramaniZ and HintonG coauthor with JordanM. However, since HintonG had a large radius and more links than the former, it is harder for him to meet all the constraints, and he does not move very close to JordanM. In the next timestep, however, GhahramaniZ has a link with both of the others, and they move substantially closer to one another.\n5.3 Performance Issues\nFigure 4D shows the performance against the number of entities. When kd-trees are used and the graphs are sparse, scaling is clearly subquadratic and nearly linear in the number of entities, meeting our expectation of O(n log n) performance. We successfully applied our algorithms to networks of sizes up to 11,000 [1]. The results show subquadratic time complexity along with satisfactory link prediction on test sets.\n\n6 Conclusions and Future Work\nThis paper has described a method for modeling relationships that change over time. We believe it is useful both for understanding relationships in a mass of historical data and also as a tool for predicting future interactions, and we plan to explore both directions further. In [1] we develop a forward-backward algorithm, optimizing the global likelihood instead of treating the model as a tracking model. We also plan to extend this to find the posterior distributions of the coordinates following the approach used by [5].\n\nAcknowledgments\nWe are very grateful to Anna Goldenberg for her valuable insights. We also thank Paul Komarek and Sajid Siddiqi for some very helpful discussions and useful comments. This work was partially funded by DARPA EELD grant F30602-01-2-0569.\n\nReferences\n[1] P. Sarkar and A. Moore. Dynamic social network analysis using latent space models. SIGKDD Explorations: Special Issue on Link Mining, 2005.\n[2] J. Schroeder, J. J. Xu, and H. Chen. CrimeLink Explorer: Using domain knowledge to facilitate automated crime association analysis. 
In ISI, pages 168-180, 2003.\n[3] J. J. Carrasco, D. C. Fain, K. J. Lang, and L. Zhukov. Clustering of bipartite advertiser-keyword graph. In ICDM, 2003.\n[4] J. Palau, M. Montaner, and B. López. Collaboration analysis in recommender systems using social networks. In Eighth Intl. Workshop on Cooperative Info. Agents (CIA'04), 2004.\n\nFigure 4: NIPS coauthorship data at A. Timestep 1: green stars in the upper-left corner, magenta pluses in the top right, cyan spots in the lower right, and blue crosses in the bottom left. B. Timestep 2. C. Timestep 3. D. Time taken for score calculation (quadratic score vs. score using kd-tree) against the number of entities.\n\n[5] A. E. Raftery, M. S. Handcock, and P. D. Hoff. Latent space approaches to social network analysis. J. Amer. Stat. Assoc., 15:460, 2002.\n[6] R. L. Breiger, S. A. Boorman, and P. Arabie. An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. J. of Math. Psych., 12:328-383, 1975.\n[7] I. Borg and P. Groenen. Modern Multidimensional Scaling. Springer-Verlag, 1997.\n[8] R. Sibson. Studies in the robustness of multidimensional scaling: Perturbational analysis of classical scaling. J. Royal Stat. Soc. B, Methodological, 41:217-229, 1979.\n[9] David S. Watkins. Fundamentals of Matrix Computations. 
John Wiley & Sons, 1991.\n[10] F. Preparata and M. Shamos. Computational Geometry: An Introduction. Springer, 1985.\n[11] A. G. Gray and A. W. Moore. N-body problems in statistical learning. In NIPS, 2001.\n", "award": [], "sourceid": 2879, "authors": [{"given_name": "Purnamrita", "family_name": "Sarkar", "institution": null}, {"given_name": "Andrew", "family_name": "Moore", "institution": null}]}