{"title": "Link Discovery using Graph Feature Tracking", "book": "Advances in Neural Information Processing Systems", "page_first": 1966, "page_last": 1974, "abstract": "We consider the problem of discovering links of an evolving undirected graph given a series of past snapshots of that graph. The graph is observed through the time sequence of its adjacency matrix and only the presence of edges is observed. The absence of an edge on a certain snapshot cannot be distinguished from a missing entry in the adjacency matrix. Additional information can be provided by examining the dynamics of the graph through a set of topological features, such as the degrees of the vertices. We develop a novel methodology by building on both static matrix completion methods and the estimation of the future state of relevant graph features. Our procedure relies on the formulation of an optimization problem which can be approximately solved by a fast alternating linearized algorithm whose properties are examined. We show experiments with both simulated and real data which reveal the interest of our methodology.", "full_text": "Link Discovery using Graph Feature Tracking\n\nEmile Richard\n\nENS Cachan - CMLA & MilleMercis, France\n\nr.emile.richard@gmail.com\n\nNicolas Baskiotis\n\nENS Cachan - CMLA\n\nnicolas.baskiotis@lip6.com\n\nTheodoros Evgeniou\n\nTechnology Management and Decision Sciences,\n\nINSEAD\n\nBd de Constance, Fontainebleau 77300, France\n\ntheodoros.evgeniou@insead.edu\n\nNicolas Vayatis\n\nENS Cachan & UniverSud - CMLA UMR CNRS 8536, France\n\nnicolas.vayatis@cmla.ens-cachan.fr\n\nAbstract\n\nWe consider the problem of discovering links of an evolving undirected graph\ngiven a series of past snapshots of that graph. The graph is observed through the\ntime sequence of its adjacency matrix and only the presence of edges is observed.\nThe absence of an edge on a certain snapshot cannot be distinguished from a\nmissing entry in the adjacency matrix. 
Additional information can be provided by examining the dynamics of the graph through a set of topological features, such as the degrees of the vertices. We develop a novel methodology by building on both static matrix completion methods and the estimation of the future state of relevant graph features. Our procedure relies on the formulation of an optimization problem which can be approximately solved by a fast alternating linearized algorithm whose properties are examined. We show experiments with both simulated and real data which demonstrate the value of our methodology.\n\n1 Introduction\n\nThe prediction of the future state of an evolving graph is a challenge of interest in many applications such as predicting hyperlinks of webpages [16], finding protein-protein interactions [7], studying social networks [9], as well as collaborative filtering and recommendations [6]. Link prediction can also be seen as a special case of matrix completion where the goal is to estimate the missing entries of the graph's adjacency matrix, whose entries can only be “0s” and “1s”. Matrix completion became popular after the Netflix Challenge and has been extensively studied from both theoretical and algorithmic perspectives [15]. In this paper we consider a special case of predicting the evolution of a graph: we only predict the new edges given a fixed set of vertices of an undirected graph, using the dynamics of the graph over time.\nMost existing methods in matrix completion assume that weights over the entries (i.e. the edges of the graph, e.g. scores in movie recommendation applications) are observed [3]. These weights provide richer information than the binary case (existence or absence of a link). Consider for instance the issue of link prediction in recommender systems. 
In that case, we consider a bipartite graph for which the vertices represent products and users, and the edges connect users with the products they have purchased in the past. The setup we consider in the present paper corresponds to the binary case where we only observe purchase data, i.e. the presence of a link in the graph, without any score or feedback on the product for a given user. Hence, we will deal here with the situation where the components of snapshots of the adjacency matrix only consist of “1s” and missing values.\nMoreover, link prediction methods typically use only one snapshot of the graph's adjacency matrix - the most recent one - to predict its missing entries [9], or rely on latent variables providing semantic information for each vertex [11]. Since these methods do not use any information over time, they can be called static methods. Static methods are based on the heuristic that some topological features, such as the degree, the clustering coefficient, or the length of the paths, follow specific distributions. However, information about how the links of the graph and its topological features have been evolving over time may also be useful for predicting future links. In the example of recommender systems, knowing that a particular product has been purchased by increasingly many people in a short time window provides useful information about the type of recommendations to be made in the next period. The main idea underlying our work is the observation that a few graph features can capture the dynamics of the graph's evolution and provide information for predicting future links. The purpose of the paper is to present a procedure which exploits these dynamics to find unrevealed links in the graph. 
The main idea is to learn over time the evolution of well-chosen local features (at the level of the vertices) of the graph and then use the predicted value of these features for the next time period to discover the missing links. Our approach is related to two theoretical streams of research: matrix completion and diffusion models. In the latter, only the dynamics over time of the degree of a particular vertex of the graph are modeled - for example, the diffusion of the product corresponding to that vertex [17, 14]. Beyond the large number of static matrix completion methods, only a few methods have been developed that combine static and dynamic information, mainly using parametric models - see [4] for a survey. For example, [13] embeds graph vertices in a latent space and uses either a Markov model or a Gaussian one to track the position of the vertices in this space; [10] uses a probabilistic model of the time interval between the appearance of two edges or subgraphs to predict future edges or subgraphs. However, to the best of our knowledge, there has not been any regularization-based method for this problem, which is what we propose in this paper.\nThe setup of dynamic feature-based matrix completion is presented in Section 2. In Section 3, we develop a fast linearized algorithm for efficient link prediction. We then discuss the use and estimation of relevant features within this regularization approach in Section 4. Finally, numerical experiments on synthetic and real data sets are presented in Section 5.\n\n2 Dynamic feature-based matrix completion\n\nSetup. We consider a sequence of T undirected graphs with n vertices and n × n binary adjacency matrices At, t ∈ {1, 2, ..., T}, where for each t the edges of the graph at time t are also contained in the graph at time t + 1. 
Given At, t ∈ {1, 2, ..., T}, the goal is to predict the edges of the graph that are most likely to appear at time T+1, that is, the most likely non-zero elements of the binary adjacency matrix AT+1. To this end we want to learn an n × n real-valued matrix S whose elements indicate how likely it is that there is a non-zero value at the corresponding position of matrix AT+1. The edges that we predict to be the most likely ones at time T+1 are the ones corresponding to the largest values in S.\nWe assume that certain features of the matrices At evolve smoothly over time. Such an assumption is necessary to allow learnability of the evolution of At over time. For simplicity we consider a linear feature map f : At ↦ Ft, where Ft is an n × k matrix of the form Ft = AtΦ, with Φ an n × k matrix of features. Various feature maps, possibly nonlinear, can be used. We discuss an example of such features Φ and a way to predict FT+1 given past values of the feature map F1, F2, ..., FT in Section 4 - but other features or prediction methods can be used in combination with the main part of the proposed approach. In the proposed method discussed in Section 3 we assume for now that we already have an estimate of FT+1.\n\nAn optimization problem. The procedure we propose for link prediction is based on the assumption that the dynamics of graph features also drive the discovery of the location of new links. 
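To make the Setup concrete, the linear feature map Ft = AtΦ is just a matrix product. The following minimal sketch (illustrative only, not the authors' code) takes Φ to be the all-ones column, so the single tracked feature is the degree of each vertex:

```python
import numpy as np

def feature_map(A, Phi):
    # F_t = A_t @ Phi: each column of Phi induces one n-dimensional graph feature
    return A @ Phi

# 4-vertex toy graph; with Phi the all-ones column the feature is the vertex degree
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Phi = np.ones((4, 1))
F = feature_map(A, Phi)   # degrees: [2, 1, 2, 1]
```

Richer choices of Φ, such as the eigenvector-based features discussed in Section 4, plug into the same map unchanged.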
Given the last adjacency matrix AT, a set of features Φ, and an estimate F̂ of FT+1 based on the sequence of adjacency matrices At, t ∈ {1, 2, ..., T}, we want to find a matrix S which fulfills the following requirements:\n\n• S has low rank - this is a standard assumption in matrix completion problems [15].\n• S is close to the last adjacency matrix AT - the distance between these two matrices will provide a proxy for the training error.\n• The values of the feature map at S and AT+1 are similar.\n\nFor any matrix M, we denote by ‖M‖F = √Tr(M′M) the Frobenius norm of M, with M′ being the transpose of M; the trace operator Tr(N) computes the sum of the diagonal elements of a square matrix N. We also define ‖M‖* = Σ_{k=1..n} σk(M), the nuclear norm of a square matrix M of size n × n, where σk(M) denotes the k-th largest singular value of M. We recall that a singular value of matrix M is the square root of an eigenvalue of M′M, ordered decreasingly. The proposed optimization problem for feature-based matrix completion is then:\n\nmin_S L(S, τ, ν),   with   L(S, τ, ν) := τ‖S‖* + (1/2)‖S − AT‖²F + (ν/2)‖SΦ − F̂‖²F,   (1)\n\nwhere τ and ν are positive regularization parameters. Each term of the functional L reflects one of the aforementioned requirements for the desired matrix S. In the case where ν = 0, we do not use information about the dynamics of the graph; the minimizer of L then corresponds to the singular value thresholding approach developed in [2], which is therefore a special case of (1). 
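The three terms of (1) can be evaluated directly; the sketch below (an illustration under our own choices of toy inputs, not the authors' implementation) uses NumPy's built-in nuclear and Frobenius norms:

```python
import numpy as np

def objective(S, A_T, Phi, F_hat, tau, nu):
    # tau * ||S||_* + (1/2) ||S - A_T||_F^2 + (nu/2) ||S Phi - F_hat||_F^2
    nuclear = np.linalg.norm(S, ord='nuc')                          # low-rank term
    fit = 0.5 * np.linalg.norm(S - A_T, 'fro') ** 2                 # proximity to last snapshot
    track = 0.5 * nu * np.linalg.norm(S @ Phi - F_hat, 'fro') ** 2  # feature tracking
    return tau * nuclear + fit + track

# Sanity check: at S = A_T with a perfectly predicted feature map,
# only the nuclear-norm term remains.
A_T = np.eye(3)
Phi = np.ones((3, 1))
val = objective(A_T, A_T, Phi, A_T @ Phi, tau=2.0, nu=1.0)  # 2 * ||I||_* = 6
```

With ν = 0 the tracking term disappears and one recovers the static singular-value-thresholding objective, consistent with the remark above.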
Note that a key difference between link prediction and matrix completion is that in (1) the training error uses all entries of the adjacency matrix, while in the case of matrix completion only the known entries (in our case the “1s”) are used. We now discuss an efficient optimization algorithm for (1), the main part of this work.\n\n3 An algorithm for link discovery\n\nSolving (1) directly is computationally slow. We adapt the fast linearization method developed in [5] to our problem, which attains an optimal iteration complexity when using only first order information. Here, the functional L(S, τ, ν) is continuous and convex but not differentiable with respect to S. We propose to convert the minimization of the target functional L(S, τ, ν) into a tractable problem through the following steps:\n\n1. Variable splitting - Set:\n\ng(S, τ) = τ‖S‖*   and   h(S, ν) = (1/2)‖S − AT‖²F + (ν/2)‖SΦ − F̂‖²F .\n\nDenote by S, S̃ two n × n matrices. Then, the optimization problem (1) is equivalent to:\n\nmin_{S,S̃} L(S, S̃)   subject to   S − S̃ = 0 ,   where L(S, S̃) := g(S, τ) + h(S̃, ν).   (2)\n\n2. Smoothing the nuclear norm - We recall the variational formulation of the nuclear norm ‖S‖* = maxZ {⟨S, Z⟩ : σ1(Z) ≤ 1}. Using the technique from [12], we can use a smooth approximation of the nuclear norm and replace g in the functional by a surrogate function gη, with η > 0 being a smoothing parameter:\n\ngη(S, τ) = τ · maxZ {⟨S, Z⟩ − (η/2)‖Z‖²F : σ1(Z) ≤ 1} ,\n\nand consider the smoothed functional\n\nLη(S, S̃) := gη(S, τ) + h(S̃, ν).   (3)\n\n3. 
Alternating minimization - We propose to minimize the functional Lη(S, S̃), which is continuous, differentiable and convex, under the constraint that S = S̃. To do this, one has to minimize the two functions gη and h simultaneously. In order to derive the iterative algorithm based on linearized alternating minimization, we introduce two strictly convex approximations of these functions which involve an additional parameter μ > 0:\n\nGη,μ(S, S̃) = gη(S, τ) + ⟨∇h(S̃), S − S̃⟩ + (1/(2μ))‖S − S̃‖²F\nHμ(S, S̃) = h(S̃, ν) + ⟨∇gη(S), S̃ − S⟩ + (1/(2μ))‖S − S̃‖²F\n\nwhere ⟨B, C⟩ = Tr(B′C) for two matrices B, C. The tuning of the parameter μ will be discussed with the convergence results at the end of this section. We denote by mG(S̃) the minimizer of Gη,μ(S, S̃) with respect to S, and by mH(S) the minimizer of Hμ(S, S̃) with respect to S̃. We can now formulate an algorithm for the fast minimization of the functional Lη(S, S̃), inspired by the algorithm FALM in [5] (see Algorithm 1). Note that, in the alternating descent for the simultaneous minimization of the two functions Gη,μ and Hμ, we use an auxiliary matrix Zk; this matrix is a linear combination of the updates for S and S̃. The work by Ma and Goldfarb shows indeed that the particular choice made here leads to an optimal rate of numerical convergence. Key formulas in the link prediction algorithm are those of the minimizers mG(S̃) and mH(S). It turns out that in our case, these minimizers have explicit expressions, which can be derived by solving the first-order optimality condition, as Proposition 1 shows.\n\nAlgorithm 1 - Link Discovery Algorithm\n\nParameters: τ, ν, η\nInitialization: W0 = Z1 = AT, α1 = 0\nfor k = 1, 2, . . . do\n  Sk ← mG(Zk)   and   S̃k ← mH(Sk)\n  Wk ← (1/2)(Sk + S̃k)\n  αk+1 ← (1/2)(1 + √(1 + 4α²k))\n  Zk+1 ← Wk + (1/αk+1)(αk(S̃k − Wk−1) − (Wk − Wk−1))\nend for\n\nProposition 1 Let Ŝ = S̃ − μ∇h(S̃), with singular value decomposition Ŝ = Û Diag(σ̂) V̂. We also consider the singular value decomposition of S, denoted by S = U Diag(ηλ) V. For x > 0, set the notation:\n\nα(x) = max{ x / (1 + τμ/η) , x − τμ } .\n\nWe then have:\n\nmG(S̃) = Û Diag{α(σ̂)} V̂\nmH(S) = ( AT − τ U Diag(min{λ, 1}) V + (1/μ) S + ν F̂ Φ′ ) ( (1 + 1/μ) In + ν Φ Φ′ )⁻¹ .\n\nThe proof can be found in the Appendix.\n\nValidity of the approximations and rates of convergence. Our strategy replaces the non-differentiable term in L by a smooth version of it. The next result offers guarantees that minimizing the surrogate function (3) provides an approximate solution of the initial problem (1). 
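The closed forms of Proposition 1 translate almost verbatim into code. The sketch below is an illustrative NumPy transcription under the paper's notation (not the authors' implementation; for clarity it inverts the n × n matrix directly where a linear solve would be preferred):

```python
import numpy as np

def grad_h(S_t, A_T, Phi, F_hat, nu):
    # gradient of h at S~: S~ - A_T + nu (S~ Phi - F_hat) Phi'
    return S_t - A_T + nu * (S_t @ Phi - F_hat) @ Phi.T

def m_G(S_t, A_T, Phi, F_hat, tau, nu, eta, mu):
    # Proposition 1: shrink the singular values of S_hat = S~ - mu * grad h(S~)
    S_hat = S_t - mu * grad_h(S_t, A_T, Phi, F_hat, nu)
    U, sig, Vt = np.linalg.svd(S_hat, full_matrices=False)
    alpha = np.maximum(sig / (1.0 + tau * mu / eta), sig - tau * mu)
    return (U * alpha) @ Vt            # U Diag(alpha(sig)) V

def m_H(S, A_T, Phi, F_hat, tau, nu, eta, mu):
    # Proposition 1: closed-form solution of the first-order condition for H_mu
    U, sig, Vt = np.linalg.svd(S, full_matrices=False)
    grad_g = tau * (U * np.minimum(sig / eta, 1.0)) @ Vt   # gradient of smoothed nuclear norm
    n = S.shape[0]
    rhs = A_T - grad_g + S / mu + nu * F_hat @ Phi.T
    return rhs @ np.linalg.inv((1.0 + 1.0 / mu) * np.eye(n) + nu * Phi @ Phi.T)
```

At each iteration of Algorithm 1 these two maps are applied in turn, followed by the averaging and momentum updates.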
We will say that an element xε is an ε-optimal solution of a function Ψ(x) if it is such that Ψ(xε) ≤ inf_x Ψ(x) + ε.\n\nProposition 2 The following statements hold true:\n\n• We have, for any (S, S̃):\n\nLη(S, S̃) ≤ L(S, S̃) ≤ Lη(S, S̃) + nη/2 .\n\n• To find an ε-optimal solution of L(S, S̃), it suffices to find an ε/2-optimal solution of Lη(S, S̃) with η = ε/n.\n\nThe proof of this result can be derived straightforwardly from [5]. Moreover, following the proof of Theorem 4.3 in [5], one can show that the number of iterations needed to reach an ε-optimal solution of Lη using Algorithm 1 is of the order O(1/√ε). In that result of [5], an optimal choice of the parameter μ is provided as the inverse of the largest of the Lipschitz constants of the gradients of gη and h. With our notations, we can easily derive here: μ = min(η/τ, 1/(1 + νσ1(Φ))), where σ1(Φ) is the largest singular value of Φ.\n\n4 Learning the graph features\n\nAs discussed above, one can use various features Φ and methods to predict the n × k matrix FT+1 given past values of the feature map F1, F2, ..., FT. We consider a particular case here to use in conjunction with the main algorithm in the previous section. In particular, we use as features Φ the first k eigenvectors of the adjacency matrix AT. Let AT = ΩΛΩ′ be the orthonormal eigenvalue decomposition of AT, which is symmetric. 
We set Φ = (Ω Λ⁻¹)(:,1:k), an n × k matrix. Note that AT Φ = Ω(:,1:k), and that Ω(:,1:k) is the most informative n × k matrix for the reconstruction of AT. The suggested method aims to estimate AT+1 Φ, which is informative for the reconstruction of AT+1. We denote by Φj, j ∈ {1, 2, ..., k}, the n-dimensional feature vectors which are the columns of Φ. For each feature j ∈ {1, 2, ..., k}, we consider the n-dimensional time series {At Φj, t = 1, . . . , T}, which describes the evolution of the j-th feature over the n vertices of the graph. We now describe the procedure for learning the evolution of this j-th graph feature over time:\n\n1. Fix an integer m < T to learn a map between m past values (At−m Φj, . . . , At−1 Φj) and the current value of the n-dimensional vector At Φj.\n2. Construct the training data for the learning step by using a sliding window of size m from time t = 1 to t = T; we then have T − m + 1 training data of dimension n × m for each feature j.\n3. Use ridge regression to fit the training data.\n4. Estimate the j-th column of FT+1 as the predicted value for AT+1 Φj using the regression model at the “point” (AT−m+1 Φj, . . . , AT Φj).\n\nCollecting the results for each j ∈ {1, 2, ..., k}, we obtain the estimate F̂ of the matrix FT+1 used in (1). We point out that using this construction with a time shift implicitly assumes that the relation between m consecutive values (At−m Φj, . . . , At−1 Φj) and the next value At Φj is stable over time (stationarity assumption). Clearly, methods other than ridge regression, or other ways of creating the training data, can be used; we leave this for future work.\n\n5 Experimental Results\n\nWe tested the proposed method using both simulated and real data sets. As benchmarks we use the following methods:\n\n1. 
Static matrix completion corresponding to ν = 0 in (1).\n2. The Katz algorithm [8], considered one of the best static link prediction methods.\n3. The Preferential Attachment method [1], for which the score (“likelihood”) of an edge {u, v} is du dv, where du and dv are the degrees of u and v.\n\n5.1 Synthetic Data\n\nWe generate sequences of graphs as follows. We first generate a sequence of T matrices Q(t) of size n × r whose entries Qi,j(t) are increasing over time as a sigmoid function:\n\nQi,j(t) = (1/2) ( 1 + erf( (t − μi,j) / √(2σ²i,j) ) )\n\nwhere μi,j ∈ [0, T] and σi,j ∈ [0, T/3] are picked uniformly for each (i, j). These matrices provide a synthetic model for the evolution of the graph over time. We then add noise to the time dynamics as follows. For a given noise level ω ∈ [0, 1], we replace each entry Qi,j(t) with probability ω by any of the other values Qi,j(s), for s picked uniformly from {1, 2, ..., T}. Having constructed the matrices Q(t), we then generate matrices S(t) = Q(t)Q(t)′, which are of rank r. We finally generate the adjacency matrix At as\n\nA(t) = 1[θ,∞)(S(t))\n\nfor a threshold θ. We pick θ so that the sparsity (i.e. the proportion of non-zero entries) of AT reflects the sparsity of the real data used in the next section (≈ 10⁻³). In the experiments, we simulated graphs with n = 1000 vertices.\n\n5.2 Real Data\n\nCollaborative Filtering¹ We can view the purchase histories of e-commerce websites as graph sequences where a link is established between a user and a product when the user purchases that product. We use data from 10 months of music purchase history of a major e-commerce website to evaluate our method. For our test we selected a set of 10³ users and 10³ products that had the highest degrees (number of sales). 
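Looking back at the generator of Section 5.1, a small-scale sketch is as follows. The sizes n = 50, r = 5, T = 10, the 99th-percentile threshold, and the small lower bound on σi,j (to avoid division by zero) are our own illustrative choices, not the paper's protocol:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def synthetic_graphs(n=50, r=5, T=10, omega=0.1, theta=None):
    # Q_ij(t) = 0.5 * (1 + erf((t - mu_ij) / sqrt(2 sigma_ij^2))), entries grow as sigmoids
    mu = rng.uniform(0, T, size=(n, r))
    sigma = rng.uniform(1e-3, T / 3, size=(n, r))   # lower bound 1e-3 avoids division by zero
    erf_v = np.vectorize(math.erf)
    Q = np.stack([0.5 * (1 + erf_v((t - mu) / np.sqrt(2 * sigma ** 2)))
                  for t in range(1, T + 1)])
    # Noise: with probability omega, replace an entry by its value at a random time s
    swap = rng.random(Q.shape) < omega
    s = rng.integers(0, T, size=Q.shape)
    Q = np.where(swap, np.take_along_axis(Q, s, axis=0), Q)
    S = np.einsum('tik,tjk->tij', Q, Q)             # S(t) = Q(t) Q(t)', rank <= r
    if theta is None:
        # threshold controls sparsity; ~1% density here, the paper targets ~1e-3 at n = 1000
        theta = np.quantile(S[-1], 0.99)
    return (S >= theta).astype(int)                  # A(t) = 1[theta, inf)(S(t))

A = synthetic_graphs()                               # shape (T, n, n)
```

Each snapshot is symmetric since S(t) = Q(t)Q(t)′; without noise the entries only grow, so edges present at time t persist at t + 1, matching the Setup.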
We split the 8.5 × 10³ edges of the graph (corresponding to purchases) into two parts according to their occurrence time. We used the data of the first 8 months to predict the features at the end of the 10th month, and we use these features, together with the adjacency matrix at the end of the 8th month, to discover the purchases made during the last 2 months.\n\n5.3 Results\n\nThe results are shown in Figure 1 and Tables 1 and 2. The Area Under the ROC Curve (AUC) is reported. For the simulation data we report the average AUC over 10 simulation runs.\nFrom the simulation results we observe that for low rank underlying matrices, our method outperforms the rivals. The same comparative results were observed for ranks as high as 100. Our method (as well as the static method based on the low rank hypothesis) fails, however, when the rank of S(t) is high; even in this case, though, our method outperforms static matrix completion.\nThe results with the real data further indicate the advantage of using information about the evolution of the graph over time. As with the simulation data, the proposed method outperforms static matrix completion.\n\n6 Conclusion\n\nThe main contribution of this work is the formulation of a learning problem that can be used to predict the evolution of the edges of a graph over time. A regularization approach combining both static graph information and information about the dynamics of the graph's evolution over time is proposed, and an optimization algorithm is developed. Despite using simple graph features\n\n¹Notice that we are looking to discover only unobserved links and not new occurrences of past links. 
Thus the comparison with some popular benchmarks (such as coauthorship data sets) is inappropriate.\n\nFigure 1: AUC performance of the proposed algorithm with respect to the two parameters τ and ν on simulated data. [Surface plot omitted; axes: τ, ν, AUC.]\n\n(r, ω) \\ Method | Proposed Method | Static | Pref. A. | Katz\n(5, 0.000) | 0.671 ± 0.008 | 0.648 ± 0.008 | 0.627 ± 0.015 | 0.616 ± 0.015\n(5, 0.250) | 0.675 ± 0.009 | 0.642 ± 0.007 | 0.602 ± 0.016 | 0.592 ± 0.016\n(5, 0.750) | 0.519 ± 0.007 | 0.525 ± 0.005 | 0.497 ± 0.007 | 0.491 ± 0.007\n(500, 0.000) | 0.592 ± 0.008 | 0.587 ± 0.007 | 0.671 ± 0.010 | 0.667 ± 0.009\n(500, 0.250) | 0.607 ± 0.011 | 0.588 ± 0.009 | 0.649 ± 0.009 | 0.643 ± 0.009\n(500, 0.750) | 0.601 ± 0.010 | 0.583 ± 0.007 | 0.645 ± 0.017 | 0.641 ± 0.017\n\nTable 1: Simulation data. The average AUC over 10 simulation runs is reported. For each row, the pair of numbers in the first column shows the rank r and the noise level ω.\n\nτ \\ ν | 0 | 0.1 | 0.3 | 0.7 | 1.6\n0 | 0.568 | 0.584 | 0.585 | 0.585 | 0.562\n1 | 0.626 | 0.684 | 0.683 | 0.675 | 0.668\n2 | 0.638 | 0.678 | 0.671 | 0.688 | 0.672\n3 | 0.569 | 0.646 | 0.635 | 0.645 | 0.643\n4 | 0.569 | 0.556 | 0.562 | 0.565 | 0.563\n\nTable 2: Collaborative Filtering data; AUC for different values of τ and ν. The AUC of preferential attachment is 0.6019, and Katz reaches 0.6670.\n\nas well as estimation of the evolution of the feature values over time, experiments indicate that the proposed optimization method improves performance relative to benchmarks. Testing, or learning, other graph features as well as other ways to model their dynamics over time may further improve performance and is part of future work.\n\nAppendix - Proof of Proposition 1\n\nWe first write the optimality condition for Gη,μ(S, S̃) with respect to S:\n\n∇S gη(S) + ∇h(S̃) + (1/μ)(S − S̃) = 0 .\n\nWith the notation for Ŝ, the previous condition can be written:\n\nμ∇gη(S) + S − Ŝ = 0 .\n\nWe now use the fact that ∇gη(S) = τ U Diag(min{γ, 1}) V, where S/η = U Diag(γ) V (see [5]). This observation leads to the solution where S satisfies:\n\nU = Û,   V = V̂,   and   σ̂ = μτ min{γ, 1} + ηγ ,\n\nwhich gives the first result, since there is a unique solution due to the strict convexity of the function.\nSimilarly, the optimality condition of Hμ(S, S̃) with respect to S̃ is\n\n∇h(S̃) + ∇gη(S) + (1/μ)(S̃ − S) = 0 .\n\nSince the function h is differentiable as the sum of two quadratic terms, we have:\n\n∇h(S̃) = S̃ − AT + ν(S̃Φ − F̂)Φ′ ,\n\nand we can derive the optimal value for S̃:\n\nmH(S) = ( AT − ∇gη(S) + (1/μ) S + ν F̂ Φ′ ) ( (1 + 1/μ) In + ν Φ Φ′ )⁻¹ .\n\nAcknowledgments\nThis work was partially supported by DIGITEO (BEMOL project), which the authors gratefully acknowledge.\n\nReferences\n[1] A. L. Barabási, H. Jeong, Z. Néda, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3-4):590–614, 2002.\n[2] Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.\n[3] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. 
IEEE Transactions on Information Theory, 56(5), 2009.\n[4] Lise Getoor and Christopher P. Diehl. Link mining: a survey. SIGKDD Explorations Newsletter, 7(2):3–12, 2005.\n[5] Donald Goldfarb and Shiqian Ma. Fast alternating linearization methods for minimizing the sum of two convex functions. Technical Report, Department of IEOR, Columbia University, 2009.\n[6] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), pages 263–272, 2008.\n[7] Hisashi Kashima, Tsuyoshi Kato, Yoshihiro Yamanishi, Masashi Sugiyama, and Koji Tsuda. Link propagation: A fast semi-supervised learning algorithm for link prediction. In Proceedings of the SIAM International Conference on Data Mining, SDM 2009, pages 1099–1110, 2009.\n[8] Leo Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.\n[9] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.\n[10] Mayank Lahiri and Tanya Y. Berger-Wolf. Structure prediction in temporal networks using frequent subgraphs. In IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2007.\n[11] Kurt Miller, Thomas Griffiths, and Michael Jordan. Nonparametric latent feature models for link prediction. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1276–1284. 2009.\n[12] Yu Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.\n[13] Purnamrita Sarkar, Sajid Siddiqi, and Geoffrey J. Gordon. A latent space approach to dynamic embedding of cooccurrence data. 
In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AI-STATS), 2007.\n[14] Ashish Sood, Gareth M. James, and Gerard J. Tellis. Functional regression: A new model for predicting market penetration of new products. Marketing Science, 28(1):36–51, 2009.\n[15] Nathan Srebro, Jason D. M. Rennie, and Tommi S. Jaakkola. Maximum-margin matrix factorization. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1329–1336. MIT Press, Cambridge, MA, 2005.\n[16] Ben Taskar, Ming-Fai Wong, Pieter Abbeel, and Daphne Koller. Link prediction in relational data. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.\n[17] Demetrios Vakratsas, Fred M. Feinberg, Frank M. Bass, and Gurumurthy Kalyanaram. The Shape of Advertising Response Functions Revisited: A Model of Dynamic Probabilistic Thresholds. Marketing Science, 23(1):109–119, 2004.\n", "award": [], "sourceid": 915, "authors": [{"given_name": "Emile", "family_name": "Richard", "institution": null}, {"given_name": "Nicolas", "family_name": "Baskiotis", "institution": null}, {"given_name": "Theodoros", "family_name": "Evgeniou", "institution": null}, {"given_name": "Nicolas", "family_name": "Vayatis", "institution": null}]}