{"title": "Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators", "book": "Advances in Neural Information Processing Systems", "page_first": 990, "page_last": 998, "abstract": "This paper considers the problem of embedding directed graphs in Euclidean space while retaining directional information. We model the observed graph as a sample from a manifold endowed with a vector field, and we design an algo- rithm that separates and recovers the features of this process: the geometry of the manifold, the data density and the vector field. The algorithm is motivated by our analysis of Laplacian-type operators and their continuous limit as generators of diffusions on a manifold. We illustrate the recovery algorithm on both artificially constructed and real data.", "full_text": "Directed Graph Embedding: an Algorithm based on\n\nContinuous Limits of Laplacian-type Operators\n\nDominique C. Perrault-Joncas\n\nDepartment of Statistics\nUniversity of Washington\n\nSeattle, WA 98195\n\ndcpj@stat.washington.edu\n\nmmp@stat.washington.edu\n\nMarina Meil\u02d8a\n\nDepartment of Statistics\nUniversity of Washington\n\nSeattle, WA 98195\n\nAbstract\n\nThis paper considers the problem of embedding directed graphs in Euclidean\nspace while retaining directional information. We model the observed graph as\na sample from a manifold endowed with a vector \ufb01eld, and we design an algo-\nrithm that separates and recovers the features of this process: the geometry of the\nmanifold, the data density and the vector \ufb01eld. The algorithm is motivated by our\nanalysis of Laplacian-type operators and their continuous limit as generators of\ndiffusions on a manifold. We illustrate the recovery algorithm on both arti\ufb01cially\nconstructed and real data.\n\n1 Motivation\n\nRecent advances in graph embedding and visualization have focused on undirected graphs, for which\nthe graph Laplacian properties make the analysis particularly elegant [1, 2]. 
However, many graph datasets, such as social networks, alignment scores between biological sequences, and citation data, are naturally asymmetric. A commonly used approach for this type of data is to disregard the asymmetry by studying the spectral properties of W + W^T or W^T W, where W is the affinity matrix of the graph. Several approaches have been proposed to preserve the asymmetry information contained in the data [3, 4, 5], or to define directed Laplacian operators [6]. Although quite successful, these works adopt a purely graph-theoretical point of view. Thus, they are not concerned with the generative process that produces the graph, nor with the interpretability and statistical properties of their algorithms. In contrast, we view the nodes of a directed graph as a finite sample from a manifold in Euclidean space, and the edges as macroscopic observations of a diffusion kernel between neighboring points on the manifold. We explore how this diffusion kernel determines the overall connectivity and asymmetry of the resulting graph and demonstrate how Laplacian-type operators of this graph can offer insights into the underlying generative process. Based on the analysis of the Laplacian-type operators, we derive an algorithm that, in the limit of infinite sample and vanishing bandwidth, recovers the key features of the sampling process: manifold geometry, sampling distribution, and local directionality, up to their intrinsic indeterminacies.

2 Model

The first premise here is that we observe a directed graph G, with n nodes, having weights W = [W_ij] for the edge from node i to node j. In keeping with common Laplacian-based embedding approaches, we assume that G is a geometric random graph constructed from n points sampled according to distribution p = e^(-U) on an unobserved compact smooth manifold M ⊆ R^l of known intrinsic dimension d ≤ l.
The edge weight W_ij is then determined by a directed similarity kernel k_ε(x_i, x_j) with bandwidth ε. The directional component of k_ε(x_i, x_j) will be taken to be derived from a vector field r on M, which assigns a preferred direction between weights W_ij and W_ji. The choice of a vector field r to characterize the directional component of G might seem restrictive at first. In the asymptotic limit of ε → 0 and n → ∞, however, kernels are characterized by their diffusion, drift, and source components [7]. As such, r is sufficient to characterize any directionality associated with a drift component and, as it turns out, the component of r normal to M in R^l can also be used to characterize any source component. As for the diffusion component, it is not possible to uniquely identify it from G alone [8]. Some absolute knowledge of M is needed to say anything about it. Hence, without loss of generality, we will construct k_ε(x_i, x_j) so that the diffusion component ends up being isotropic and constant, i.e. equal to the Laplace-Beltrami operator Δ on M. The schematic of this generative process is shown in the top left of Figure 1 below.

Figure 1: Schematic of our framework. From left to right: the graph generative process mapping the sample on M to the geometric random graph G via the kernel k_ε(x, y), then the subsequent embedding Ψ_n of G by the operators H^(α)_aa,n and H^(α)_ss,n (defined in section 3.1). As these operators converge to their respective limits, H^(α)_aa and H^(α)_ss, so will Ψ_n → Ψ, p_n → p, and r_n → r. We design an algorithm that, given G, produces the top right embedding (Ψ_n, p_n, and r_n).

The question is then as follows: can the generative process' geometry M, distribution p = e^(-U), and directionality r, be recovered from G?
In other words, is there an embedding of G in R^m, m ≥ d, that approximates all three components of the process and that is also consistent as the sample size increases and the bandwidth vanishes? In the case of undirected graphs, the theory of Laplacian eigenmaps [1] and diffusion maps [9] answers this question in the affirmative, in that the geometry of M and p = e^(-U) can be inferred using spectral graph theory. The aim here is to build on the undirected problem and recover all three components of the generative process from a directed graph G.

The spectral approach to undirected graph embedding relies on the fact that eigenfunctions of the Laplace-Beltrami operator are known to preserve the local geometry of M [1]. With a consistent empirical Laplace-Beltrami operator based on G, its eigenvectors also recover the geometry of M and converge to the corresponding eigenfunctions on M. For a directed graph G, an additional operator is needed to recover the local directional component r, but the principle remains the same. The schematic for this is shown in Figure 1, where two operators - H^(α)_ss,n, introduced in [9] for undirected embeddings, and H^(α)_aa,n, a new operator defined in section 3.1 - are used to obtain the embedding Ψ_n, distribution p_n, and vector field r_n. As H^(α)_aa,n and H^(α)_ss,n converge to H^(α)_aa and H^(α)_ss, the estimates Ψ_n, p_n, and r_n also converge to Ψ, p, and r, where Ψ is the local geometry preserving embedding of M into R^m.

The algorithm we propose in Section 4 will calculate the matrices corresponding to H^(α)_·,n from the graph G, and with their eigenvectors, will find estimates for the node coordinates Ψ, the directional component r, and the sampling distribution p.
In the next section we briefly describe the mathematical models of the diffusion processes that our model relies on.

2.1 Problem Setting

The similarity kernel k_ε(x, y) can be used to define transport operators on M. The natural transport operator is defined by normalizing k_ε(x, y) as

T_ε[f](x) = ∫_M [k_ε(x, y) / p_ε(x)] f(y) p(y) dy , where p_ε(x) = ∫_M k_ε(x, y') p(y') dy' .  (1)

T_ε[f](x) represents the diffusion of a distribution f(y) by the transition density k_ε(x, y) p(y) / ∫ k_ε(x, y') p(y') dy'. The eigenfunctions of this infinitesimal operator are the continuous limit of the eigenvectors of the transition probability matrix P = D^(-1) W given by normalizing the affinity matrix W of G by D = diag(W 1) [10]. Meanwhile, the infinitesimal transition

∂f/∂t = lim_{ε→0} (T_ε − I) f / ε  (2)

defines the backward equation for this diffusion process over M based on kernel k_ε. Obtaining the explicit expression for transport operators like (2) is then the main technical challenge.

2.2 Choice of Kernel

In order for T_ε[f] to have the correct asymptotic form, some hypotheses about the similarity kernel k_ε(x, y) are required. The hypotheses are best presented by considering the decomposition of k_ε(x, y) into symmetric h_ε(x, y) = h_ε(y, x) and anti-symmetric a_ε(x, y) = −a_ε(y, x) components:

k_ε(x, y) = h_ε(x, y) + a_ε(x, y) .  (3)

The symmetric component h_ε(x, y) is assumed to satisfy the following properties: 1. h_ε(||y − x||²) = h(||y − x||²/ε) / ε^(d/2), and 2. h ≥ 0 and h is exponentially decreasing as ||y − x|| → ∞. This form of symmetric kernel was used in [9] to analyze the diffusion map.
For the asymmetric part of the similarity kernel, we assume the form

a_ε(x, y) = [r(x, y)/2] · (y − x) h(||y − x||²/ε) / ε^(d/2) ,  (4)

with r(x, y) = r(y, x) so that a_ε(x, y) = −a_ε(y, x). Here r(x, y) is a smooth vector field on the manifold that gives an orientation to the asymmetry of the kernel k_ε(x, y). It is worth noting that the dependence of r(x, y) on both x and y implies that r : M × M → R^l, with R^l the ambient space of M; however, in the asymptotic limit, the dependence on y is only important "locally" (x = y), and as such it is appropriate to think of r(x, x) as being a vector field on M. As a side note, it is worth pointing out that even though the form of a_ε(x, y) might seem restrictive at first, it is sufficiently rich to describe any vector field. This can be seen by taking r(x, y) = (w(x) + w(y))/2, so that at x = y the resulting vector field is given by r(x, x) = w(x) for an arbitrary vector field w(x).

3 Continuous Limit of Laplacian Type Operators

We are now ready to state the main asymptotic result.

Proposition 3.1 Let M be a compact, closed, smooth manifold of dimension d and k_ε(x, y) an asymmetric similarity kernel satisfying the conditions of section 2.2. Then for any function f ∈ C²(M), the integral operator based on k_ε has the asymptotic expansion

∫_M k_ε(x, y) f(y) dy = m₀ f(x) + ε g(f(x), x) + o(ε) ,  (5)

where

g(f(x), x) = (m₂/2) (ω(x) f(x) + Δf(x) + r · ∇f(x) + f(x) ∇ · r + c(x) f(x))  (6)

and m₀ = ∫_{R^d} h(||u||²) du, m₂ = ∫_{R^d} u_i² h(||u||²) du.

The proof can be found in [8] along with the definition of ω(x) and c(x) in (6).
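As a concrete illustration of the kernel decomposition (3)-(4), the following sketch builds k_ε = h_ε + a_ε on a finite point cloud, with a Gaussian choice for the symmetric profile h and r(x, y) = (w(x) + w(y))/2 as suggested above. This is our own illustrative sketch, not the paper's code; the Gaussian profile and the use of the ambient dimension in place of the intrinsic d are simplifying assumptions.

```python
import numpy as np

def directed_kernel(X, w, eps):
    """Sketch (our construction, not the authors' code): the directed
    kernel k_eps = h_eps + a_eps of (3)-(4) on a point cloud X (n x l),
    with a Gaussian symmetric part h and r(x, y) = (w(x) + w(y))/2 for a
    user-supplied vector field w sampled at the points of X.
    Note: uses the ambient dimension l where the paper uses the
    intrinsic dimension d, purely for simplicity."""
    n, l = X.shape
    diff = X[None, :, :] - X[:, None, :]       # diff[i, j] = x_j - x_i
    sq = (diff ** 2).sum(axis=-1)              # ||x_j - x_i||^2
    h = np.exp(-sq / eps) / eps ** (l / 2)     # symmetric component h_eps
    r = 0.5 * (w[:, None, :] + w[None, :, :])  # r(x_i, x_j), symmetric in (i, j)
    a = 0.5 * (r * diff).sum(axis=-1) * h      # anti-symmetric component a_eps
    return h + a

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                   # toy sample in ambient R^3
w = 0.1 * rng.normal(size=(40, 3))             # a small vector field at the samples
K = directed_kernel(X, w, eps=0.5)
K0 = directed_kernel(X, np.zeros_like(w), eps=0.5)  # w = 0 leaves only h_eps
# a_eps is exactly anti-symmetric, so the symmetric part of K is h_eps
assert np.allclose(K + K.T, 2 * K0)
```

For a small enough field w the entries of K stay positive, and K can serve as the affinity matrix W of a geometric random graph on the sample.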
For now, it suffices to say that ω(x) corresponds to an interaction between the symmetric kernel h_ε and the curvature of M, and was first derived in [9]. Meanwhile, c(x) is a new term that originates from the interaction between h_ε and the component of r that is normal to M in the ambient space R^l. Proposition 3.1 foreshadows a general fact about spectral embedding algorithms: in most cases, Laplacian operators confound the effects of spatial proximity, sampling density and directional flow due to the presence of the various terms above.

3.1 Anisotropic Limit Operators

Proposition 3.1 above can be used to derive the limits of a variety of Laplacian type operators associated with spectral embedding algorithms like [5, 6, 3]. Although we will focus primarily on a few operators that give the most insight into the generative process and enable us to recover the model defined in Figure 1, we first present four distinct families of operators for completeness. These operator families are inspired by the anisotropic family of operators that [9] introduced for undirected graphs, which make use of anisotropic kernels of the form

k^(α)_ε(x, y) = k_ε(x, y) / (p^α_ε(x) p^α_ε(y)) ,  (7)

with α ∈ [0, 1], where α = 0 is the isotropic limit. To normalize the anisotropic kernels, we need to redefine the outdegree distribution of k^(α)_ε as p^(α)_ε(x) = ∫_M k^(α)_ε(x, y) p(y) dy. From (7), four families of diffusion processes of the form f_t = H^(α)[f](x) can be derived, depending on which kernel is normalized and which outdegree distribution is used for the normalization. Specifically, we define transport operators by normalizing the asymmetric k^(α)_ε or symmetric h^(α)_ε kernels with the asymmetric p_ε or symmetric q_ε = ∫_M h_ε(x, y) p(y) dy outdegree distribution¹.
To keep track of all options, we introduce the following notation: the operators will be indexed by the type of kernel and outdegree distribution they correspond to (symmetric or asymmetric), with the first index identifying the kernel and the second index identifying the outdegree distribution. For example, the family of anisotropic limit operators introduced by [9] is defined by normalizing the symmetric kernel by the symmetric outdegree distribution; hence they will be denoted as H^(α)_ss, with the superscript corresponding to the anisotropic power α.

Proposition 3.2 With the above notation,

H^(α)_aa[f] = Δf − 2(1 − α) ∇U · ∇f + r · ∇f  (8)
H^(α)_as[f] = Δf − 2(1 − α) ∇U · ∇f − c f + (α − 1)(r · ∇U) f − (∇ · r) f + r · ∇f  (9)
H^(α)_sa[f] = Δf − 2(1 − α) ∇U · ∇f + (c + ∇ · r + (α − 1) r · ∇U) f  (10)
H^(α)_ss[f] = Δf − 2(1 − α) ∇U · ∇f .  (11)

The proof of this proposition, which can be found in [8], follows from repeated application of Proposition 3.1 to p(y) or q(y) and then to k_α(x, y) or h_α(x, y), as well as the fact that 1/p^α_ε = p^(−α) [1 − αε(ω + Δp/p + 2 r · ∇p/p + 2 ∇ · r + c)] + o(ε).

Thus, if we use the asymmetric k_ε and p_ε, we get H^(α)_aa, defined by the advected diffusion equation (8). In general, H^(α)_aa is not hermitian, so it commonly has complex eigenvectors. This makes embedding directed graphs with this operator problematic.
Nevertheless, H^(1)_aa will play an important role in extracting the directionality of the sampling process. If we use the symmetric kernel h_ε but the asymmetric outdegree distribution p_ε, we get the family of operators H^(α)_sa, of which the WCut of [3] is a special case (α = 0). If we reverse the above, i.e. use k_ε and q_ε, we obtain H^(α)_as. This turns out to be merely a combination of H^(α)_aa and H^(α)_sa.

¹The reader may notice that there are in fact eight possible combinations of kernel and degree distribution, since the anisotropic kernel (7) could also be defined using a symmetric or asymmetric outdegree distribution. However, there are only four distinct asymptotic results, and they are all covered by using one kernel (symmetric or asymmetric) and one degree distribution (symmetric or asymmetric) throughout.

Algorithm 1 Directed Embedding
Input: Affinity matrix W_ij and embedding dimension m, (m ≥ d)
1. S ← (W + W^T)/2 (Steps 1–6 estimate the coordinates as in [11])
2. q_i ← Σ_{j=1..n} S_ij, Q = diag(q)
3. V ← Q^(-1) S Q^(-1)
4. q^(1)_i ← Σ_{j=1..n} V_ij, Q^(1) = diag(q^(1))
5. H^(1)_ss,n ← (Q^(1))^(-1) V
6. Compute Ψ, the n × (m + 1) matrix with orthonormal columns containing the m + 1 largest right eigenvectors (by eigenvalue) of H^(1)_ss,n, as well as Λ, the (m + 1) × (m + 1) diagonal matrix of eigenvalues. Eigenvectors 2 to m + 1 of Ψ are the m coordinates of the embedding.
7. Compute π, the left eigenvector of H^(1)_ss,n with eigenvalue 1. (Steps 7–8 estimate the density)
8. 
π ← π / Σ_{i=1..n} π_i ; π is the density distribution over the embedding.
9. p_i ← Σ_{j=1..n} W_ij, P = diag(p) (Steps 9–13 estimate the vector field r)
10. T ← P^(-1) W P^(-1)
11. p^(1)_i ← Σ_{j=1..n} T_ij, P^(1) = diag(p^(1))
12. H^(1)_aa,n ← (P^(1))^(-1) T
13. R ← (H^(1)_aa,n − H^(1)_ss,n) Ψ / 2. Columns 2 to m + 1 of R are the vector field components in the direction of the corresponding coordinates of the embedding.

Finally, if we only consider the symmetric kernel h_ε and degree distribution q_ε, we recover H^(α)_ss, the anisotropic kernels of [9] for symmetric graphs. This operator for α = 1 is shown to separate the manifold from the probability distribution [11] and will be used as part of our recovery algorithm.

4 Isolating the Vector Field r

Our aim is to estimate the manifold M, the density distribution p = e^(-U), and the vector field r. The first two components of the data can be recovered from H^(1)_ss as shown in [11] and summarized in Algorithm 1. At this juncture, one feature of the generative process is missing: the vector field r. The natural approach for recovering r is to isolate the linear operator r · ∇ from H^(α)_aa by subtracting H^(α)_ss:

H^(α)_aa − H^(α)_ss = r · ∇ .  (12)

The advantage of recovering r in operator form as in (12) is that r · ∇ is coordinate free.
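For concreteness, Algorithm 1 can be transcribed into NumPy roughly as follows. This is our own illustrative sketch under simplifying assumptions (dense eigendecomposition, no exploitation of structure, toy affinity matrix), not the authors' reference implementation.

```python
import numpy as np

def directed_embedding(W, m):
    """Rough NumPy transcription of Algorithm 1 (our sketch, not the
    authors' reference code): returns embedding coordinates, estimated
    density, and estimated vector field components."""
    # Steps 1-6: embedding coordinates from the symmetrized affinity
    S = (W + W.T) / 2
    q = S.sum(axis=1)
    V = S / np.outer(q, q)                     # V = Q^-1 S Q^-1
    Hss = V / V.sum(axis=1, keepdims=True)     # H(1)_ss,n = (Q(1))^-1 V
    evals, evecs = np.linalg.eig(Hss)
    order = np.argsort(-evals.real)[: m + 1]
    Psi = evecs[:, order].real                 # m+1 largest right eigenvectors
    # Steps 7-8: density from the left eigenvector with eigenvalue 1
    lvals, lvecs = np.linalg.eig(Hss.T)
    pi = np.abs(lvecs[:, np.argmin(np.abs(lvals - 1))].real)
    pi /= pi.sum()
    # Steps 9-13: vector field from the asymmetric operator
    p = W.sum(axis=1)
    T = W / np.outer(p, p)                     # T = P^-1 W P^-1
    Haa = T / T.sum(axis=1, keepdims=True)     # H(1)_aa,n = (P(1))^-1 T
    R = (Haa - Hss) @ Psi / 2
    # drop the first (constant) eigenvector, as in steps 6 and 13
    return Psi[:, 1:], pi, R[:, 1:]

rng = np.random.default_rng(1)
A = rng.random((25, 25)) + 0.05                # toy positive affinity matrix
coords, dens, vec = directed_embedding(A, m=2)
```

Since H^(1)_ss,n is row-stochastic, its top right eigenvector is constant and its top eigenvalue is 1, which is why the first eigenvector is discarded and why a left eigenvector with eigenvalue 1 exists for the density estimate.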
In other\nwords, as long as the chosen embedding of M is diffeomorphic to M2, (12) can be used to express\nthe component of r that lies in the tangent space TM, which we denote by r||.\nSpeci\ufb01cally, let \u03a8 be a diffeomorphic embedding of M ; the component of r along coordinate \u03c8k is\nthen given by r \u00b7 \u2207\u03c8k = rk, and so, in general,\n\nr|| = r \u00b7 \u2207\u03a8 .\n\n(13)\nThe subtle point that only r|| is recovered from (13) follows from the fact that the operator r \u00b7 \u2207 is\nonly de\ufb01ned along M and hence any directional derivative is necessarily along TM.\nEquation (13) and the previous observations are the basis for Algorithm 1, which recovers the three\nimportant features of the generative process for an asymmetric graph with af\ufb01nity matrix W .\nA similar approach can be employed to recover c + \u2207 \u00b7 r, or simply \u2207 \u00b7 r if r has no component\nperpendicular to the tangent space TM (meaning that c \u2261 0). Recovering c + \u2207 \u00b7 r is achieved by\ntaking advantage of the fact that\n\n(H (1)\n\nsa \u2212 H (1)\n\nss ) = (c + \u2207 \u00b7 r) ,\n\n(14)\n\n2A diffeomorphic embedding is guaranteed by using the eigendecomposition of H (1)\nss .\n\n5\n\n\fwhich is a diagonal operator. Taking into account that for \ufb01nite n (H (1)\ndiagonal, using \u03c8n \u2261 1n (vector of ones), i.e. (H (1)\nempirically to be more stable than simply extracting the diagonal of (H (1)\n\nsa,n \u2212 H (1)\n\nss,n) is not perfectly\nss,n)[1n] = (cn +\u2207\u00b7 rn), has been found\n\nsa,n \u2212 H (1)\n\nsa,n \u2212 H (1)\n\nss,n).\n\n5 Experiments\n\nArti\ufb01cial Data For illustrative purposes, we begin by applying our method to an arti\ufb01cial example.\nWe use the planet Earth as a manifold with a topographic density distribution, where sampling\nprobability is proportional to elevation. 
We also consider two vector \ufb01elds: the \ufb01rst is parallel to the\nline of constant latitude and purely tangential to the sphere, while the second is parallel to the line\nof constant longitude with a component of the vector \ufb01eld perpendicular to the manifold. The true\nmodel with constant latitude vector \ufb01eld is shown in Figure 2, along with the estimated density and\nvector \ufb01eld projected on the true manifold (sphere).\n\nModel\n\nRecovered\n\nLatitudinal\n\nLongitudinal\n\n(a)\n\n(b)\n\nFigure 2:\n(a): Sphere with latitudinal vector \ufb01eld, i.e East-West asymmetry, with Wew > Wwe if node w\nlies to the West of node e. The graph nodes are sampled non-uniformly, with the topographic map of the world\nas sampling density. We sample n = 5000 nodes, and observe only the resulting W matrix, but not the node\nlocations. From W , our algorithm estimates the sample locations (geometry), the vector \ufb01eld (black arrows)\ngenerating the observed asymmetries, and the sampling distribution at each data point (colormap). (b) Vector\n\ufb01elds on a spherical region (blue), and their estimates (red): latitudinal vector \ufb01eld tangent to the manifold\n(left) and longitudinal vector \ufb01eld with component perpendicular to manifold tangent plane (right).\nBoth the estimated density and vector \ufb01eld agree with the true model, demonstrating that for arti\ufb01cial\ndata, the recovery algorithm 1 performs quite well. We note that the estimated density does not\nrecover all the details of the original density, even for large sample size (here n = 5000 with \u0001 =\n0.07). Meanwhile, the estimated vector \ufb01eld performs quite well even when the sampling is reduced\nto n = 500 with \u0001 = 0.1. This can be seen in Figure 2, b, where the true and estimated vector \ufb01elds\nare superimposed. Figure 2 also demonstrates how r \u00b7 \u2207 only recovers the tangential component of\nr. 
The estimated geometry is not shown in any of these figures, since the success of the diffusion map in recovering the geometry for such a simple manifold is already well established [2, 9].

Real Data. The National Longitudinal Survey of Youth (NLSY) 1979 Cohort is a representative sample of young men and women in the United States who were followed from 1979 to 2000 [12, 13]. The aim here is to use this survey to obtain a representation of the job market as a diffusion process over a manifold.

The data set consists of a sample of 7,816 individual career sequences of length 64, listing the jobs a particular individual held every quarter between the ages of 20 and 36. Each token in the sequence identifies a job. Each job corresponds to an industry × occupation pair. There are 25 unique industry and 20 unique occupation indices. Out of the 500 possible pairings, approximately 450 occur in the data, with only 213 occurring with sufficient frequency to be included here. Thus, our graph G has 213 nodes - the jobs - and our observations consist of 7,816 walks between the graph nodes. We convert these walks to a directed graph with affinity matrix W. Specifically, W_ij represents the number of times a transition from job i to job j was observed (note that this matrix is asymmetric, i.e. W_ij ≠ W_ji). Normalizing each row i of W by its outdegree d_i gives P = diag(d_i)^(-1) W, the non-parametric maximum likelihood estimator for the Markov chain over G for the progression of career sequences. This Markov chain has as limit operator H^(0)_aa, as the granularity of the job market increases along with the number of observations.
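The construction of W and P from career sequences can be sketched as follows, on toy sequences with hypothetical job indices (our illustration, not the actual NLSY data or coding):

```python
import numpy as np

def transition_affinity(sequences, n_jobs):
    """Sketch of the affinity construction described above: W[i, j]
    counts observed transitions from job i to job j across all career
    sequences; P = diag(d)^-1 W is the maximum likelihood Markov chain
    estimator. Toy job indices, not the actual NLSY coding."""
    W = np.zeros((n_jobs, n_jobs))
    for seq in sequences:
        for i, j in zip(seq[:-1], seq[1:]):
            W[i, j] += 1                        # one observed i -> j transition
    d = W.sum(axis=1)
    P = W / np.where(d == 0, 1.0, d)[:, None]   # guard against empty rows
    return W, P

# two tiny toy "career sequences" over 3 hypothetical jobs
W, P = transition_affinity([[0, 1, 2], [0, 1, 2], [2, 0]], n_jobs=3)
# W is asymmetric: 0 -> 1 is observed twice, 1 -> 0 never
assert W[0, 1] == 2 and W[1, 0] == 0
```

Rows of P with at least one observed transition sum to 1, as required of a Markov chain estimate.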
Thus, in trying to recover the geometry,\ndistribution and vector \ufb01eld, we are actually interested in estimating the full advective effect of the\naa ; that is, we want to estimate r\u00b7\u2207\u2212 2\u2207U \u00b7\u2207 where we can use\ndiffusion process generated by H (0)\n\u22122\u2207U \u00b7 \u2207 = H (0)\n\nss to complement Algorithm 1.\n\nss \u2212 H (1)\n\n(a)\n\n(b)\n\nFigure 3: Embedding the job market along with \ufb01eld r \u2212 2\u2207U over the \ufb01rst two non-constant eigenvectors.\nThe color map corresponds to the mean monthly wage in dollars (a) and to the female proportion (b) for each\njob.\nWe obtain an embedding of the job market that describes the relative position of jobs, their distri-\nbution, and the natural time progression from each job. Of these, the relative position and natural\ntime progression are the most interesting. Together, they summarize the job market dynamics by\ndescribing which jobs are naturally \u201cclose\u201d as well as where they can lead in the future. From a\npublic policy perspective, this can potentially improve focus on certain jobs for helping individuals\nattain better upward mobility.\nThe job market was found to be a high dimensional manifold. We present only the \ufb01rst two dimen-\nsions, that is, the second and third eigenvectors of H (0)\nss , since the \ufb01rst eigenvector is uninformative\n(constant) by construction. The eigenvectors showed correlation with important demographic data,\nsuch as wages and gender. Figure 3 displays this two-dimensional sub-embedding along with the\ndirectional information r \u2212 2\u2207U for each dimension. The plot shows very little net progression\ntoward regions of increasing mean salary3. 
This is somewhat surprising, but it is easy to overstate this observation: diffusion alone would be enough to move the individuals towards higher salary. What Figure 3 (a) suggests is that there appear to be no “external forces” advecting individuals towards higher salary. Nevertheless, there appear to be other external forces at play in the job market: Figure 3 (b), which is analogous to Figure 3 (a), but with gender replacing the salary color scheme, suggests that these forces push individuals towards greater gender differentiation. This is especially true amongst male-dominated jobs, which appear to be advected toward the left edge of the embedding. Hence, this simple analysis of the job market can be seen as an indication that males and females tend to move away from each other over time, while neither seems to have a monopoly on high- or low-paying jobs.

6 Discussion

This paper makes three contributions: (1) it introduces a manifold-based generative model for directed graphs with weighted edges, (2) it obtains asymptotic results for operators constructed from the directed graphs, and (3) these asymptotic results lead to a natural algorithm for estimating the model.

³It is worth noting that in the NLSY data set, high paying jobs are teacher, nurse and mechanic. This is due to the fact that the career paths observed stop at age 36, which is relatively early in an individual's career.

Generative Models that assume that data are sampled from a manifold are standard for undirected graphs, but to our knowledge, none have yet been proposed for directed graphs.
When W is symmetric, it is natural to assume that it depends on the points' proximity. For asymmetric affinities W, one must include an additional component to explain the asymmetry. In the asymptotic limit, this is tantamount to defining a vector field on the manifold.

Algorithm. We have used from [9] the idea of defining anisotropic kernels (indexed by α) in order to separate the density p and the manifold geometry M. Also, we adopted their general assumptions about the symmetric part of the kernel. As a consequence, the recovery algorithm for p and M is identical to theirs. However, insofar as the asymmetric part of the kernel is concerned, everything, starting from the definition and the introduction of the vector field r as a way to model the asymmetry, through the derivation of the asymptotic expression for the symmetric plus asymmetric kernel, is new. We go significantly beyond the elegant idea of [9] regarding the use of anisotropic kernels by analyzing the four distinct renormalizations possible for a given α, each of them combining different aspects of M, p and r. Only the successful (and novel) combination of two different anisotropic operators is able to recover the directional flow r.

Algorithm 1 is natural, but we do not claim it is the only possible one in the context of our model. For instance, we can also use H^(α)_sa to recover the operator ∇ · r (which empirically seems to have worse numerical properties than r · ∇). In the National Longitudinal Survey of Youth study, we were interested in the whole advective term, so we estimated it from a different combination of operators. Depending on the specific question, other features of the model could be obtained.

Limit Results. Proposition 3.1 is a general result on the asymptotics of asymmetric kernels.
Re-\ncovering the manifold and r is just one, albeit the most useful, of the many ways of exploiting these\nresults. For instance, H (0)\nsa is the limit operator of the operators used in [3] and [5]. The limit analysis\ncould be extended to other digraph embedding algorithms such as [4, 6].\nHow general is our model? Any kernel can be decomposed into a symmetric and an asymmetric\npart, as we have done. The assumptions on the symmetric part h are standard. The paper of [7] goes\none step further from these assumptions; we will discuss it in relationship with our work shortly.\nThe more interesting question is how limiting are our assumptions regarding the choice of kernel,\nespecially the asymmetric part, which we parameterized as a\u0001(x, y) = r/2 \u00b7 (y \u2212 x)h\u0001(x, y) in (4).\nIn the asymptotic limit, this choice turns out to be fully general, at least up to the identi\ufb01able aspects\nof the model. For a more detailed discussion of this issue, see [8].\nIn [7], Ting, Huang and Jordan presented asymptotic results for a general family of kernels that\nincludes asymmetric and random kernels. Our k\u0001 can be expressed in the notation of [7] by taking\nwx(y) \u2190 1+r(x, y)\u00b7(y\u2212x), rx(y) \u2190 1, K0 \u2190 h, h \u2190 \u0001. Their assumptions are more general than\nthe assumptions we make here, yet our model is general up to what can be identi\ufb01ed from G alone.\nThe distinction arises because [7] focuses on the graph construction methods from an observed\nsample of M, while we focus on explaining an observed directed graph G through a manifold\ngenerative process. Moreover, while the [7] results can be used to analyze data from directed graphs,\nthey differ from our Proposition 3.1. 
Specifically, with respect to the limit in Theorem 3 from [7], we obtain the additional source terms f(x) ∇ · r and c(x) f(x) that follow from not enforcing conservation of mass while defining the operators H^(α)_sa and H^(α)_as.

We applied our theory of directed graph embedding to the analysis of the career sequences in Section 5, but asymmetric affinity data abound in other social contexts, and in the physical and life sciences. Indeed, any “similarity” score that is obtained from a likelihood of the form W_vu = likelihood(u|v) is generally asymmetric. Hence our methods can be applied to study not only social networks, but also patterns of human movement, road traffic, and trade relations, as well as alignment scores in molecular biology. Finally, the physical interpretation of our model also makes it naturally applicable to physical models of flows.

Acknowledgments

This research was partially supported by NSF awards IIS-0313339 and IIS-0535100.

References

[1] Belkin and Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15:1373–1396, 2002.
[2] Nadler, Lafon, and Coifman. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In Neural Information Processing Systems Conference, 2006.
[3] Meila and Pentney. Clustering by weighted cuts in directed graphs. In SIAM Data Mining Conference, 2007.
[4] Zhou, Huang, and Schölkopf. Learning from labeled and unlabeled data on a directed graph. In International Conference on Machine Learning, pages 1041–1048, 2005.
[5] Zhou, Schölkopf, and Hofmann. Semi-supervised learning on directed graphs. In Advances in Neural Information Processing Systems, volume 17, pages 1633–1640, 2005.
[6] Fan R. K. Chung. The diameter and Laplacian eigenvalues of directed graphs. Electr. J. Comb., 13, 2006.
[7] Ting, Huang, and Jordan.
An analysis of the convergence of graph Laplacians. In International Conference on Machine Learning, 2010.
[8] Dominique Perrault-Joncas and Marina Meilă. Directed graph embedding: an algorithm based on continuous limits of Laplacian-type operators. Technical Report TR 587, University of Washington - Department of Statistics, November 2011.
[9] Coifman and Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21:6–30, 2006.
[10] Mikhail Belkin and Partha Niyogi. Convergence of Laplacian eigenmaps. Preprint, short version in NIPS 2008, 2008.
[11] Coifman, Lafon, Lee, Maggioni, Warner, and Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. In Proceedings of the National Academy of Sciences, pages 7426–7431, 2005.
[12] United States Department of Labor. National Longitudinal Survey of Youth 1979 cohort. http://www.bls.gov/nls/, retrieved October 2011.
[13] Marc A. Scott. Affinity models for career sequences. Journal of the Royal Statistical Society: Series C (Applied Statistics), 60(3):417–436, 2011.
", "award": [], "sourceid": 611, "authors": [{"given_name": "Dominique", "family_name": "Perrault-Joncas", "institution": null}, {"given_name": "Marina", "family_name": "Meila", "institution": null}]}