{"title": "Stochastic blockmodel approximation of a graphon: Theory and consistent estimation", "book": "Advances in Neural Information Processing Systems", "page_first": 692, "page_last": 700, "abstract": "Given a convergent sequence of graphs, there exists a limit object called the graphon from which random graphs are generated. This nonparametric perspective of random graphs opens the door to study graphs beyond the traditional parametric models, but at the same time also poses the challenging question of how to estimate the graphon underlying observed graphs. In this paper, we propose a computationally efficient algorithm to estimate a graphon from a set of observed graphs generated from it. We show that, by approximating the graphon with stochastic block models, the graphon can be consistently estimated, that is, the estimation error vanishes as the size of the graph approaches infinity.", "full_text": "Stochastic blockmodel approximation of a graphon:\n\nTheory and consistent estimation\n\nEdoardo M. Airoldi\n\nDept. Statistics\n\nHarvard University\n\nThiago B. Costa\n\nStanley H. Chan\n\nSEAS, and Dept. Statistics\n\nSEAS, and Dept. Statistics\n\nHarvard University\n\nHarvard University\n\nAbstract\n\nNon-parametric approaches for analyzing network data based on exchangeable\ngraph models (ExGM) have recently gained interest. The key object that de\ufb01nes\nan ExGM is often referred to as a graphon. This non-parametric perspective on\nnetwork modeling poses challenging questions on how to make inference on the\ngraphon underlying observed network data. In this paper, we propose a computa-\ntionally ef\ufb01cient procedure to estimate a graphon from a set of observed networks\ngenerated from it. This procedure is based on a stochastic blockmodel approxi-\nmation (SBA) of the graphon. 
We show that, by approximating the graphon with a stochastic block model, the graphon can be consistently estimated, that is, the estimation error vanishes as the size of the graph approaches infinity.\n\n1 Introduction\n\nRevealing hidden structures of a graph is at the heart of many data analysis problems. From the well-known small-world network to the recent large-scale data collected from online service providers such as Wikipedia, Twitter and Facebook, there is a continuing momentum in seeking better and more informative representations of graphs [1, 14, 29, 3, 26, 12]. In this paper, we develop a new computational tool to study one type of non-parametric representation which has recently drawn significant attention from the community [4, 19, 5, 30, 23].\n\nThe root of the non-parametric model discussed in this paper is in the theory of exchangeable random arrays [2, 15, 16], and it is presented in [11] as a link connecting de Finetti's work on partial exchangeability and graph limits [20, 6]. In a nutshell, the theory predicts that every convergent sequence of graphs (Gn) has a limit object that preserves many local and global properties of the graphs in the sequence. This limit object, which is called a graphon, can be represented by measurable functions w : [0, 1]² → [0, 1], in a way that any w′ obtained from a measure-preserving transformation of w describes the same graphon.\n\nGraphons are usually seen as kernel functions for random network models [18]. To construct an n-vertex random graph G(n, w) for a given w, we first assign a random label ui ∼ Uniform[0, 1] to each vertex i ∈ {1, . . . , n}, and connect any two vertices i and j with probability w(ui, uj), i.e.,\n\nPr (G[i, j] = 1 | ui, uj) = w(ui, uj),    i, j = 1, . . . , n,    (1)\n\nwhere G[i, j] denotes the (i, j)th entry of the adjacency matrix representing a particular realization of G(n, w) (see Figure 1). 
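The sampling scheme in (1) is easy to implement; below is a minimal NumPy sketch. The two-block graphon `w` used here is our own toy example, not one from the paper.

```python
import numpy as np

def sample_graph(w, n, rng=None):
    """Sample a directed n-vertex graph per Eq. (1): draw labels
    u_i ~ Uniform[0,1], then set G[i,j] = 1 w.p. w(u_i, u_j)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=n)               # latent labels u_1, ..., u_n
    P = w(u[:, None], u[None, :])         # P[i,j] = w(u_i, u_j)
    return (rng.uniform(size=(n, n)) < P).astype(int), u

# Toy piecewise-constant graphon, i.e., a two-block stochastic blockmodel.
w = lambda x, y: np.where((x < 0.5) == (y < 0.5), 0.8, 0.1)
G, u = sample_graph(w, 500, rng=0)
```

Sorting the rows and columns of `G` by increasing `u` reveals the block structure, as in the right panel of Figure 1.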
As an example, we note that the stochastic blockmodel is the case where w(x, y) is a piecewise constant function.\n\nThe problem of interest is defined as follows: Given a sequence of 2T observed directed graphs G1, . . . , G2T, can we make an estimate ŵ of w, such that ŵ → w with high probability as n → ∞? This question has been loosely attempted in the literature, but none of the attempts offers a complete solution. For example, Lloyd et al. [19] proposed a Bayesian estimator without a consistency proof; Choi and Wolfe [9] studied the consistency properties, but did not provide algorithms to estimate the graphon. To the best of our knowledge, the only method that estimates graphons consistently, besides ours, is USVT [8]. However, our algorithm has better complexity and outperforms USVT in our simulations. More recently, other groups have begun exploring approaches related to ours [28, 24].\n\nFigure 1: [Left] Given a graphon w : [0, 1]² → [0, 1], we draw i.i.d. samples ui, uj from Uniform[0,1] and assign Gt[i, j] = 1 with probability w(ui, uj), for t = 1, . . . , 2T. [Middle] Heat map of a graphon w. [Right] A random graph generated by the graphon shown in the middle. Rows and columns of the graph are ordered by increasing ui, instead of i, for better visualization.\n\nThe proposed approximation procedure requires w to be piecewise Lipschitz. The basic idea is to approximate w by a two-dimensional step function ŵ with diminishing intervals as n increases. The proposed method is called the stochastic blockmodel approximation (SBA) algorithm, as the idea of using a two-dimensional step function for approximation is equivalent to using stochastic blockmodels [10, 22, 13, 7, 25]. The SBA algorithm is defined up to permutations of the nodes, so the estimated graphon is not canonical. 
However, this does not affect the consistency properties of the SBA algorithm, as the consistency is measured w.r.t. the graphon that generates the graphs.\n\n2 Stochastic blockmodel approximation: Procedure\n\nIn this section we present the proposed SBA algorithm and discuss its basic properties.\n\n2.1 Assumptions on graphons\n\nWe assume that w is piecewise Lipschitz, i.e., there exists a sequence of non-overlapping intervals Ik = [αk−1, αk] defined by 0 = α0 < . . . < αK = 1, and a constant L > 0 such that, for any (x1, y1) and (x2, y2) ∈ Iij = Ii × Ij,\n\n|w(x1, y1) − w(x2, y2)| ≤ L (|x1 − x2| + |y1 − y2|).\n\nFor generality we assume w to be asymmetric, i.e., w(u, v) ≠ w(v, u), so that symmetric graphons can be considered as a special case. Consequently, a random graph G(n, w) generated by w is directed, i.e., G[i, j] ≠ G[j, i].\n\n2.2 Similarity of graphon slices\n\nThe intuition of the proposed SBA algorithm is that if the graphon is smooth, neighboring cross-sections of the graphon should be similar. In other words, if two labels ui and uj are close, i.e., |ui − uj| ≈ 0, then the difference between the row slices |w(ui, ·) − w(uj, ·)| and the column slices |w(·, ui) − w(·, uj)| should also be small. To measure the similarity between two labels using the graphon slices, we define the following distance\n\ndij = (1/2) ( ∫₀¹ [w(x, ui) − w(x, uj)]² dx + ∫₀¹ [w(ui, y) − w(uj, y)]² dy ).    (2)\n\nThus, dij is small only if both row and column slices of the graphon are similar. The usage of dij for graphon estimation will be discussed in the next subsection. 
But before we proceed, it should be noted that in practice dij has to be estimated from the observed graphs G1, . . . , G2T. To derive an estimator d̂ij of dij, it is helpful to express dij in a way that the estimators can be easily obtained. To this end, we let\n\ncij = ∫₀¹ w(x, ui) w(x, uj) dx    and    rij = ∫₀¹ w(ui, y) w(uj, y) dy,\n\nand express dij as dij = (1/2)[(cii − cij − cji + cjj) + (rii − rij − rji + rjj)]. Inspecting this expression, we consider the following estimators for cij and rij:\n\nĉᵏij = (1/T²) ( Σ_{1≤t1≤T} Gt1[k, i] ) ( Σ_{T<t2≤2T} Gt2[k, j] ),    (3)\n\nr̂ᵏij = (1/T²) ( Σ_{1≤t1≤T} Gt1[i, k] ) ( Σ_{T<t2≤2T} Gt2[j, k] ),    (4)\n\nrespectively. Summing over all possible k's yields an estimator d̂ij that looks similar to dij:\n\nd̂ij = (1/2) [ (1/|S|) Σ_{k∈S} { (ĉᵏii − ĉᵏij − ĉᵏji + ĉᵏjj) + (r̂ᵏii − r̂ᵏij − r̂ᵏji + r̂ᵏjj) } ],    (5)\n\nwhere S = {1, . . . , n}\\{i, j} is the set of summation indices.\n\nThe motivation for defining the estimators in (3) and (4) is that a row of the adjacency matrix G[i, ·] is fully characterized by the corresponding row of the graphon w(ui, ·). Thus the expected value of (1/T) Σ_{1≤t1≤T} Gt1[i, ·] is w(ui, ·), and hence r̂ᵏij is an estimator for rij. To theoretically justify this intuition, we will show in Section 3 that d̂ij is indeed a good estimator: it is not only unbiased, but is also concentrated around dij for large n. Furthermore, we will show that it is possible to use a random subset of S instead of {1, . . . , n}\\{i, j} to achieve the same asymptotic behavior. As a result, the estimation of dij can be performed locally in a neighborhood of i and j, instead of over all n vertices.\n\n2.3 Blocking the vertices\n\nUsing d̂ij, the vertices can be clustered into blocks by a greedy procedure. Starting with the vertex set Ω = {1, . . . , n}, we randomly choose a pivot vertex ip ∈ Ω and compute d̂ip,iv for every other vertex iv ∈ Ω, given a precision parameter ∆ > 0. If d̂ip,iv < ∆², then we assign iv to the same block as ip. Therefore, after scanning through Ω once, a block B̂1 = {ip, iv1, iv2, . . .} will be defined. 
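The distance estimator d̂ij above is simple to compute; here is a minimal NumPy sketch, assuming the 2T observed graphs are stacked in an array of shape (2T, n, n). Variable names and the toy two-block example are ours, not from the paper.

```python
import numpy as np

def d_hat(G, i, j):
    """Estimate d_ij from 2T observed graphs G of shape (2T, n, n),
    using the first T and last T graphs as the two independent halves."""
    T, n = G.shape[0] // 2, G.shape[1]
    A = G[:T].mean(axis=0)            # (1/T) * sum over t1 <= T
    B = G[T:].mean(axis=0)            # (1/T) * sum over t2 > T
    S = [k for k in range(n) if k not in (i, j)]

    def c(a, b):                      # column-slice term, averaged over k in S
        return np.mean(A[S, a] * B[S, b])

    def r(a, b):                      # row-slice term, averaged over k in S
        return np.mean(A[a, S] * B[b, S])

    return 0.5 * ((c(i, i) - c(i, j) - c(j, i) + c(j, j))
                  + (r(i, i) - r(i, j) - r(j, i) + r(j, j)))

# Sanity check on a two-block stochastic blockmodel.
rng = np.random.default_rng(0)
n, T = 200, 4
u = np.linspace(0, 1, n, endpoint=False)
same = (u[:, None] < 0.5) == (u[None, :] < 0.5)
P = np.where(same, 0.8, 0.1)
G = (rng.uniform(size=(2 * T, n, n)) < P).astype(int)
```

For a same-block pair such as (0, 1) the estimate is close to 0, while for a cross-block pair such as (0, 150) it is close to the true distance 0.49.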
By updating Ω as Ω ← Ω\\B̂1, the process repeats until Ω = ∅.\n\nThe proposed greedy algorithm is only a local solution in the sense that it does not return the globally optimal clusters. However, as will be shown in Section 3, although the clustering algorithm is not globally optimal, the estimated graphon ŵ is still guaranteed to be a consistent estimate of the true graphon w as n → ∞. Since the greedy algorithm is numerically efficient, it serves as a practical computational tool to estimate w.\n\n2.4 Main algorithm\n\nAlgorithm 1 Stochastic blockmodel approximation\nInput: A set of observed graphs G1, . . . , G2T and the precision parameter ∆.\nOutput: Estimated stochastic blocks B̂1, . . . , B̂K.\nInitialize: Ω = {1, . . . , n}, and k = 1.\nwhile Ω ≠ ∅ do\n  Randomly choose a vertex ip from Ω and assign it as the pivot for B̂k: B̂k ← ip.\n  for every other vertex iv ∈ Ω\\{ip} do\n    Compute the distance estimate d̂ip,iv.\n    If d̂ip,iv ≤ ∆², then assign iv as a member of B̂k: B̂k ← iv.\n  end for\n  Update Ω: Ω ← Ω\\B̂k.\n  Update counter: k ← k + 1.\nend while\n\nAlgorithm 1 illustrates the pseudo-code for the proposed stochastic blockmodel approximation. The complexity of this algorithm is O(TSKn), where T is half the number of observations, S is the size of the neighborhood, K is the number of blocks and n is the number of vertices of the graph.\n\n3 Stochastic blockmodel approximation: Theory of estimation\n\nIn this section we present the theoretical aspects of the proposed SBA algorithm. We will first discuss the properties of the estimator d̂ij, and then show the consistency of the estimated graphon ŵ. 
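The greedy pass of Algorithm 1 can be sketched as follows. The function `dist` stands in for the estimator d̂, and the toy distance below is our own illustration; any pairwise distance can be plugged in.

```python
import random

def sba_blocks(vertices, dist, delta, seed=0):
    """Greedy blocking of Algorithm 1: repeatedly pick a random pivot
    and absorb every remaining vertex within squared distance delta**2."""
    rng = random.Random(seed)
    omega = set(vertices)                      # Omega = {1, ..., n}
    blocks = []
    while omega:
        pivot = rng.choice(sorted(omega))      # random pivot i_p
        block = {pivot}
        for v in omega - {pivot}:              # scan every other vertex i_v
            if dist(pivot, v) <= delta ** 2:   # if d_hat <= delta^2, absorb
                block.add(v)
        omega -= block                         # Omega <- Omega \ B_k
        blocks.append(block)
    return blocks

# Toy distance: vertices 0-4 are mutually close, as are 5-9.
d = lambda a, b: 0.0 if (a < 5) == (b < 5) else 1.0
blocks = sba_blocks(range(10), d, delta=0.5)   # recovers the two groups
```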
Details of the proofs can be found in the supplementary material.\n\n3.1 Concentration analysis of d̂ij\n\nOur first theorem below shows that the proposed estimator d̂ij is unbiased and is concentrated around its expected value dij.\n\nTheorem 1. The estimator d̂ij for dij is unbiased, i.e., E[d̂ij] = dij. Further, for any ε > 0,\n\nPr[ |d̂ij − dij| > ε ] ≤ 8 exp( −Sε² / (32/T + 8ε/3) ),    (7)\n\nwhere S is the size of the neighborhood S, and 2T is the number of observations.\n\nProof. Here we only highlight the important steps to present the intuition. The basic idea of the proof is to zoom in on a microscopic term of r̂ᵏij and show that it is unbiased. To this end, we use the fact that Gt1[i, k] and Gt2[j, k] are conditionally independent given uk to show\n\nE[Gt1[i, k] Gt2[j, k] | uk] = Pr[Gt1[i, k] = 1, Gt2[j, k] = 1 | uk]\n(a) = Pr[Gt1[i, k] = 1 | uk] Pr[Gt2[j, k] = 1 | uk]\n= w(ui, uk) w(uj, uk),\n\nwhich then implies E[r̂ᵏij | uk] = w(ui, uk) w(uj, uk), and by iterated expectation we have E[r̂ᵏij] = E[E[r̂ᵏij | uk]] = rij. The concentration inequality follows from a similar idea to bound the variance of r̂ᵏij and apply Bernstein's inequality.\n\nThat Gt1[i, k] and Gt2[j, k] are conditionally independent given uk is a critical fact for the success of the proposed algorithm. It also explains why at least 2 independently observed graphs are necessary, for otherwise we cannot separate the probability in the second equality above, marked with (a).\n\n3.2 Choosing the number of blocks\n\nThe performance of Algorithm 1 is sensitive to the number of blocks it defines. On the one hand, it is desirable to have more blocks so that the graphon can be finely approximated. But on the other hand, if the number of blocks is too large then each block will contain only a few vertices. 
This is bad because in order to estimate the value on each block, a sufficient number of vertices in each block is required. The trade-off between these two cases is controlled by the precision parameter ∆: a large ∆ generates few large clusters, while a small ∆ generates many small clusters. A precise relationship between ∆ and K, the number of blocks generated by the algorithm, is given in Theorem 2.\n\nTheorem 2. Let ∆ be the accuracy parameter and K be the number of blocks estimated by Algorithm 1. Then\n\nPr[ K > QL√2/∆ ] ≤ 8n² exp( −S∆⁴ / (128/T + 16∆²/3) ),    (8)\n\nwhere L is the Lipschitz constant and Q is the number of Lipschitz blocks in w.\n\nIn practice, we estimate ∆ using a cross-validation scheme to find the optimal 2D histogram bin width [27]. The idea is to test a sequence of potential values of ∆ and seek the one that minimizes the cross-validation risk, defined as\n\nĴ(∆) = 2/(h(n − 1)) − ((n + 1)/(h(n − 1))) Σ_{j=1}^{K} p̂j²,    (9)\n\nwhere p̂j = |B̂j|/n and h = 1/K. Algorithm 2 details the proposed cross-validation scheme.\n\nAlgorithm 2 Cross Validation\nInput: Graphs G1, . . . , G2T.\nOutput: Blocks B̂1, . . . , B̂K, and optimal ∆.\nfor a sequence of ∆'s do\n  Estimate blocks B̂1, . . . , B̂K from G1, . . . , G2T. [Algorithm 1]\n  Compute p̂j = |B̂j|/n, for j = 1, . . . , K.\n  Compute Ĵ(∆) = 2/(h(n − 1)) − ((n + 1)/(h(n − 1))) Σ_{j=1}^{K} p̂j², with h = 1/K.\nend for\nPick the ∆ with minimum Ĵ(∆), and the corresponding B̂1, . . . , B̂K.\n\n3.3 Consistency of ŵ\n\nThe goal of our next theorem is to show that ŵ is a consistent estimate of w, i.e., ŵ → w as n → ∞. To begin with, let us first recall two commonly used metrics:\n\nDefinition 1. 
The mean squared error (MSE) and mean absolute error (MAE) are defined as\n\nMSE(ŵ) = (1/n²) Σ_{iv=1}^{n} Σ_{jv=1}^{n} (w(uiv, ujv) − ŵ(uiv, ujv))²,\n\nMAE(ŵ) = (1/n²) Σ_{iv=1}^{n} Σ_{jv=1}^{n} |w(uiv, ujv) − ŵ(uiv, ujv)|.\n\nTheorem 3. If S ∈ Θ(n) and ∆ ∈ ω( (log(n)/n)^(1/4) ) ∩ o(1), then\n\nlim_{n→∞} E[MSE(ŵ)] = 0    and    lim_{n→∞} E[MAE(ŵ)] = 0.\n\nProof. The details of the proof can be found in the supplementary material. Here we only outline the key steps to present the intuition of the theorem. The goal of Theorem 3 is to show convergence of |ŵ(ui, uj) − w(ui, uj)|. The idea is to consider the following two quantities:\n\nw̄(ui, uj) = (1/(|B̂i||B̂j|)) Σ_{ix∈B̂i} Σ_{jx∈B̂j} w(uix, ujx),\n\nŵ(ui, uj) = (1/(2T|B̂i||B̂j|)) Σ_{ix∈B̂i} Σ_{jy∈B̂j} (G1[ix, jy] + G2[ix, jy] + . . . + G2T[ix, jy]),\n\nso that if we can bound |w(ui, uj) − w̄(ui, uj)| and |w̄(ui, uj) − ŵ(ui, uj)|, then consequently |ŵ(ui, uj) − w(ui, uj)| can also be bounded.\n\nThe bound for the first term |w(ui, uj) − w̄(ui, uj)| is shown in Lemma 1: By Algorithm 1, any vertex iv ∈ B̂i is guaranteed to be within a distance ∆ from the pivot of B̂i. Since w̄(ui, uj) is an average over B̂i and B̂j, by Theorem 1 a probability bound involving ∆ can be obtained.\n\nThe bound for the second term |w̄(ui, uj) − ŵ(ui, uj)| is shown in Lemma 2. Different from Lemma 1, here we need to consider two possible situations: either the intermediate estimate w̄(ui, uj) is close to the ground truth w(ui, uj), or w̄(ui, uj) is far from the ground truth w(ui, uj). This accounts for the sum in Lemma 2. Individual bounds are derived based on Lemma 1 and Theorem 1.\n\nCombining Lemma 1 and Lemma 2, we can then bound the error and show convergence.\n\nLemma 1. 
For any iv ∈ B̂i and jv ∈ B̂j,\n\nPr[ |w̄(ui, uj) − w(uiv, ujv)| > 8∆^(1/2) L^(1/4) ] ≤ 32|B̂i||B̂j| exp( −S∆⁴ / (32/T + 8∆²/3) ).    (10)\n\nLemma 2. For any iv ∈ B̂i and jv ∈ B̂j,\n\nPr[ |ŵij − w̄ij| > 8∆^(1/2) L^(1/4) ] ≤ 2 exp( −256 T|B̂i||B̂j|√L ∆ ) + 32|B̂i|²|B̂j|² exp( −S∆⁴ / (32/T + 8∆²/3) ).    (11)\n\nThe condition S ∈ Θ(n) is necessary to make Theorem 3 valid, because if S is independent of n, the right-hand sides of (10) and (11) cannot approach 0 even if n → ∞. The condition on ∆ is also important as it forces the numerators and denominators in the exponentials of (10) and (11) to be well behaved.\n\n4 Experiments\n\nIn this section we evaluate the proposed SBA algorithm by showing some empirical results. For the purpose of comparison, we consider (i) the universal singular value thresholding (USVT) [8]; (ii) the largest-gap algorithm (LG) [7]; (iii) matrix completion from few entries (OptSpace) [17].\n\n4.1 Estimating stochastic blockmodels\n\nAccuracy as a function of growing graph size. Our first experiment is to evaluate the proposed SBA algorithm for estimating stochastic blockmodels. For this purpose, we generate (arbitrarily) a graphon\n\nw = [ 0.8 0.9 0.4 0.5 ; 0.1 0.6 0.3 0.2 ; 0.3 0.2 0.8 0.3 ; 0.4 0.1 0.2 0.9 ],    (12)\n\nwhich represents a piecewise constant function with 4 × 4 equi-spaced blocks.\n\nSince USVT and LG use only one observed graph whereas the proposed SBA requires at least 2 observations, in order to make the comparison fair, we use half of the nodes for SBA by generating two independent (n/2) × (n/2) observed graphs. For USVT and LG, we use one n × n observed graph.\n\nFigure 2(a) shows the asymptotic behavior of the algorithms when n grows. 
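The piecewise-constant graphon in (12) can be written as a function w : [0,1]² → [0,1] by a simple block lookup; this construction is our own sketch, not code from the paper.

```python
import numpy as np

# The 4 x 4 block matrix of Eq. (12); block (a, b) covers the cell
# [a/4, (a+1)/4) x [b/4, (b+1)/4) of the unit square.
W = np.array([[0.8, 0.9, 0.4, 0.5],
              [0.1, 0.6, 0.3, 0.2],
              [0.3, 0.2, 0.8, 0.3],
              [0.4, 0.1, 0.2, 0.9]])

def w(x, y, blocks=W):
    """Piecewise-constant graphon: look up the block containing (x, y)."""
    K = blocks.shape[0]
    a = np.minimum((np.asarray(x) * K).astype(int), K - 1)
    b = np.minimum((np.asarray(y) * K).astype(int), K - 1)
    return blocks[a, b]
```

For instance, `w(0.1, 0.9)` falls in the first row and last column of blocks and returns 0.5.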
Figure 2(b) shows the estimation error of the SBA algorithm as T grows for graphs of 200 vertices.\n\nFigure 2: (a) MAE reduces as the graph size grows. For fairness in the amount of data that can be used, we use (n/2) × (n/2) × 2 observations for SBA, and n × n × 1 observation for USVT [8] and LG [7]. (b) MAE of the proposed SBA algorithm reduces when more observations T are available. Both plots are averaged over 100 independent trials.\n\nAccuracy as a function of growing number of blocks. Our second experiment is to evaluate the performance of the algorithms as K, the number of blocks, increases. To this end, we consider a sequence of K, and for each K we generate a graphon w of K × K blocks. Each entry of the block is a random number generated from Uniform[0, 1]. As in the previous experiment, we fix n = 200 and T = 1. The experiment is repeated over 100 trials so that in every trial a different graphon is generated. The result shown in Figure 3(a) indicates that while the estimation error increases as K grows, the proposed SBA algorithm still attains the lowest MAE for all K.\n\nFigure 3: (a) As K increases, the MAE of all three algorithms increases, but SBA still attains the lowest MAE. Here, we use (n/2) × (n/2) × 2 observations for SBA, and n × n × 1 observation for USVT [8] and LG [7]. (b) Estimation of the graphon in the presence of missing links: as the amount of missing links increases, the estimation error also increases.\n\n4.2 Estimation with missing edges\n\nOur next experiment is to evaluate the performance of the proposed SBA algorithm when there are missing edges in the observed graph. To model missing edges, we construct an n × n binary matrix M with probability Pr[M[i, j] = 0] = ξ, where 0 ≤ ξ ≤ 1 defines the percentage of missing edges. Given ξ, 2T matrices are generated with missing edges, and the observed graphs are defined as M1 ⊙ G1, . . . , M2T ⊙ G2T, where ⊙ denotes element-wise multiplication. The goal is to study how well SBA can reconstruct the graphon ŵ in the presence of missing links.\n\nThe modification of the proposed SBA algorithm for the case of missing links is minimal: when computing (6), instead of averaging over all ix ∈ B̂i and jy ∈ B̂j, we only average over the ix ∈ B̂i and jy ∈ B̂j that are not masked out by the M's. Figure 3(b) shows the result averaged over 100 independent trials. Here, we consider the graphon given in (12), with n = 200 and T = 1. 
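The missing-link setup can be sketched as follows; the function and names are our own illustration, with `xi` playing the role of ξ.

```python
import numpy as np

def mask_graphs(graphs, xi, rng=None):
    """Replace each observed graph G_t by M_t * G_t (element-wise), where
    every entry of the binary mask M_t is 0 independently w.p. xi."""
    rng = np.random.default_rng(rng)
    masks = (rng.uniform(size=graphs.shape) >= xi).astype(int)
    return graphs * masks, masks

# 2T = 4 toy graphs with every edge present; mask out ~20% of entries.
G = np.ones((4, 10, 10), dtype=int)
masked, M = mask_graphs(G, xi=0.2, rng=0)
```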
It is evident that SBA outperforms its counterparts at a lower rate of missing links.\n\n4.3 Estimating continuous graphons\n\nOur final experiment is to evaluate the proposed SBA algorithm in estimating continuous graphons. Here, we consider two of the graphons reported in [8]:\n\nw1(u, v) = 1 / (1 + exp{−50(u² + v²)}),    and    w2(u, v) = uv,\n\nwhere u, v ∈ [0, 1]. Here, w2 can be considered as a special case of the Eigenmodel [13] or the latent feature relational model [21].\n\nThe results in Figure 4 show that while both algorithms have improved estimates when n grows, the performance depends on which of w1 and w2 we are studying. This suggests that in practice the choice of the algorithm should depend on the expected structure of the graphon to be estimated: if the graph generated by the graphon demonstrates some low-rank properties, then USVT is likely to be a better option. For more structured or complex graphons the proposed procedure is recommended.\n\nFigure 4: Comparison between SBA and USVT in estimating two continuous graphons w1 and w2. Evidently, SBA performs better for w1 (high-rank) and worse for w2 (low-rank).\n\n5 Concluding remarks\n\nWe presented a new computational tool for estimating graphons. 
The proposed algorithm approximates the continuous graphon by a stochastic blockmodel, in which the first step is to cluster the unknown vertex labels into blocks by using an empirical estimate of the distance between two graphon slices, and the second step is to build an empirical histogram to estimate the graphon. A complete consistency analysis of the algorithm is derived. The algorithm was evaluated experimentally, and we found that it is effective in estimating block-structured graphons.\n\nAn implementation of the SBA algorithm is available online at https://github.com/airoldilab/SBA.\n\nAcknowledgments. EMA is partially supported by NSF CAREER award IIS-1149662, ARO MURI award W911NF-11-1-0036, and an Alfred P. Sloan Research Fellowship. SHC is partially supported by a Croucher Foundation Post-Doctoral Research Fellowship.\n\nReferences\n\n[1] E.M. Airoldi, D.M. Blei, S.E. Fienberg, and E.P. Xing. Mixed-membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014, 2008.\n\n[2] D.J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11:581–598, 1981.\n\n[3] H. Azari and E. M. Airoldi. Graphlet decomposition of a weighted network. Journal of Machine Learning Research, W&CP, 22:54–63, 2012.\n\n[4] P.J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA, 106:21068–21073, 2009.\n\n[5] P.J. Bickel, A. Chen, and E. Levina. The method of moments and degree distributions for network models. Annals of Statistics, 39(5):2280–2301, 2011.\n\n[6] C. Borgs, J. Chayes, L. Lovász, V. T. Sós, B. Szegedy, and K. Vesztergombi. Graph limits and parameter testing. In Proc. ACM Symposium on Theory of Computing, pages 261–270, 2006.\n\n[7] A. Channarond, J. Daudin, and S. Robin. 
Classification and estimation in the Stochastic Blockmodel based on the empirical degrees. Electronic Journal of Statistics, 6:2574–2601, 2012.\n\n[8] S. Chatterjee. Matrix estimation by universal singular value thresholding. ArXiv:1212.1247, 2012.\n\n[9] D.S. Choi and P.J. Wolfe. Co-clustering separately exchangeable network data. ArXiv:1212.4093, 2012.\n\n[10] D.S. Choi, P.J. Wolfe, and E.M. Airoldi. Stochastic blockmodels with a growing number of classes. Biometrika, 99:273–284, 2012.\n\n[11] P. Diaconis and S. Janson. Graph limits and exchangeable random graphs. Rendiconti di Matematica e delle sue Applicazioni, Series VII, pages 33–61, 2008.\n\n[12] A. Goldenberg, A.X. Zheng, S.E. Fienberg, and E.M. Airoldi. A survey of statistical network models. Foundations and Trends in Machine Learning, 2:129–233, 2009.\n\n[13] P.D. Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. In Neural Information Processing Systems (NIPS), volume 20, pages 657–664, 2008.\n\n[14] P.D. Hoff, A.E. Raftery, and M.S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.\n\n[15] D.N. Hoover. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 1979.\n\n[16] O. Kallenberg. On the representation theorem for exchangeable arrays. Journal of Multivariate Analysis, 30(1):137–154, 1989.\n\n[17] R.H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Trans. Information Theory, 56:2980–2998, Jun. 2010.\n\n[18] N.D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783–1816, 2005.\n\n[19] J.R. Lloyd, P. Orbanz, Z. Ghahramani, and D.M. Roy. 
Random function priors for exchangeable arrays with applications to graphs and relational data. In Neural Information Processing Systems (NIPS), 2012.\n\n[20] L. Lovász and B. Szegedy. Limits of dense graph sequences. Journal of Combinatorial Theory, Series B, 96:933–957, 2006.\n\n[21] K.T. Miller, T.L. Griffiths, and M.I. Jordan. Nonparametric latent feature models for link prediction. In Neural Information Processing Systems (NIPS), 2009.\n\n[22] K. Nowicki and T.A. Snijders. Estimation and prediction of stochastic block structures. Journal of the American Statistical Association, 96:1077–1087, 2001.\n\n[23] P. Orbanz and D.M. Roy. Bayesian models of graphs, arrays and other exchangeable random structures, 2013. Unpublished manuscript.\n\n[24] P. Latouche and S. Robin. Bayesian model averaging of stochastic block models to estimate the graphon function and motif frequencies in a W-graph model. ArXiv:1310.6150, October 2013. Unpublished manuscript.\n\n[25] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Annals of Statistics, 39(4):1878–1915, 2011.\n\n[26] M. Tang, D.L. Sussman, and C.E. Priebe. Universally consistent vertex classification for latent positions graphs. Annals of Statistics, 2013. In press.\n\n[27] L. Wasserman. All of Nonparametric Statistics. Springer, 2005.\n\n[28] P.J. Wolfe and S.C. Olhede. Nonparametric graphon estimation. ArXiv:1309.5936, September 2013. Unpublished manuscript.\n\n[29] Z. Xu, F. Yan, and Y. Qi. Infinite Tucker decomposition: nonparametric Bayesian models for multiway data analysis. In Proc. Intl. Conf. Machine Learning (ICML), 2012.\n\n[30] Y. Zhao, E. Levina, and J. Zhu. Community extraction for social networks. Proc. Natl. Acad. Sci. 
USA, volume 108, pages 7321–7326, 2011.\n", "award": [], "sourceid": 398, "authors": [{"given_name": "Edo", "family_name": "Airoldi", "institution": "Harvard University"}, {"given_name": "Thiago", "family_name": "Costa", "institution": "Harvard University"}, {"given_name": "Stanley", "family_name": "Chan", "institution": "Harvard University"}]}