{"title": "Private Graphon Estimation for Sparse Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 1369, "page_last": 1377, "abstract": "We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information of individual members.  Given a sparse input graph $G$, our algorithms output a node-differentially private nonparametric block model approximation.  By node-differentially private, we mean that our output hides the insertion or removal of a vertex and all its adjacent edges.  If $G$ is an instance of the network obtained from a generative nonparametric model defined in terms of a graphon $W$, our model guarantees consistency: as the number of vertices tends to infinity, the output of our algorithm converges to $W$ in an appropriate version of the $L_2$ norm. In particular, this means we can estimate the sizes of all multi-way cuts in $G$.  Our results hold as long as $W$ is bounded, the average degree of $G$ grows at least like the log of the number of vertices, and the number of blocks goes to infinity at an appropriate rate.  We give explicit error bounds in terms of the parameters of the model; in several settings, our bounds improve on or match known nonprivate results.", "full_text": "Private Graphon Estimation for Sparse Graphs\u2217\n\nChristian Borgs\n\nJennifer T. Chayes\n\nMicrosoft Research New England\n\nCambridge, MA, USA.\n\n{cborgs,jchayes}@microsoft.com\n\nAdam Smith\n\nPennsylvania State University\n\nUniversity Park, PA, USA.\n\nasmith@psu.edu\n\nAbstract\n\nWe design algorithms for \ufb01tting a high-dimensional statistical model to a large,\nsparse network without revealing sensitive information of individual members.\nGiven a sparse input graph G, our algorithms output a node-differentially private\nnonparametric block model approximation. 
By node-differentially private, we\nmean that our output hides the insertion or removal of a vertex and all its adjacent\nedges. If G is an instance of the network obtained from a generative nonparametric\nmodel de\ufb01ned in terms of a graphon W , our model guarantees consistency: as the\nnumber of vertices tends to in\ufb01nity, the output of our algorithm converges to W in\nan appropriate version of the L2 norm. In particular, this means we can estimate\nthe sizes of all multi-way cuts in G.\nOur results hold as long as W is bounded, the average degree of G grows at least\nlike the log of the number of vertices, and the number of blocks goes to in\ufb01nity\nat an appropriate rate. We give explicit error bounds in terms of the parameters of\nthe model; in several settings, our bounds improve on or match known nonprivate\nresults.\n\n1\n\nIntroduction\n\nDifferential Privacy. Social and communication networks have been the subject of intense study\nover the last few years. However, while these networks comprise a rich source of information for\nscience, they also contain highly sensitive private information. What kinds of information can we\nrelease about these networks while preserving the privacy of their users? Simple measures, such as\nremoving obvious identi\ufb01ers, do not work; for example, several studies reidenti\ufb01ed individuals in\nthe graph of a social network even after all vertex and edge attributes were removed. Such attacks\nhighlight the need for statistical and learning algorithms that provide rigorous privacy guarantees.\nDifferential privacy [17] provides meaningful guarantees in the presence of arbitrary side informa-\ntion. In the context of traditional statistical data sets, differential privacy is now well-developed.\nBy contrast, differential privacy in the context of graph data is much less developed. There are two\nmain variants of graph differential privacy: edge and node differential privacy. 
Intuitively, edge differential privacy ensures that an algorithm's output does not reveal the inclusion or removal of a particular edge in the graph, while node differential privacy hides the inclusion or removal of a node together with all its adjacent edges. Edge privacy is a weaker notion (hence easier to achieve) and has been studied more extensively. Several authors designed edge-differentially private algorithms for fitting generative graph models (e.g. [24]; see the full version for further references), but these do not appear to generalize to node privacy with meaningful accuracy guarantees.
The stronger notion, node privacy, corresponds more closely to what was achieved in the case of traditional data sets, and to what one would want in order to protect an individual's data: it ensures that no matter what an analyst observing the released information knows ahead of time, she learns the same things about an individual Alice regardless of whether Alice's data are used or not. In particular, no assumptions are needed on the way the individuals' data are generated (they need not even be independent). Node privacy was studied more recently [21, 14, 6, 26], with a focus on the release of descriptive statistics (such as the number of triangles in a graph). Unfortunately, differential privacy's stringency makes the design of accurate, private algorithms challenging.

*A full version of this extended abstract is available at http://arxiv.org/abs/1506.06162

In this work, we provide the first algorithms for node-private inference of a high-dimensional statistical model that does not admit simple sufficient statistics.
Modeling Large Graphs via Graphons. Traditionally, large graphs have been modeled using various parametric models, one of the most popular being the stochastic block model [20].
Here one postulates that an observed graph was generated by first assigning vertices at random to one of k groups, and then connecting two vertices with a probability that depends on their assigned groups. As the number of vertices of the graph in question grows, we do not expect the graph to be well described by a stochastic block model with a fixed number of blocks. In this paper we consider nonparametric models (where the number of parameters need not be fixed or even finite) given in terms of a graphon. A graphon is a measurable, bounded function W : [0, 1]² → [0, ∞) such that W(x, y) = W(y, x), which for convenience we take to be normalized: ∫ W = 1. Given a graphon, we generate a graph on n vertices by first assigning i.i.d. uniform labels in [0, 1] to the vertices, and then connecting vertices with labels x, y with probability ρnW(x, y), where ρn is a parameter determining the density of the generated graph Gn, with ρn‖W‖∞ ≤ 1. We call Gn a W-random graph with target density ρn (or simply a ρnW-random graph).
To our knowledge, random graph models of the above form were first introduced under the name latent position graphs [19], and are special cases of a more general model of "inhomogeneous random graphs" defined in [7], which is the first place where n-dependent target densities ρn were considered. For both dense graphs (whose target density does not depend on the number of vertices) and sparse graphs (those for which ρn → 0 as n → ∞), this model is related to the theory of convergent graph sequences, [8, 23, 9, 10] and [11, 12], respectively.
Estimation and Identifiability. Assuming that Gn is generated in this way, we are then faced with the task of estimating W from a single observation of a graph Gn.
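The two-step sampling procedure just described (draw i.i.d. uniform labels, then connect with probability ρnW(x, y)) is easy to sketch in code. The following is a minimal illustration of our own; the particular graphon is a hypothetical choice (normalized so its integral is 1), not one taken from the paper.

```python
import numpy as np

def sample_w_random_graph(W, n, rho, rng):
    """Sample an n-vertex graph from a graphon W with target density rho.

    Vertices get i.i.d. uniform labels x_i in [0, 1]; edge {i, j} (i < j) is
    present independently with probability rho * W(x_i, x_j), which must lie
    in [0, 1], i.e. we need rho * ||W||_inf <= 1.
    """
    x = rng.uniform(size=n)                  # latent positions x_1, ..., x_n
    P = rho * W(x[:, None], x[None, :])      # edge-probability matrix H_n(rho W)
    assert P.max() <= 1.0, "need rho * ||W||_inf <= 1"
    U = rng.uniform(size=(n, n))
    A = (U < P).astype(int)
    A = np.triu(A, 1)                        # keep only i < j, zero diagonal
    return A + A.T                           # symmetric 0/1 adjacency matrix

# Hypothetical graphon: integral over [0,1]^2 is 1, ||W||_inf = 2,
# so rho = 0.2 satisfies rho * ||W||_inf <= 1.
W = lambda x, y: 0.5 + 1.5 * np.abs(x - y)
rng = np.random.default_rng(0)
A = sample_w_random_graph(W, n=200, rho=0.2, rng=rng)
```

The resulting edge density concentrates around ρ∫W = ρ, which is the sense in which ρn is a "target density".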
To our knowledge, this task was first explicitly considered in [4], which considered graphons describing stochastic block models with a fixed number of blocks. This was generalized to models with a growing number of blocks [27, 15], while the first estimation of the nonparametric model was proposed in [5]. Most of the literature on estimating the nonparametric model makes additional assumptions on the function W, the most common one being that after a measure-preserving transformation, the integral of W over one variable is a strictly monotone function of the other, corresponding to an asymptotically strictly monotone degree distribution of Gn. (This assumption is quite restrictive: in particular, such results do not apply to graphons that represent block models.) For our purposes, the most relevant works are Wolfe and Olhede [28], Gao et al. [18], Chatterjee [13] and Abbe and Sandon [2] (as well as recent work done concurrently with this research [22]), which provide consistent estimators without monotonicity assumptions (see "Comparison to Previous Nonprivate Bounds" below).
One issue that makes estimation of graphons challenging is identifiability: multiple graphons can lead to the same distribution on Gn. Specifically, two graphons W and W̃ lead to the same distribution on W-random graphs if and only if there are measure-preserving maps φ, φ̃ : [0, 1] → [0, 1] such that W^φ = W̃^φ̃, where W^φ is defined by W^φ(x, y) = W(φ(x), φ(y)) [16]. Hence, there is no "canonical graphon" that an estimation procedure can output, but rather an equivalence class of graphons. Some of the literature circumvents identifiability by making strong additional assumptions, such as strict monotonicity, that imply the existence of canonical equivalence-class representatives.
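To make the equivalence concrete, here is a simple illustration of our own (not an example from the paper). The map φ(x) = 1 − x preserves Lebesgue measure on [0, 1], so relabelling a graphon by φ changes it pointwise but not the distribution it generates:

```latex
\varphi(x) = 1 - x, \qquad
W^{\varphi}(x, y) = W(1 - x,\, 1 - y), \qquad
\delta_2\bigl(W, W^{\varphi}\bigr) = 0 .
```

That is, W and W^φ generate exactly the same distribution on W-random graphs even though W^φ need not equal W pointwise, which is why consistency can only be defined up to such relabellings.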
We make no such assumptions, but instead define consistency in terms of a metric on these equivalence classes, rather than on graphons as functions. We use a variant of the L2 metric,

δ2(W, W′) = inf_{φ:[0,1]→[0,1]} ‖W^φ − W′‖2,    (1)

where φ ranges over measure-preserving bijections.
Our Contributions. In this paper we construct an algorithm that produces an estimate Ŵ from a single instance Gn of a W-random graph with target density ρn (or simply ρ, when n is clear from the context). We aim for several properties:

1. Ŵ is differentially private;
2. Ŵ is consistent, in the sense that δ2(W, Ŵ) → 0 in probability as n → ∞;
3. Ŵ has a compact representation (in our case, as a matrix with o(n) entries);
4. The procedure works for sparse graphs, that is, when the density ρ is small;
5. On input Gn, Ŵ can be calculated efficiently.

Here we give an estimation procedure that obeys the first four properties, leaving the question of polynomial-time algorithms for future work. Given an input graph Gn, a privacy parameter ε and a target number k of blocks, our algorithm A produces a k-block graphon Ŵ = A(Gn) such that

• A is ε-differentially node private.
The privacy guarantee holds for all inputs, independent of modeling assumptions.
• If (1) W is an arbitrary graphon, normalized so ∫ W = 1, (2) the expected average degree (n − 1)ρ grows at least as fast as log n, and (3) k goes to infinity sufficiently slowly with n, then, when Gn is ρW-random, the estimate Ŵ for W is consistent (that is, δ2(Ŵ, W) → 0, both in probability and almost surely).
• We give a nonprivate variant of A that converges assuming only ω(1) average degree.

Combined with the general theory of convergent graph sequences, these results in particular give a node-private procedure for estimating the edge density of all cuts in a ρW-random graph; see Section 2.2 below.
The main idea of our algorithm is to use the exponential mechanism of [25] to select a block model which approximately minimizes the ℓ2 distance to the observed adjacency matrix of G, under the best possible assignment of nodes to blocks (this explicit search over assignments makes the algorithm take exponential time). In order to get an algorithm that is accurate on sparse graphs, we need several nontrivial extensions of current techniques. To achieve privacy, we use a new variation of the Lipschitz extension technique of [21, 14] to reduce the sensitivity of the δ2 distance. While those works used Lipschitz extensions for noise addition, we use Lipschitz extensions inside the "exponential mechanism" [25] (to control the sensitivity of the score functions). To bound our algorithm's error, we provide a new analysis of the ℓ2-minimization algorithm; we show that approximate minimizers are not too far from the actual minimizer (a "stability" property).
Both aspects of our work are enabled by restricting the ℓ2²-minimization to a set of block models whose density (in fact, L∞ norm) is not much larger than that of the underlying graph. The algorithm is presented in Section 3.
Our most general result proves consistency for arbitrary graphons W but does not provide a concrete rate of convergence. However, we provide explicit rates under various assumptions on W. Specifically, we relate the error of our estimator to two natural error terms involving the graphon W: the error ε_k^(O)(W) of the best k-block approximation to W in the L2 norm (see (4) below) and an error term εn(W) measuring the L2-distance between the graphon W and the matrix of probabilities Hn(W) generating the graph Gn (see (5) below). In terms of these error terms, Theorem 1 shows

δ2(W, Ŵ) ≤ ε_k^(O)(W) + 2εn(W) + O_P( ⁴√(log k/(ρn)) + √(k² log n/(nε)) + 1/(ρεn) ),    (2)

provided the average degree ρn grows at least like log n. Along the way, we provide a novel analysis of a straightforward, nonprivate least-squares estimator that does not require an assumption on the average degree, and leads to an error bound with a better dependence on k:

δ2(W, Ŵnonprivate) ≤ ε_k^(O)(W) + 2εn(W) + O_P( ⁴√(log k/(ρn)) + √(k²/(ρn²)) ).    (3)

It follows from the theory of graph convergence that for all graphons W, we have ε_k^(O)(W) → 0 as k → ∞ and εn(W) → 0 almost surely as n → ∞. By selecting k appropriately, the nonprivate algorithm converges for any bounded graphon as long as ρn → ∞ with n; the private algorithm converges whenever ρn ≥ 6 log n (e.g., for constant ε).
As proven in the full version, we also have εn(W) = O_P(ε_k^(O)(W) + ⁴√(k/n)), though this upper bound is loose in many cases.
As a specific instantiation of these bounds, let us consider the case that W is exactly described by a k-block model, in which case ε_k^(O)(W) = 0 and εn(W) = O_P(⁴√(k/n)) (see full version for a proof). For k ≤ (n/log² n)^{1/3}, ρ ≥ log(k)/k and constant ε, our private estimator has an asymptotic error that is dominated by the (unavoidable) error of εn(W) = ⁴√(k/n), showing that we do not lose anything due to privacy in this special case. Another special case is when W is α-Hölder continuous, in which case ε_k^(O)(W) = O(k^{−α}) and εn(W) = O_P(n^{−α/2}); see Remark 2 below.
Comparison to Previous Nonprivate Bounds. We provide the first consistency bounds for estimation of a nonparametric graph model subject to node differential privacy. Along the way, for sparse graphs, we provide more general consistency results than were previously known, regardless of privacy. In particular, to the best of our knowledge, no prior results give a consistent estimator for W that works for sparse graphs without any additional assumptions besides boundedness.
When compared to results for nonprivate algorithms applied to graphons obeying additional assumptions, our bounds are often incomparable, and in other cases match the existing bounds.
We start by considering graphons which are themselves step functions with a known number of steps k. In the dense case, the nonprivate algorithms of [18] and [13], as well as our nonprivate algorithm, give an asymptotic error that is dominated by the term εn(W) = O(⁴√(k/n)), which is of the same order as our private estimator as long as k = õ(n^{1/3}). [28] provided the first convergence results for estimating graphons in the sparse regime. Assuming that W is bounded above and below (so it takes values in a range [λ1, λ2] where λ1 > 0), they analyze an inefficient algorithm (the MLE). The bounds of [28] are incomparable to ours, though for the case of k-block graphons, both their bounds and our nonprivate bound are dominated by the term ⁴√(k/n) when ρ > (log k)/k and k ≤ ρn. A different sequence of works shows how to consistently estimate the underlying block model with a fixed number of blocks k in polynomial time for very sparse graphs (as for our non-private algorithm, the only thing which is needed is that nρ → ∞) [3, 1, 2]; we are not aware of concrete bounds on the convergence rate.
For the case of dense α-Hölder-continuous graphons, the results of [18] give an error which is dominated by the term εn(W) = O_P(n^{−α/2}). For α < 1/2, our nonprivate bound matches this bound, while for α > 1/2 it is worse. [28] considers the sparse case. The rate of their estimator is incomparable to that of ours; further, their analysis requires a lower bound on the edge probabilities, while ours does not. Very recently, after our paper was submitted, both the bounds of [28] and our non-private bound (3) were substantially improved [22], leading to an error bound in which the fourth root in (3) is replaced by a square root (at the cost of an extra constant multiplying the oracle error). See the full version for a more detailed discussion of the previous literature.

2 Preliminaries

2.1 Notation
For a graph G on [n] = {1, . . . , n}, we use E(G) and A(G) to denote the edge set and the adjacency matrix of G, respectively. The edge density ρ(G) is defined as the number of edges divided by (n choose 2). Finally, the degree di of a vertex i in G is the number of edges containing i.
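As a quick sanity check of this notation, the edge density and the degree sequence of a toy graph (a path on four vertices, our own example) can be computed directly from the adjacency matrix:

```python
import numpy as np

# Toy 4-vertex graph: the path 1-2-3-4, as a symmetric 0/1 adjacency matrix.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
n = A.shape[0]

degrees = A.sum(axis=1)               # d_i = number of edges containing i
num_edges = A.sum() // 2              # each edge appears twice in A
rho = num_edges / (n * (n - 1) / 2)   # edge density: edges divided by C(n, 2)

print(degrees)    # degree sequence of the path
print(rho)        # 3 edges out of C(4, 2) = 6 possible pairs
```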
We use the same notation for a weighted graph with nonnegative edge weights βij, where now ρ(G) = (2/(n(n − 1))) Σ_{i<j} βij and di = Σ_{j≠i} βij. We use Gn to denote the set of weighted graphs on n vertices with weights in [0, 1], and Gn,d to denote the set of all graphs in Gn that have maximal degree at most d.
From Matrices to Graphons. We define a graphon to be a bounded, measurable function W : [0, 1]² → R+ such that W(x, y) = W(y, x) for all x, y ∈ [0, 1]. It will be convenient to embed symmetric n × n matrices with nonnegative entries into the space of graphons as follows: let Pn = (I1, . . . , In) be the partition of [0, 1] into adjacent intervals of length 1/n. Define W[A] to be the step function which equals Aij on Ii × Ij. If A is the adjacency matrix of an unweighted graph G, we use W[G] for W[A].
Distances. For p ∈ [1, ∞) we define the Lp norm of an n × n matrix A and a (Borel-)measurable function W : [0, 1]² → R by ‖A‖p = ((1/n²) Σ_{i,j} |Aij|^p)^{1/p} and ‖f‖p = (∫ |f(x, y)|^p dx dy)^{1/p}, respectively. Associated with the L2-norm is a scalar product, defined as ⟨A, B⟩ = (1/n²) Σ_{i,j} AijBij for two n × n matrices A and B, and ⟨U, W⟩ = ∫ U(x, y)W(x, y) dx dy for two square-integrable functions U, W : [0, 1]² → R. Note that with this notation, the edge density and the L1 norm are related by ‖G‖1 = ((n − 1)/n) ρ(G).
Recalling (1), we define the δ2 distance between two matrices A, B, or between a matrix A and a graphon W, by δ2(A, B) = δ2(W[A], W[B]) and δ2(A, W) = δ2(W[A], W). In addition, we will also use the in general larger distances δ̂2(A, B) and δ̂2(A, W), defined by taking a minimum over matrices A′ which are obtained from A by a relabelling of the indices: δ̂2(A, B) = min_{A′} ‖A′ − B‖2 and δ̂2(A, W) = min_{A′} ‖W[A′] − W‖2.

2.2 W-random graphs, graph convergence and multi-way cuts
W-random graphs and stochastic block models. Given a graphon W we define a random n × n matrix Hn = Hn(W) by choosing n "positions" x1, . . . , xn i.i.d. uniformly at random from [0, 1] and then setting (Hn)ij = W(xi, xj). If ‖W‖∞ ≤ 1, then Hn(W) has entries in [0, 1], and we can form a random graph Gn = Gn(W) on n vertices by choosing an edge between two vertices i < j with probability (Hn)ij, independently for all i < j. Following [23] we call Gn(W) a W-random graph and Hn(W) a W-weighted random graph. We incorporate a target density ρn (or simply ρ, when n is clear from the context) by normalizing W so that ∫ W = 1 and taking G to be a sample from Gn(ρW). In other words, we set Q = Hn(ρW) = ρHn(W) and then connect i to j with probability Qij, independently for all i < j.
Stochastic block models are specific examples of W-random graphs in which W is constant on sets of the form Ii × Ij, where (I1, . . . , Ik) is a partition of [0, 1] into intervals of possibly different lengths.
On the other hand, an arbitrary graphon W can be well approximated by a block model. Indeed, let

ε_k^(O)(W) = min_B ‖W − W[B]‖2,    (4)

where the minimum runs over all k × k matrices B. By a straightforward argument (see, e.g., [11]), ε_k^(O)(W) = ‖W − W_{Pk}‖2 → 0 as k → ∞.
We will take this approximation as a benchmark for our approach, and consider it the error an "oracle" could obtain (hence the superscript O).
Another key term in our algorithm's error guarantee is the distance between Hn(W) and W,

εn(W) = δ̂2(Hn(W), W).    (5)

It goes to zero as n → ∞ by the following lemma, which follows easily from the results of [11].
Lemma 1. Let W be a graphon with ‖W‖∞ < ∞. With probability one, ‖Hn(W)‖1 → ‖W‖1 and εn(W) → 0.

Convergence. Given a sequence of W-random graphs with target densities ρn, one might wonder whether the graphs Gn = Gn(ρnW) converge to W in a suitable metric. The answer is yes, and involves the so-called cut-metric δ□ first introduced in [9]. Its definition is identical to the definition (1) of the metric δ2, except that instead of the L2-norm ‖·‖2, it involves the Frieze-Kannan cut-norm ‖W‖□, defined as the supremum of |∫_{S×T} W| over all measurable sets S, T ⊂ [0, 1]. In the metric δ□, the W-random graphs Gn = Gn(ρW) then converge to W in the sense that δ□((1/ρ(Gn)) W[Gn], W) → 0; see [11] for the proof.
Estimation of Multi-Way Cuts. Using the results of [12], the convergence of Gn in the cut-metric δ□ implies many interesting results for estimating various quantities defined on the graph Gn. Indeed, a consistent approximation Ŵ to W in the metric δ2 is clearly consistent in the weaker metric δ□.
This distance, in turn, controls various quantities of interest to computer scientists, e.g., the sizes of all multi-way cuts, implying that a consistent estimator for W also gives consistent estimators for all multi-way cuts. See the full version for details.

2.3 Differential Privacy for Graphs

The goal of this paper is the development of a differentially private algorithm for graphon estimation. The privacy guarantees are formulated for worst-case inputs: we do not assume that G is generated from a graphon when analyzing privacy. This ensures that the guarantee remains meaningful no matter what an analyst knows ahead of time about G.
In this paper, we consider node privacy. We call two graphs G and G′ node neighbors if one can be obtained from the other by removing one node and its adjacent edges.
Definition 1 (ε-node-privacy). A randomized algorithm A is ε-node-private if for all events S in the output space of A, and all node neighbors G, G′,

Pr[A(G) ∈ S] ≤ exp(ε) × Pr[A(G′) ∈ S].

We also need the notion of the node-sensitivity of a function f : Gn → R, defined as the maximum max_{G,G′} |f(G) − f(G′)|, where the maximum ranges over node neighbors. The node sensitivity is the Lipschitz constant of f viewed as a map between appropriate metrics.

3 Differentially Private Graphon Estimation

3.1 Least-squares Estimation

Given a graph as input generated by an unknown graphon W, our goal is to recover a block-model approximation to W.
The basic nonprivate algorithm we emulate is least-squares estimation, which outputs the k × k matrix B that is closest to the input adjacency matrix A in the distance

δ̂2(B, A) = min_π ‖B^π − A‖2,

where the minimum runs over all equipartitions π of [n] into k classes, i.e., over all maps π : [n] → [k] such that all classes have size as close to n/k as possible (that is, ||π⁻¹(i)| − n/k| < 1 for all i), and B^π is the n × n block matrix with entries (B^π)_{xy} = B_{π(x)π(y)}. If A is the adjacency matrix of a graph G, we write δ̂2(B, G) instead of δ̂2(B, A). In the above notation, the basic algorithm we would want to emulate is then the algorithm which outputs the least-squares fit B̂ = argmin_B δ̂2(B, G), where the argmin runs over all symmetric k × k matrices B.

3.2 Towards a Private Algorithm

Our algorithm uses a carefully chosen instantiation of the exponential mechanism of McSherry and Talwar [25]. The most direct application of their framework would be to output a random k × k matrix B̂ according to the probability distribution

Pr(B̂ = B) ∝ exp(−C δ̂2²(B, A))

for some C > 0. The resulting algorithm is ε-differentially private if we set C to be ε over twice the node-sensitivity of the "score function", here δ̂2²(B, ·). But this value of C turns out to be too small to produce an output that is a good approximation to the least-squares estimator.
Indeed, for a given matrix B and equipartition π, the node-sensitivity of ‖G − B^π‖2² can be as large as 1/n, leading to a value of C which is too small to produce useful results for sparse graphs.
To address this, we first note that we can work with an equivalent score that is much less sensitive. Given B and π, we subtract off the squared norm of G to obtain the following:

score(B, π; G) = ‖G‖2² − ‖G − B^π‖2² = 2⟨G, B^π⟩ − ‖B^π‖2²,    (6)
score(B; G) = max_π score(B, π; G),    (7)

where the max ranges over equipartitions π : [n] → [k]. For a fixed input graph G, maximizing the score is the same as minimizing the distance, i.e. argmin_B δ̂2(B, G) = argmax_B score(B; G).
The sensitivity of the new score is then bounded by (2/n²) · ‖B‖∞ times the maximum degree in G (since G only affects the score via the inner product ⟨G, B^π⟩). But this is still problematic since, a priori, we have no control over either the size of ‖B‖∞ or the maximal degree of G.
To keep the sensitivity low, we make two modifications: first, we only optimize over matrices B whose entries are bounded by (roughly) ρn (since a good estimator will have entries which are not much larger than ‖ρnW‖∞, which is of order ρn); second, we restrict the score to be accurate only on graphs whose maximum degree is at most a constant times the average degree, since this is what one expects for graphs generated from a bounded graphon. While the first restriction can be directly enforced by the algorithm, the second is more delicate, since we need to provide privacy for all inputs, including graphs with very large maximum degree.
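In generic form, the exponential-mechanism selection that the algorithm builds on can be sketched as follows. This is a minimal illustration of our own: the candidates, scores, and sensitivity value are made up, and the full private algorithm additionally uses Lipschitz-extended scores and a different scaling of the privacy budget.

```python
import numpy as np

def exponential_mechanism(candidates, scores, eps, sensitivity, rng):
    """Sample candidate i with probability proportional to
    exp(eps * scores[i] / (2 * sensitivity)).

    This is the standard eps-differentially-private selection rule when each
    score has node-sensitivity at most `sensitivity`; higher-scoring
    candidates are exponentially more likely to be chosen.
    """
    scores = np.asarray(scores, dtype=float)
    logits = eps * scores / (2.0 * sensitivity)
    logits -= logits.max()            # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()              # normalize to a distribution
    return candidates[rng.choice(len(candidates), p=probs)], probs

rng = np.random.default_rng(1)
candidates = ["B1", "B2", "B3"]       # hypothetical block-model candidates
choice, probs = exponential_mechanism(candidates, scores=[0.9, 1.0, 0.2],
                                      eps=1.0, sensitivity=0.05, rng=rng)
```

The key design tension described above is visible here: the smaller the sensitivity, the sharper the distribution concentrates on high-scoring candidates, which is why reducing the score's sensitivity is essential for accuracy.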
We employ an idea from [6, 21]: we first consider the restriction of score(B, π; ·) to Gn,dn, where dn will be chosen to be of the order of the average degree of G, and then extend it back to all graphs while keeping the sensitivity low.

3.3 Private Estimation Algorithm

Our final algorithm takes as input the privacy parameter ε, the graph G, a number k of blocks, and a constant λ ≥ 1 that will have to be chosen large enough to guarantee consistency of the algorithm.

Algorithm 1: Private Estimation Algorithm
Input: ε > 0, λ ≥ 1, an integer k and a graph G on n vertices.
Output: k × k block graphon (represented as a k × k matrix B̂) estimating ρW
Compute an (ε/2)-node-private density approximation ρ̂ = ρ(G) + Lap(4/nε);
d = λρ̂n (the target maximum degree);
μ = λρ̂ (the target L∞ norm for B̂);
For each B and π, let ŝcore(B, π; ·) denote a nondecreasing Lipschitz extension (from [21]) of score(B, π; ·) from Gn,d to Gn such that ŝcore(B, π; A) ≤ score(B, π; A) for all matrices A, and define

ŝcore(B; A) = max_π ŝcore(B, π; A);

return B̂, sampled from the distribution

Pr(B̂ = B) ∝ exp( (ε/(4Δ)) ŝcore(B; A) ),    where Δ = 4dμ/n² = 4λ²ρ̂²/n,

and B ranges over matrices in

B_μ = {B ∈ [0, μ]^{k×k} : all entries B_{i,j} are multiples of 1/n};

Our main results about the private algorithm are the following lemma and theorem.
Lemma 2. Algorithm 1 is ε-node private.
Theorem 1 (Performance of the Private Algorithm). Let W : [0, 1]² → [0, Λ] be a normalized graphon, let 0 < ρΛ ≤ 1, let G = Gn(ρW), let λ ≥ 1, and let k be an integer.
Assume that ρn ≥ 6 log n, that 8Λ ≤ λ ≤ √n, and that 2 ≤ k ≤ min{n√(ρ/2), e^{ρn/2}}. Then Algorithm 1 outputs an approximation (ρ̂, B̂) such that

δ2(W, (1/ρ̂) W[B̂]) ≤ ε_k^(O)(W) + 2εn(W) + O_P( ⁴√(λ² log k/(ρn)) + λ√(k² log n/(nε)) + λ/(nρε) ).

Remark 1. While Theorem 1 is stated in terms of bounds which hold in probability, our proofs yield statements which hold almost surely as n → ∞.
Remark 2. Under additional assumptions on the graphon W, we obtain tighter bounds. For example, if we assume that W is Hölder continuous, i.e., there exist constants α ∈ (0, 1] and C < ∞ such that |W(x, y) − W(x′, y′)| ≤ Cδ^α whenever |x − x′| + |y − y′| ≤ δ, then we have that ε_k^(O)(W) = O(k^{−α}) and εn(W) = O_P(n^{−α/2}).
Remark 3. When considering the "best" block model approximation to W, one might want to consider block models with unequal block sizes; in a similar way, one might want to construct a private algorithm that outputs a block model with unequal block sizes, and produce a bound in terms of this best block model approximation instead of ε_k^(O)(W). This can be proved with our methods, with the minimal block size taking the role of 1/k in all our statements.

3.4 Non-Private Estimation Algorithm

We also analyze a simple, non-private algorithm, which outputs the argmin of δ̂2(·, A) over all k × k matrices whose entries are bounded by λρ(G). (Independently of our work, this non-private algorithm was also proposed and analysed in [22].)
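For very small inputs, this restricted least-squares fit can be computed by brute force over equipartitions, since for a fixed assignment π the optimal B is simply the blockwise average of A. The sketch below is our own code with made-up names; it omits the entry cap λρ(G) and all privacy machinery, and it runs in exponential time, exactly as the text warns.

```python
import itertools
import numpy as np

def least_squares_blockfit(A, k):
    """Brute-force min over equipartitions pi of ||B^pi - A||_2.

    For each assignment pi with near-equal class sizes, the optimal k x k
    matrix B is the blockwise average of A; we enumerate all assignments
    and keep the best. Exponential in n, so only usable on tiny graphs.
    """
    n = A.shape[0]
    best = (np.inf, None, None)
    for pi in itertools.product(range(k), repeat=n):
        counts = [pi.count(j) for j in range(k)]
        if 0 in counts or max(counts) - min(counts) > 1:
            continue  # not an equipartition: ||pi^{-1}(i)| - n/k| must be < 1
        idx = [[i for i in range(n) if pi[i] == j] for j in range(k)]
        B = np.zeros((k, k))
        for a in range(k):
            for b in range(k):
                B[a, b] = A[np.ix_(idx[a], idx[b])].mean()
        Bpi = B[np.array(pi)][:, np.array(pi)]    # (B^pi)_{xy} = B_{pi(x)pi(y)}
        dist = np.sqrt(((Bpi - A) ** 2).mean())   # ||.||_2 with 1/n^2 scaling
        if dist < best[0]:
            best = (dist, B, pi)
    return best

# Two disjoint triangles: the best 2-block equipartition groups each triangle.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
dist, B, pi = least_squares_blockfit(A, k=2)
```

On this input the optimal assignment puts each triangle in its own class; the residual distance is nonzero only because the zero diagonal of A cannot be matched by a constant block.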
Our bound (3) refers to this restricted least-squares algorithm, and does not require any assumptions on the average degree. As in (2), we suppress the dependence of the error on λ. To include it, one has to multiply the O_P term in (3) by √λ.

4 Analysis of the Private and Non-Private Algorithm

At a high level, our proof of Theorem 1 (as well as our new bounds on non-private estimation) follows from the fact that for all B and π, the expected score E[Score(B, π; G)] is equal to the score Score(B, π; Q), combined with a concentration argument. As a consequence, the maximizer B̂ of Score(B; G) will approximately minimize the L₂-distance δ̂₂(B, Q), which in turn will approximately minimize ‖(1/ρ)W[B] − W‖₂, thus relating the L₂-error of our estimator B̂ to the "oracle error" ε_k^{(O)}(W) defined in (4).

Our main concentration statement is captured in the following proposition. To state it, we define, for every symmetric n × n matrix Q with vanishing diagonal, Bern₀(Q) to be the distribution over symmetric matrices A with zero diagonal such that the entries {A_{ij} : i < j} are independent Bernoulli random variables with E A_{ij} = Q_{ij}.

Proposition 1. Let µ > 0, let Q ∈ [0, 1]^{n×n} be a symmetric matrix with vanishing diagonal, and let A ∼ Bern₀(Q).
If 2 ≤ k ≤ min{n√ρ(Q), e^{ρ(Q)n}} and B̂ ∈ B_µ is such that
$$\mathrm{Score}(\hat B; A) \ge \max_{B \in \mathcal{B}_\mu} \mathrm{Score}(B; A) - \nu^2$$
for some ν > 0, then with probability at least 1 − 2e^{−n},
$$\hat\delta_2(\hat B, Q) \le \min_{B \in \mathcal{B}_\mu} \hat\delta_2(B, Q) + \nu + O\left(\sqrt[4]{\mu^2 \rho(Q)\Big(\frac{k^2}{n^2} + \frac{\log k}{n}\Big)}\right). \quad (8)$$

Morally, the proposition contains almost all that is needed to establish the bound (3) proving consistency of the non-private algorithm (which, in fact, only involves the case ν = 0), even though there are several additional steps needed to complete the proof.

The proposition also contains an extra ingredient which is a crucial input for the analysis of the private algorithm: it states that if, instead of an optimal least-squares estimator, we output an estimator whose score is only approximately maximal, then the excess error introduced by the approximation is small. To apply the proposition, we then establish a lemma which gives us a lower bound on the score of the output B̂ in terms of the maximal score and an excess error ν.

There are several steps needed to execute this strategy, the most important ones involving a rigorous control of the error introduced by the Lipschitz extension inside the exponential algorithm. We defer the details to the full version.

Acknowledgments. A.S. was supported by NSF award IIS-1447700 and a Google Faculty Award. Part of this work was done while visiting Boston University's Hariri Institute for Computation and Harvard University's Center for Research on Computation and Society.

References

[1] E. Abbe and C. Sandon. Recovering communities in the general stochastic block model without knowing the parameters. arXiv:1503.00609, 2015.

[2] E. Abbe and C. Sandon.
Recovering communities in the general stochastic block model without knowing the parameters. Manuscript, 2015.

[3] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. arXiv:1405.3267, 2014.

[4] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences of the United States of America, 106:21068–21073, 2009.

[5] P. J. Bickel, A. Chen, and E. Levina. The method of moments and degree distributions for network models. Annals of Statistics, 39(5):2280–2301, 2011.

[6] J. Blocki, A. Blum, A. Datta, and O. Sheffet. Differentially private data analysis of social networks via restricted sensitivity. In Innovations in Theoretical Computer Science (ITCS), pages 87–96, 2013.

[7] B. Bollobás, S. Janson, and O. Riordan. The phase transition in inhomogeneous random graphs. Random Struct. Algorithms, 31:3–122, 2007.

[8] C. Borgs, J. T. Chayes, L. Lovász, V. Sós, and K. Vesztergombi. Counting graph homomorphisms. In Topics in Discrete Mathematics (eds. M. Klazar, J. Kratochvil, M. Loebl, J. Matousek, R. Thomas, P. Valtr), pages 315–371. Springer, 2006.

[9] C. Borgs, J. T. Chayes, L. Lovász, V. Sós, and K. Vesztergombi. Convergent graph sequences I: Subgraph frequencies, metric properties, and testing. Advances in Math., 219:1801–1851, 2008.

[10] C. Borgs, J. T. Chayes, L. Lovász, V. Sós, and K. Vesztergombi. Convergent graph sequences II: Multiway cuts and statistical physics. Ann. of Math., 176:151–219, 2012.

[11] C. Borgs, J. T. Chayes, H. Cohn, and Y. Zhao. An Lp theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions. arXiv:1401.2906, 2014.

[12] C. Borgs, J. T. Chayes, H. Cohn, and Y. Zhao.
An Lp theory of sparse graph convergence II: LD convergence, quotients, and right convergence. arXiv:1408.0744, 2014.

[13] S. Chatterjee. Matrix estimation by universal singular value thresholding. Annals of Statistics, 43(1):177–214, 2015.

[14] S. Chen and S. Zhou. Recursive mechanism: towards node differential privacy and unrestricted joins. In ACM SIGMOD International Conference on Management of Data, pages 653–664, 2013.

[15] D. S. Choi, P. J. Wolfe, and E. M. Airoldi. Stochastic blockmodels with a growing number of classes. Biometrika, 99:273–284, 2012.

[16] P. Diaconis and S. Janson. Graph limits and exchangeable random graphs. Rendiconti di Matematica, 28:33–61, 2008.

[17] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In S. Halevi and T. Rabin, editors, TCC, volume 3876, pages 265–284, 2006.

[18] C. Gao, Y. Lu, and H. H. Zhou. Rate-optimal graphon estimation. arXiv:1410.5837, 2014.

[19] P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.

[20] P. Holland, K. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5:109–137, 1983.

[21] S. P. Kasiviswanathan, K. Nissim, S. Raskhodnikova, and A. Smith. Analyzing graphs with node-differential privacy. In Theory of Cryptography Conference (TCC), pages 457–476, 2013.

[22] O. Klopp, A. Tsybakov, and N. Verzelen. Oracle inequalities for network models and sparse graphon estimation. arXiv:1507.04118, 2015.

[23] L. Lovász and B. Szegedy. Limits of dense graph sequences. Journal of Combinatorial Theory, Series B, 96:933–957, 2006.

[24] W. Lu and G. Miklau. Exponential random graph estimation under differential privacy.
In 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 921–930, 2014.

[25] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103. IEEE, 2007.

[26] S. Raskhodnikova and A. Smith. High-dimensional Lipschitz extensions and node-private analysis of network data. arXiv:1504.07912, 2015.

[27] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist., 39(4):1878–1915, 2011.

[28] P. Wolfe and S. C. Olhede. Nonparametric graphon estimation. arXiv:1309.5936, 2013.