{"title": "Learning to Discover Social Circles in Ego Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 539, "page_last": 547, "abstract": "Our personal social networks are big and cluttered, and currently there is no good way to organize them. Social networking sites allow users to manually categorize their friends into social circles (e.g. `circles' on Google+, and `lists' on Facebook and Twitter), however they are laborious to construct and must be updated whenever a user's network grows. We define a novel machine learning task of identifying users' social circles. We pose the problem as a node clustering problem on a user's ego-network, a network of connections between her friends. We develop a model for detecting circles that combines network structure as well as user profile information. For each circle we learn its members and the circle-specific user profile similarity metric. Modeling node membership to multiple circles allows us to detect overlapping as well as hierarchically nested circles. Experiments show that our model accurately identifies circles on a diverse set of data from Facebook, Google+, and Twitter for all of which we obtain hand-labeled ground-truth data.", "full_text": "Learning to Discover Social Circles in Ego Networks\n\nJulian McAuley\nStanford, USA\n\njmcauley@cs.stanford.edu\n\nJure Leskovec\nStanford, USA\n\njure@cs.stanford.edu\n\nAbstract\n\nOur personal social networks are big and cluttered, and currently there is no good\nway to organize them. Social networking sites allow users to manually categorize\ntheir friends into social circles (e.g. \u2018circles\u2019 on Google+, and \u2018lists\u2019 on Facebook\nand Twitter), however they are laborious to construct and must be updated when-\never a user\u2019s network grows. We de\ufb01ne a novel machine learning task of identi-\nfying users\u2019 social circles. 
We pose the problem as a node clustering problem on\na user\u2019s ego-network, a network of connections between her friends. We develop\na model for detecting circles that combines network structure as well as user pro-\n\ufb01le information. For each circle we learn its members and the circle-speci\ufb01c user\npro\ufb01le similarity metric. Modeling node membership to multiple circles allows us\nto detect overlapping as well as hierarchically nested circles. Experiments show\nthat our model accurately identi\ufb01es circles on a diverse set of data from Facebook,\nGoogle+, and Twitter for all of which we obtain hand-labeled ground-truth.\n\n1 Introduction\n\nOnline social networks allow users to follow streams of posts generated by hundreds of their friends\nand acquaintances. Users\u2019 friends generate overwhelming volumes of information and to cope with\nthe \u2018information overload\u2019 they need to organize their personal social networks. One of the main\nmechanisms for users of social networking sites to organize their networks and the content gener-\nated by them is to categorize their friends into what we refer to as social circles. Practically all\nmajor social networks provide such functionality, for example, \u2018circles\u2019 on Google+, and \u2018lists\u2019 on\nFacebook and Twitter. Once a user creates her circles, they can be used for content \ufb01ltering (e.g. to\n\ufb01lter status updates posted by distant acquaintances), for privacy (e.g. to hide personal information\nfrom coworkers), and for sharing groups of users that others may wish to follow.\nCurrently, users in Facebook, Google+ and Twitter identify their circles either manually, or in a\nna\u00efve fashion by identifying friends sharing a common attribute. 
Neither approach is particularly\nsatisfactory: the former is time consuming and does not update automatically as a user adds more\nfriends, while the latter fails to capture individual aspects of users\u2019 communities, and may function\npoorly when pro\ufb01le information is missing or withheld.\nIn this paper we study the problem of automatically discovering users\u2019 social circles. In particular,\ngiven a single user with her personal social network, our goal is to identify her circles, each of which\nis a subset of her friends. Circles are user-speci\ufb01c as each user organizes her personal network of\nfriends independently of all other users to whom she is not connected. This means that we can\nformulate the problem of circle detection as a clustering problem on her ego-network, the network\nof friendships between her friends. In Figure 1 we are given a single user u and we form a network\nbetween her friends vi. We refer to the user u as the ego and to the nodes vi as alters. The task then\nis to identify the circles to which each alter vi belongs, as in Figure 1. In other words, the goal is to\n\ufb01nd nested as well as overlapping communities/clusters in u\u2019s ego-network.\nGenerally, there are two useful sources of data that help with this task. The \ufb01rst is the set of edges\nof the ego-network. We expect that circles are formed by densely-connected sets of alters [20].\n\n1\n\n\fFigure 1: An ego-network with labeled circles. This network shows typical behavior that we ob-\nserve in our data: Approximately 25% of our ground-truth circles (from Facebook) are contained\ncompletely within another circle, 50% overlap with another circle, and 25% of the circles have no\nmembers in common with any other circle. The goal is to discover these circles given only the\nnetwork between the ego\u2019s friends. 
We aim to discover circle memberships and to \ufb01nd common\nproperties around which circles form.\n\nHowever, different circles overlap heavily, i.e., alters belong to multiple circles simultaneously [1,\n21, 28, 29], and many circles are hierarchically nested in larger ones (Figure 1). Thus it is important\nto model an alter\u2019s memberships to multiple circles. Secondly, we expect that each circle is not only\ndensely connected but its members also share common properties or traits [18, 28]. Thus we need\nto explicitly model different dimensions of user pro\ufb01les along which each circle emerges.\nWe model circle af\ufb01liations as latent variables, and similarity between alters as a function of com-\nmon pro\ufb01le information. We propose an unsupervised method to learn which dimensions of pro\ufb01le\nsimilarity lead to densely linked circles. Our model has two innovations: First, in contrast to mixed-\nmembership models [2] we predict hard assignment of a node to multiple circles, which proves\ncritical for good performance. Second, by proposing a parameterized de\ufb01nition of pro\ufb01le similar-\nity, we learn the dimensions of similarity along which links emerge. This extends the notion of\nhomophily [12] by allowing different circles to form along different social dimensions, an idea re-\nlated to the concept of Blau spaces [16]. We achieve this by allowing each circle to have a different\nde\ufb01nition of pro\ufb01le similarity, so that one circle might form around friends from the same school,\nand another around friends from the same location. 
We learn the model by simultaneously choosing\nnode circle memberships and pro\ufb01le similarity functions so as to best explain the observed data.\nWe introduce a dataset of 1,143 ego-networks from Facebook, Google+, and Twitter, for which we\nobtain hand-labeled ground-truth from 5,636 different circles.1 Experimental results show that by\nsimultaneously considering social network structure as well as user pro\ufb01le information our method\nperforms signi\ufb01cantly better than natural alternatives and the current state-of-the-art. Besides being\nmore accurate our method also allows us to generate automatic explanations of why certain nodes\nbelong to common communities. Our method is completely unsupervised, and is able to automati-\ncally determine both the number of circles as well as the circles themselves.\nFurther Related Work.\nTopic-modeling techniques have been used to uncover \u2018mixed-\nmemberships\u2019 of nodes to multiple groups [2], and extensions allow entities to be attributed with\ntext information [3, 5, 11, 13, 26]. Classical algorithms tend to identify communities based on node\nfeatures [9] or graph structure [1, 21], but rarely use both in concert. Our work is related to [30] in\nthe sense that it performs clustering on social-network data, and [23], which models memberships\nto multiple communities. Finally, there are works that model network data similar to ours [6, 17],\nthough the underlying models do not form communities. As we shall see, our problem has unique\ncharacteristics that require a new model. An extended version of our article appears in [15].\n\n2 A Generative Model for Friendships in Social Circles\n\nWe desire a model of circle formation with the following properties: (1) Nodes within circles should\nhave common properties, or \u2018aspects\u2019. (2) Different circles should be formed by different aspects,\ne.g. one circle might be formed by family members, and another by students who attended the same\nuniversity. 
(3) Circles should be allowed to overlap, and \u2018stronger\u2019 circles should be allowed to form\nwithin \u2018weaker\u2019 ones, e.g. a circle of friends from the same degree program may form within a circle\nfrom the same university, as in Figure 1. (4) We would like to leverage both pro\ufb01le information and\nnetwork structure in order to identify the circles. Ideally we would like to be able to pinpoint which\naspects of a pro\ufb01le caused a circle to form, so that the model is interpretable by the user.\n\n1 http://snap.stanford.edu/data/\n\nThe input to our model is an ego-network G = (V, E), along with \u2018pro\ufb01les\u2019 for each user v \u2208 V.\nThe \u2018center\u2019 node u of the ego-network (the \u2018ego\u2019) is not included in G, but rather G consists only of\nu\u2019s friends (the \u2018alters\u2019). We de\ufb01ne the ego-network in this way precisely because creators of circles\ndo not themselves appear in their own circles. For each ego-network, our goal is to predict a set of\ncircles C = {C1 . . . CK}, Ck \u2286 V, and associated parameter vectors \u03b8k that encode how each circle\nemerged. We encode \u2018user pro\ufb01les\u2019 into pairwise features \u03c6(x, y) that in some way capture what\nproperties the users x and y have in common. We \ufb01rst describe our model, which can be applied\nusing arbitrary feature vectors \u03c6(x, y), and in Section 5 we describe several ways to construct feature\nvectors \u03c6(x, y) that are suited to our particular application.\nWe describe a model of social circles that treats circle memberships as latent variables. Nodes within\na common circle are given an opportunity to form an edge, which naturally leads to hierarchical and\noverlapping circles. 
We will then devise an unsupervised algorithm to jointly optimize the latent\nvariables and the pro\ufb01le similarity parameters so as to best explain the observed network data.\nOur model of social circles is de\ufb01ned as follows. Given an ego-network G and a set of K circles\nC = {C1 . . . CK}, we model the probability that a pair of nodes (x, y) \u2208 V \u00d7 V forms an edge as\n\n$$p((x, y) \\in E) \\propto \\exp\\Big\\{ \\underbrace{\\sum_{C_k \\supseteq \\{x,y\\}} \\langle \\phi(x, y), \\theta_k \\rangle}_{\\text{circles containing both nodes}} - \\underbrace{\\sum_{C_k \\nsupseteq \\{x,y\\}} \\alpha_k \\langle \\phi(x, y), \\theta_k \\rangle}_{\\text{all other circles}} \\Big\\}. \\tag{1}$$\n\nFor each circle Ck, \u03b8k is the pro\ufb01le similarity parameter that we will learn. The idea is that\n\u27e8\u03c6(x, y), \u03b8k\u27e9 is high if both nodes belong to Ck, and low if either of them does not (\u03b1k trades off\nthese two effects). Since the feature vector \u03c6(x, y) encodes the similarity between the pro\ufb01les of\ntwo users x and y, the parameter vector \u03b8k encodes what dimensions of pro\ufb01le similarity caused the\ncircle to form, so that nodes within a circle Ck should \u2018look similar\u2019 according to \u03b8k.\nConsidering that edges e = (x, y) are generated independently, we can write the probability of G as\n\n$$P_\\Theta(G; \\mathcal{C}) = \\prod_{e \\in E} p(e \\in E) \\times \\prod_{e \\notin E} p(e \\notin E), \\tag{2}$$\n\nwhere \u0398 = {(\u03b8k, \u03b1k)} (k = 1 . . . K) is our set of model parameters. 
De\ufb01ning the shorthand notation\n\n$$d_k(e) = \\delta(e \\in C_k) - \\alpha_k \\delta(e \\notin C_k), \\qquad \\Phi(e) = \\sum_{C_k \\in \\mathcal{C}} d_k(e) \\langle \\phi(e), \\theta_k \\rangle$$\n\nallows us to write the log-likelihood of G:\n\n$$l_\\Theta(G; \\mathcal{C}) = \\sum_{e \\in E} \\Phi(e) - \\sum_{e \\in V \\times V} \\log\\big(1 + e^{\\Phi(e)}\\big). \\tag{3}$$\n\nNext, we describe how to optimize node circle memberships C as well as the parameters of the user\npro\ufb01le similarity functions \u0398 = {(\u03b8k, \u03b1k)} (k = 1 . . . K) given a graph G and user pro\ufb01les.\n\n3 Unsupervised Learning of Model Parameters\nTreating circles C as latent variables, we aim to \ufb01nd \u0398\u0302 = {\u03b8\u0302, \u03b1\u0302} so as to maximize the regularized\nlog-likelihood of (eq. 3), i.e.,\n\n$$\\hat{\\Theta}, \\hat{\\mathcal{C}} = \\operatorname*{argmax}_{\\Theta, \\mathcal{C}} \\; l_\\Theta(G; \\mathcal{C}) - \\lambda \\Omega(\\theta). \\tag{4}$$\n\nWe solve this problem using coordinate ascent on \u0398 and C [14]:\n\n$$\\mathcal{C}^t = \\operatorname*{argmax}_{\\mathcal{C}} \\; l_{\\Theta^t}(G; \\mathcal{C}) \\tag{5}$$\n\n$$\\Theta^{t+1} = \\operatorname*{argmax}_{\\Theta} \\; l_\\Theta(G; \\mathcal{C}^t) - \\lambda \\Omega(\\theta). \\tag{6}$$\n\nNoting that (eq. 3) is concave in \u03b8, we optimize (eq. 6) through gradient ascent, where partial deriva-\ntives are given by\n\n$$\\frac{\\partial l}{\\partial \\theta_k} = \\sum_{e \\in V \\times V} -d_k(e) \\phi(e) \\frac{e^{\\Phi(e)}}{1 + e^{\\Phi(e)}} + \\sum_{e \\in E} d_k(e) \\phi(e) - \\lambda \\frac{\\partial \\Omega}{\\partial \\theta_k}$$\n\n$$\\frac{\\partial l}{\\partial \\alpha_k} = \\sum_{e \\in V \\times V} \\delta(e \\notin C_k) \\langle \\phi(e), \\theta_k \\rangle \\frac{e^{\\Phi(e)}}{1 + e^{\\Phi(e)}} - \\sum_{e \\in E} \\delta(e \\notin C_k) \\langle \\phi(e), \\theta_k \\rangle.$$\n\nFor \ufb01xed C \\ Ck we note that solving $\\operatorname*{argmax}_{C_k} l_\\Theta(G; \\mathcal{C} \\setminus C_k)$ can be expressed as pseudo-boolean\noptimization in a pairwise graphical model [4], i.e., it can be written as\n\n$$C_k = \\operatorname*{argmax}_{C} \\sum_{(x,y) \\in V \\times V} E_{(x,y)}(\\delta(x \\in C), \\delta(y \\in C)). \\tag{7}$$\n\nIn words, we want edges with high weight (under \u03b8k) to appear in Ck, and edges with low weight to\nappear outside of Ck. De\ufb01ning $o_k(e) = \\sum_{C_j \\in \\mathcal{C} \\setminus C_k} d_j(e) \\langle \\phi(e), \\theta_j \\rangle$, the energy Ee of (eq. 7) is\n\n$$E_e(0,0) = E_e(0,1) = E_e(1,0) = \\begin{cases} o_k(e) - \\alpha_k \\langle \\phi(e), \\theta_k \\rangle - \\log\\big(1 + e^{o_k(e) - \\alpha_k \\langle \\phi(e), \\theta_k \\rangle}\\big), & e \\in E \\\\ -\\log\\big(1 + e^{o_k(e) - \\alpha_k \\langle \\phi(e), \\theta_k \\rangle}\\big), & e \\notin E \\end{cases}$$\n\n$$E_e(1,1) = \\begin{cases} o_k(e) + \\langle \\phi(e), \\theta_k \\rangle - \\log\\big(1 + e^{o_k(e) + \\langle \\phi(e), \\theta_k \\rangle}\\big), & e \\in E \\\\ -\\log\\big(1 + e^{o_k(e) + \\langle \\phi(e), \\theta_k \\rangle}\\big), & e \\notin E. \\end{cases}$$\n\nBy expressing the problem in this form we can draw upon existing work on pseudo-boolean op-\ntimization. We use the publicly-available \u2018QPBO\u2019 software described in [22], which is able to\naccurately approximate problems of the form shown in (eq. 7). We solve (eq. 7) for each Ck in a\nrandom order.\nThe two optimization steps of (eq. 5) and (eq. 6) are repeated until convergence, i.e., until $\\mathcal{C}^{t+1} = \\mathcal{C}^t$.\nWe regularize (eq. 4) using the \u21131 norm, i.e., $\\Omega(\\theta) = \\sum_{k=1}^{K} \\sum_{i=1}^{|\\theta_k|} |\\theta_{ki}|$, which leads to sparse (and\nreadily interpretable) parameters. Since ego-networks are naturally relatively small, our algorithm\ncan readily handle problems at the scale required. In the case of Facebook, the average ego-network\nhas around 190 nodes [24], while the largest network we encountered has 4,964 nodes. Note that\nsince the method is unsupervised, inference is performed independently for each ego-network. This\nmeans that our method could be run on the full Facebook graph (for example), as circles are inde-\npendently detected for each user, and the ego-networks typically contain only hundreds of nodes.\nHyperparameter estimation. 
To choose the optimal number of circles, we choose K so as to\nminimize an approximation to the Bayesian Information Criterion (BIC) [2, 8, 25],\n\n$$\\hat{K} = \\operatorname*{argmin}_{K} \\; \\mathrm{BIC}(K; \\Theta^K) \\tag{8}$$\n\nwhere \u0398K is the set of parameters predicted for a particular number of communities K, and\n\n$$\\mathrm{BIC}(K; \\Theta^K) \\simeq -2\\, l_{\\Theta^K}(G; \\mathcal{C}) + |\\Theta^K| \\log |E|. \\tag{9}$$\n\nThe regularization parameter \u03bb \u2208 {0, 1, 10, 100} was determined using leave-one-out cross valida-\ntion, though in our experience it did not signi\ufb01cantly impact performance.\n\n4 Dataset Description\n\nOur goal is to evaluate our unsupervised method on ground-truth data. We expended signi\ufb01cant time,\neffort, and resources to obtain high quality hand-labeled data.2 We were able to obtain ego-networks\nand ground-truth from three major social networking sites: Facebook, Google+, and Twitter.\nFrom Facebook we obtained pro\ufb01le and network data from 10 ego-networks, consisting of 193 cir-\ncles and 4,039 users. To do so we developed our own Facebook application and conducted a survey\nof ten users, who were asked to manually identify all the circles to which their friends belonged. 
On\naverage, users identi\ufb01ed 19 circles in their ego-networks, with an average circle size of 22 friends.\nExamples of such circles include students of common universities, sports teams, relatives, etc.\n\n2 http://snap.stanford.edu/data/\n\n1 \u2212 \u03c3x,y (Figure 2, top right):\n0  \ufb01rst name : Dilly\n0  last name : Knox\n0  \ufb01rst name : Alan\n0  last name : Turing\n1  work : position : Cryptanalyst\n1  work : location : GC&CS\n0  work : location : Royal Navy\n1  education : name : Cambridge\n1  education : type : College\n0  education : name : Princeton\n0  education : type : Graduate School\n\n1 \u2212 \u03c3\u2032x,y (Figure 2, bottom right):\n0  \ufb01rst name\n0  last name\n1  work : position\n1  work : location\n1  education : name\n1  education : type\n\nFigure 2: Feature construction. Pro\ufb01les are tree-structured, and we construct features by com-\nparing paths in those trees. Examples of trees for two users x (blue) and y (pink) are shown at\nleft. Two schemes for constructing feature vectors from these pro\ufb01les are shown at right: (1) (top\nright) we construct binary indicators measuring the difference between leaves in the two trees, e.g.\n\u2018work\u2192position\u2192Cryptanalyst\u2019 appears in both trees. (2) (bottom right) we sum over the leaf nodes\nin the \ufb01rst scheme, maintaining the fact that the two users worked at the same institution, but dis-\ncarding the identity of that institution.\n\nFor the other two datasets we obtained publicly accessible data. From Google+ we obtained data\nfrom 133 ego-networks, consisting of 479 circles and 106,674 users. The 133 ego-networks rep-\nresent all 133 Google+ users who had shared at least two circles, and whose network information\nwas publicly accessible at the time of our crawl. 
The Google+ circles are quite different from those\nfrom Facebook, in the sense that their creators have chosen to release them publicly, and because\nGoogle+ is a directed network (note that our model can very naturally be applied to both directed\nand undirected networks). For example, one circle contains candidates from the 2012 Republican\nprimary, who presumably do not follow their followers, nor each other. Finally, from Twitter we\nobtained data from 1,000 ego-networks, consisting of 4,869 circles (or \u2018lists\u2019 [10, 19, 27, 31]) and\n81,362 users. The ego-networks we obtained range in size from 10 to 4,964 nodes.\nTaken together our data contains 1,143 different ego-networks, 5,541 circles, and 192,075 users.\nThe size differences between these datasets simply re\ufb02ect the availability of data from each of the\nthree sources. Our Facebook data is fully labeled, in the sense that we obtain every circle that a\nuser considers to be a cohesive community, whereas our Google+ and Twitter data is only partially\nlabeled, in the sense that we only have access to public circles. We design our evaluation procedure\nin Section 6 so that partial labels cause no issues.\n\n5 Constructing Features from User Pro\ufb01les\n\nPro\ufb01le information in all of our datasets can be represented as a tree where each level encodes\nincreasingly speci\ufb01c information (Figure 2, left). From Google+ we collect data from six categories\n(gender, last name, job titles, institutions, universities, and places lived). From Facebook we collect\ndata from 26 categories, including hometowns, birthdays, colleagues, political af\ufb01liations, etc. 
For\nTwitter, many choices exist as proxies for user pro\ufb01les; we simply collect data from two categories,\nnamely the set of hashtags and mentions used by each user during two weeks\u2019 worth of tweets.\n\u2018Categories\u2019 correspond to parents of leaf nodes in a pro\ufb01le tree, as shown in Figure 2.\nWe \ufb01rst describe a difference vector to encode the relationship between two pro\ufb01les. A non-technical\ndescription is given in Figure 2. Suppose that users v \u2208 V each have an associated pro\ufb01le tree Tv,\nand that l \u2208 Tv is a leaf in that tree. We de\ufb01ne the difference vector \u03c3x,y between two users x and y\nas a binary indicator encoding the pro\ufb01le aspects where users x and y differ (Figure 2, top right):\n\n$$\\sigma_{x,y}[l] = \\delta\\big((l \\in T_x) \\neq (l \\in T_y)\\big). \\tag{10}$$\n\nNote that feature descriptors are de\ufb01ned per ego-network: while many thousands of high schools\n(for example) exist among all Facebook users, only a small number appear among any particular\nuser\u2019s friends.\nAlthough the above difference vector has the advantage that it encodes pro\ufb01le information at a \ufb01ne\ngranularity, it has the disadvantage that it is high-dimensional (up to 4,122 dimensions in the data\nwe considered). One way to address this is to form difference vectors based on the parents of leaf\nnodes: this way, we encode what pro\ufb01le categories two users have in common, but disregard speci\ufb01c\nvalues (Figure 2, bottom right). For example, we encode how many hashtags two users tweeted in\ncommon, but discard which hashtags they tweeted:\n\n$$\\sigma'_{x,y}[p] = \\sum_{l \\in \\mathrm{children}(p)} \\sigma_{x,y}[l]. \\tag{11}$$\n\nThis scheme has the advantage that it requires a constant number of dimensions, regardless of the\nsize of the ego-network (26 for Facebook, 6 for Google+, 2 for Twitter, as described above).\n\n[Figure 2, left panel: pro\ufb01le trees of the two example users. Alan Turing: work (position: Cryptanalyst, company: GC&CS); education (name: Cambridge, type: College; name: Princeton, type: Graduate School). Dilly Knox: work (position: Cryptanalyst, company: GC&CS; position: Cryptanalyst, company: Royal Navy); education (name: Cambridge, type: College).]\n\nBased on the difference vectors \u03c3x,y (and \u03c3\u2032x,y) we now describe how to construct edge features\n\u03c6(x, y). The \ufb01rst property we wish to model is that members of circles should have common rela-\ntionships with each other:\n\n$$\\phi^1(x, y) = (1; -\\sigma_{x,y}). \\tag{12}$$\n\nThe second property we wish to model is that members of circles should have common relationships\nto the ego of the ego-network. In this case, we consider the pro\ufb01le tree Tu from the ego user u. We\nthen de\ufb01ne our features in terms of that user:\n\n$$\\phi^2(x, y) = (1; -|\\sigma_{x,u} - \\sigma_{y,u}|) \\tag{13}$$\n\n(|\u03c3x,u \u2212 \u03c3y,u| is taken elementwise). These two parameterizations allow us to assess which mecha-\nnism better captures users\u2019 subjective de\ufb01nition of a circle. In both cases, we include a constant fea-\nture (\u20181\u2019), which controls the probability that edges form within circles, or equivalently it measures\nthe extent to which circles are made up of friends. Importantly, this allows us to predict memberships\neven for users who have no pro\ufb01le information, simply due to their patterns of connectivity.\nSimilarly, for the \u2018compressed\u2019 difference vector \u03c3\u2032x,y, we de\ufb01ne\n\n$$\\psi^1(x, y) = (1; -\\sigma'_{x,y}), \\qquad \\psi^2(x, y) = (1; -|\\sigma'_{x,u} - \\sigma'_{y,u}|). \\tag{14}$$\n\nTo summarize, we have identi\ufb01ed four ways of representing the compatibility between different\naspects of pro\ufb01les for two users. We considered two ways of constructing a difference vector (\u03c3x,y\nvs. \u03c3\u2032x,y) and two ways of capturing the compatibility of a pair of pro\ufb01les (\u03c6(x, y) vs. \u03c8(x, y)).\n\n6 Experiments\n\nAlthough our method is unsupervised, we can evaluate it on ground-truth data by examining the\nmaximum-likelihood assignments of the latent circles C = {C1 . . . CK} after convergence. Our\ngoal is that for a properly regularized model, the latent variables will align closely with the human\nlabeled ground-truth circles C\u0304 = {C\u03041 . . . C\u0304K\u0304}.\nEvaluation metrics. To measure the alignment between a predicted circle C and a ground-truth\ncircle C\u0304, we compute the Balanced Error Rate (BER) between the two circles [7],\n\n$$\\mathrm{BER}(C, \\bar{C}) = \\frac{1}{2} \\left( \\frac{|C \\setminus \\bar{C}|}{|C|} + \\frac{|\\bar{C} \\setminus C|}{|\\bar{C}|} \\right).$$\n\nThis measure assigns equal importance to false positives and false negatives,\nso that trivial or random predictions incur an error of 0.5 on average. Such a measure is preferable to\nthe 0/1 loss (for example), which assigns extremely low error to trivial predictions. We also report\nthe F1 score, which we \ufb01nd produces qualitatively similar results.\nAligning predicted and ground-truth circles. Since we do not know the correspondence between\ncircles in C and C\u0304, we compute the optimal match via linear assignment by maximizing:\n\n$$\\max_{f : \\mathcal{C} \\to \\bar{\\mathcal{C}}} \\frac{1}{|f|} \\sum_{C \\in \\mathrm{dom}(f)} \\big(1 - \\mathrm{BER}(C, f(C))\\big), \\tag{15}$$\n\nwhere f is a (partial) correspondence between C and C\u0304. That is, if the number of predicted circles |C|\nis less than the number of ground-truth circles |C\u0304|, then every circle C \u2208 C must have a match C\u0304 \u2208 C\u0304,\nbut if |C| > |C\u0304|, we do not incur a penalty for additional predictions that could have been circles\nbut were not included in the ground-truth. 
We use established techniques to estimate the number of\ncircles, so that none of the baselines suffers a disadvantage by mispredicting K\u0302 = |C|, nor can any\nmethod predict the \u2018trivial\u2019 solution of returning the powerset of all users. We note that removing the\nbijectivity requirement (i.e., forcing all circles to be aligned by allowing multiple predicted circles\nto match a single ground-truth circle or vice versa) led to qualitatively similar results.\n\nFigure 3: Performance on Facebook, Google+, and Twitter, in terms of 1 \u2212 Balanced Error Rate\n(top), and the F1 score (bottom). Higher is better. Error bars show standard error. The improvements\nof our best features \u03c61 compared to the nearest competitor are signi\ufb01cant at the 1% level or better.\n\nBaselines. We considered a wide range of baseline methods, including those that consider only\nnetwork structure, those that consider only pro\ufb01le information, and those that consider both. First\nwe experimented with Mixed Membership Stochastic Block Models [2], which consider only net-\nwork information, and variants that also consider text attributes [5, 6, 13]. For each node, mixed-\nmembership models predict a stochastic vector encoding partial circle memberships, which we\nthreshold to generate \u2018hard\u2019 assignments. We also considered Block-LDA [3], where we generate\n\u2018documents\u2019 by treating aspects of user pro\ufb01les as words in a bag-of-words model.\nSecondly, we experimented with classical clustering algorithms, such as K-means and Hierarchical\nClustering [9], that form clusters based only on node pro\ufb01les, but ignore the network. Conversely we\nconsidered Link Clustering [1] and Clique Percolation [21], which use network information, but ig-\nnore pro\ufb01les. 
We also considered the Low-Rank Embedding approach of [30], where node attributes\nand edge information are projected into a feature space where classical clustering techniques can\nbe applied. Finally we considered Multi-Assignment Clustering [23], which is promising in that it\npredicts hard assignments to multiple clusters, though it does so without using the network.\nOf the eight baselines highlighted above we report the three whose overall performance was the best,\nnamely Block-LDA [3] (which slightly outperformed mixed membership stochastic block models\n[2]), Low-Rank Embedding [30], and Multi-Assignment Clustering [23].\nPerformance on Facebook, Google+, and Twitter Data. Figure 3 shows results on our Facebook,\nGoogle+, and Twitter data. Circles were aligned as described in (eq. 15), with the number of circles\n\u02c6K determined as described in Section 3. For non-probabilistic baselines, we chose \u02c6K so as to\nmaximize the modularity, as described in [20]. In terms of absolute performance our best model\n\u03c61 achieves BER scores of 0.84 on Facebook, 0.72 on Google+ and 0.70 on Twitter (F1 scores are\n0.59, 0.38, and 0.34, respectively). The lower F1 scores on Google+ and Twitter are explained by the\nfact that many circles have not been maintained since they were initially created: we achieve high\nrecall (we recover the friends in each circle), but at low precision (we recover additional friends who\nappeared after the circle was created).\nComparing our method to baselines we notice that we outperform all baselines on all datasets by a\nstatistically signi\ufb01cant margin. Compared to the nearest competitors, our best performing features\n\u03c61 improve on the BER by 43% on Facebook, 26% on Google+, and 16% on Twitter (improvements\nin terms of the F1 score are similar). 
Regarding the performance of the baseline methods, we\nnote that good performance seems to depend critically on predicting hard memberships to multiple\ncircles, using a combination of node and edge information; none of the baselines exhibit precisely\nthis combination, a shortcoming our model addresses.\nBoth of the features we propose (friend-to-friend features \u03c61 and friend-to-user features \u03c62) perform\nsimilarly, revealing that both schemes ultimately encode similar information, which is not surprising,\n\n7\n\nFacebookGoogle+Twitter0.51.0Accuracy(1-BER).77.72.70.84.72.70Accuracyondetectedcommunities(1-BalancedErrorRate,higherisbetter)multi-assignmentclustering(Streich,Frank,etal.)low-rankembedding(Yoshida)block-LDA(BalasubramanyanandCohen)ourmodel(friend-to-friendfeatures\u03c61,eq.12)ourmodel(friend-to-userfeatures\u03c62,eq.13)ourmodel(compressedfeatures\u03c81,eq.14)ourmodel(compressedfeatures\u03c82,eq.14)FacebookGoogle+Twitter0.01.0Accuracy(F1score).40.38.34.59.38.34Accuracyondetectedcommunities(F1score,higherisbetter)multi-assignmentclustering(Streich,Frank,etal.)low-rankembedding(Yoshida)block-LDA(BalasubramanyanandCohen)ourmodel(friend-to-friendfeatures\u03c61,eq.12)ourmodel(friend-to-userfeatures\u03c62,eq.13)ourmodel(compressedfeatures\u03c81,eq.14)ourmodel(compressedfeatures\u03c82,eq.14)\fFigure 4: Three detected circles on a small ego-network from Facebook, compared to three ground-\ntruth circles (BER (cid:39) 0.81). Blue nodes: true positives. Grey: true negatives. Red: false positives.\nYellow: false negatives. Our method correctly identi\ufb01es the largest circle (left), a sub-circle con-\ntained within it (center), and a third circle that signi\ufb01cantly overlaps with it (right).\n\nFigure 5: Parameter vectors of four communities for a particular Facebook user. 
The top four plots\nshow \u2018complete\u2019 features \u03c61, while the bottom four plots show \u2018compressed\u2019 features \u03c81 (in both\ncases, BER \u2243 0.78). For example the former features encode the fact that members of a particular\ncommunity tend to speak German, while the latter features encode the fact that they speak the same\nlanguage. (Personally identifiable annotations have been suppressed.)\n\nsince users and their friends have similar profiles. Using the \u2018compressed\u2019 features \u03c81 and \u03c82 does\nnot significantly impact performance, which is promising since they have far lower dimension than\nthe full features; what this reveals is that it is sufficient to model categories of attributes that users\nhave in common (e.g. same school, same town), rather than the attribute values themselves.\nWe found that all algorithms perform significantly better on Facebook than on Google+ or Twitter.\nThere are a few explanations: Firstly, our Facebook data is complete, in the sense that survey participants\nmanually labeled every circle in their ego-networks, whereas in other datasets we only observe\npublicly-visible circles, which may not be up-to-date. Secondly, the 26 profile categories available\nfrom Facebook are more informative than the 6 categories from Google+, or the tweet-based profiles\nwe build from Twitter. A more basic difference lies in the nature of the networks themselves: edges\nin Facebook encode mutual ties, whereas edges in Google+ and Twitter encode follower relationships,\nwhich changes the role that circles serve [27]. The latter two points explain why algorithms\nthat use either edge or profile information in isolation are unlikely to perform well on this data.\nQualitative analysis. Finally we examine the output of our model in greater detail. Figure 4 shows\nresults of our method on an example ego-network from Facebook. 
Different colors indicate true and\nfalse positives and negatives. Our method correctly identifies overlapping circles as well\nas sub-circles (circles within circles). Figure 5 shows parameter vectors learned for four circles for\na particular Facebook user. Positive weights indicate properties that users in a particular circle have\nin common. Notice how the model naturally learns the social dimensions that lead to a social circle.\nMoreover, the first parameter, which corresponds to a constant feature \u20181\u2019, has the highest weight; this\nreveals that membership to the same community provides the strongest signal that edges will form,\nwhile profile data provides a weaker (but still relevant) signal.\nAcknowledgements. This research has been supported in part by NSF IIS-1016909, CNS-1010921,\nIIS-1159679, DARPA XDATA, DARPA GRAPHS, Albert Yu & Mary Bechmann Foundation, Boeing,\nAllyes, Samsung, Intel, Alfred P. Sloan Fellowship and the Microsoft Faculty Fellowship.\n\n[Figure 5 (plots not reproduced): eight panels plotting weight \u03b8k,i against feature index i, four for \u03c61 and four for \u03c81. Annotations for \u03c61: \u03b81, people with PhDs living in S.F. or Stanford; \u03b82, Germans who went to school in 1997; \u03b83, Americans; \u03b84, college educated people working at a particular institute. Annotations for \u03c81: \u03b81, studied the same degree, speak the same languages; \u03b82, studied the same degree; \u03b83, same level of education; \u03b84, worked for the same employer at the same time.]\n\nReferences\n[1] Y.-Y. Ahn, J. Bagrow, and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature, 2010.\n[2] E. Airoldi, D. Blei, S. Fienberg, and E. Xing. Mixed membership stochastic blockmodels. JMLR, 2008.\n[3] R. Balasubramanyan and W. Cohen. Block-LDA: Jointly modeling entity-annotated text and entity-entity links. In SDM, 2011.\n[4] E. Boros and P. Hammer. Pseudo-boolean optimization. 
Discrete Applied Mathematics, 2002.\n[5] J. Chang and D. Blei. Relational topic models for document networks. In AIStats, 2009.\n[6] J. Chang, J. Boyd-Graber, and D. Blei. Connections between the lines: augmenting social networks with text. In KDD, 2009.\n[7] Y. Chen and C. Lin. Combining SVMs with various feature selection strategies. Springer, 2006.\n[8] M. Handcock, A. Raftery, and J. Tantrum. Model-based clustering for social networks. Journal of the Royal Statistical Society Series A, 2007.\n[9] S. Johnson. Hierarchical clustering schemes. Psychometrika, 1967.\n[10] D. Kim, Y. Jo, L.-C. Moon, and A. Oh. Analysis of twitter lists as a potential source for discovering latent characteristics of users. In CHI, 2010.\n[11] P. Krivitsky, M. Handcock, A. Raftery, and P. Hoff. Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social Networks, 2009.\n[12] P. Lazarsfeld and R. Merton. Friendship as a social process: A substantive and methodological analysis. In Freedom and Control in Modern Society. 1954.\n[13] Y. Liu, A. Niculescu-Mizil, and W. Gryc. Topic-link LDA: joint models of topic and author community. In ICML, 2009.\n[14] D. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.\n[15] J. McAuley and J. Leskovec. Discovering social circles in ego networks. arXiv:1210.8182, 2012.\n[16] M. McPherson. An ecology of affiliation. American Sociological Review, 1983.\n[17] A. Menon and C. Elkan. Link prediction via matrix factorization. In ECML/PKDD, 2011.\n[18] A. Mislove, B. Viswanath, K. Gummadi, and P. Druschel. You are who you know: Inferring user profiles in online social networks. In WSDM, 2010.\n[19] P. Nasirifard and C. Hayes. Tadvise: A twitter assistant based on twitter lists. In SocInfo, 2011.\n[20] M. Newman. Modularity and community structure in networks. PNAS, 2006.\n[21] G. Palla, I. Derenyi, I. 
Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005.\n[22] C. Rother, V. Kolmogorov, V. Lempitsky, and M. Szummer. Optimizing binary MRFs via extended roof duality. In CVPR, 2007.\n[23] A. Streich, M. Frank, D. Basin, and J. Buhmann. Multi-assignment clustering for boolean data. JMLR, 2012.\n[24] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The anatomy of the Facebook social graph. preprint, 2011.\n[25] C. Volinsky and A. Raftery. Bayesian information criterion for censored survival models. Biometrics, 2000.\n[26] D. Vu, A. Asuncion, D. Hunter, and P. Smyth. Dynamic egocentric models for citation networks. In ICML, 2011.\n[27] S. Wu, J. Hofman, W. Mason, and D. Watts. Who says what to whom on twitter. In WWW, 2011.\n[28] J. Yang and J. Leskovec. Community-affiliation graph model for overlapping community detection. In ICDM, 2012.\n[29] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, 2012.\n[30] T. Yoshida. Toward finding hidden communities based on user profiles. In ICDM Workshops, 2010.\n[31] J. Zhao. Examining the evolution of networks based on lists in twitter. In IMSAA, 2011.\n", "award": [], "sourceid": 272, "authors": [{"given_name": "Jure", "family_name": "Leskovec", "institution": null}, {"given_name": "Julian", "family_name": "Mcauley", "institution": null}]}