{"title": "Link Prediction in Relational Data", "book": "Advances in Neural Information Processing Systems", "page_first": 659, "page_last": 666, "abstract": "", "full_text": "Link Prediction in Relational Data

Ben Taskar, Ming-Fai Wong, Pieter Abbeel, Daphne Koller

{btaskar, mingfai.wong, abbeel, koller}@cs.stanford.edu

Stanford University

Abstract

Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph — entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation.

1 Introduction

Many real-world domains are richly structured, involving entities of multiple types that are related to each other through a network of different types of links. Such data poses new challenges to machine learning. One challenge arises from the task of predicting which entities are related to which others and what the types of these relationships are. For example, in a data set consisting of a set of hyperlinked university webpages, we might want to predict not just which page belongs to a professor and which to a student, but also which professor is which student's advisor. 
In some cases, the existence of a relationship will be predicted by the presence of a hyperlink between the pages, and we will have only to decide whether the link reflects an advisor-advisee relationship. In other cases, we might have to infer the very existence of a link from indirect evidence, such as a large number of co-authored papers. In a very different application, we might want to predict links representing participation of individuals in certain terrorist activities.

One possible approach to this task is to consider the presence and/or type of the link using only attributes of the potentially linked entities and of the link itself. For example, in our university example, we might try to predict and classify the link using the words on the two webpages, and the anchor words on the link (if present). This approach has the advantage that it reduces to a simple classification task to which we can apply standard machine learning techniques. However, it completely ignores a rich source of information that is unique to this task — the graph structure of the link graph. For example, a strong predictor of an advisor-advisee link between a professor and a student is the fact that they jointly participate in several projects. In general, the link graph typically reflects common patterns of interactions between the entities in the domain. Taking these patterns into consideration should allow us to provide a much better prediction for links.

In this paper, we tackle this problem using the relational Markov network (RMN) framework of Taskar et al. [14]. We use this framework to define a single probabilistic model over the entire link graph, including both object labels (when relevant) and links between objects. The model parameters are trained discriminatively, to maximize the probability of the (object and) link labels given the known attributes (e.g., the words on the page, hyperlinks). 
The learned model is then applied, using probabilistic inference, to predict and classify links using any observed attributes and links.

2 Link Prediction

A relational domain is described by a relational schema, which specifies a set of object types and attributes for them. In our web example, we have a Webpage type, where each page has a binary-valued attribute for each word in the dictionary, denoting whether the page contains the word. It also has an attribute representing the “class” of the webpage, e.g., a professor's homepage, a student's homepage, etc.

To address the link prediction problem, we need to make links first-class citizens in our model. Following [5], we introduce into our schema object types that correspond to links between entities. Each link object ℓ is associated with a tuple of entity objects (o_1, …, o_k) that participate in the link. For example, a Hyperlink link object would be associated with a pair of entities — the linking page, and the linked-to page, which are part of the link definition. We note that link objects may also have other attributes; e.g., a hyperlink object might have attributes for the anchor words on the link.

As our goal is to predict link existence, we must consider links that exist and links that do not. We therefore consider a set of potential links between entities. Each potential link is associated with a tuple of entity objects, but it may or may not actually exist. We denote this event using a binary existence attribute Exists, which is true if the link between the associated entities exists and false otherwise. In our example, our model may contain a potential link ℓ for each pair of webpages, and the value of the variable ℓ.Exists determines whether the link actually exists or not. 
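To make the notion of potential links concrete, here is a minimal illustrative sketch (the page names and the dictionary representation are invented for illustration, not the paper's data structures): each ordered pair of pages yields one potential link object carrying a binary Exists attribute, which is the quantity to be predicted.

```python
from itertools import permutations

# Hypothetical sketch: enumerate one potential Hyperlink object per ordered
# pair of pages, each carrying a binary Exists attribute to be predicted.
pages = ["prof_page", "student_page", "course_page"]
observed_links = {("student_page", "prof_page")}  # links known to exist

potential_links = [
    {"from": a, "to": b, "Exists": (a, b) in observed_links}
    for a, b in permutations(pages, 2)
]

# 3 * 2 = 6 potential links; exactly one actually exists in this toy example.
print(len(potential_links), sum(l["Exists"] for l in potential_links))
```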
The link prediction task now reduces to the problem of predicting the existence attributes of these link objects.

An instantiation I specifies the set of entities of each entity type and the values of all attributes for all of the entities. For example, an instantiation of the hypertext schema is a collection of webpages, specifying their labels, the words they contain, and which links between them exist. A partial instantiation specifies the set of objects, and values for some of the attributes. In the link prediction task, we might observe all of the attributes for all of the objects, except for the existence attributes for the links. Our goal is to predict these latter attributes given the rest.

3 Relational Markov Networks

We begin with a brief review of the framework of undirected graphical models or Markov networks [13], and their extension to relational domains presented in [14].

Let V denote a set of discrete random variables and v an assignment of values to V. A Markov network for V defines a joint distribution over V. It consists of an undirected dependency graph, and a set of parameters associated with the graph. For a graph G, a clique c is a set of nodes V_c in G, not necessarily maximal, such that each V_i, V_j ∈ V_c are connected by an edge in G. Each clique c is associated with a clique potential φ_c(V_c), which is a non-negative function defined on the joint domain of V_c. Letting C(G) be the set of cliques, the Markov network defines the distribution P(v) = (1/Z) ∏_{c ∈ C(G)} φ_c(v_c), where Z is the standard normalizing partition function.

A relational Markov network (RMN) [14] specifies the cliques and potentials between attributes of related entities at a template level, so a single model provides a coherent distribution for any collection of instances from the schema. 
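The Markov network distribution P(v) = (1/Z) ∏_c φ_c(v_c) can be made concrete with a tiny brute-force sketch (the variables and potential values below are toy choices, not from the paper; Z is computed by exhaustive enumeration, which is only feasible for a handful of variables):

```python
from itertools import product

# Toy Markov network over three binary variables with two pairwise cliques.
variables = ["A", "B", "C"]

# Each clique is (tuple of variable names, potential table phi_c).
# This potential favors agreement between its two variables.
agree = {(0, 0): 2.0, (1, 1): 2.0, (0, 1): 1.0, (1, 0): 1.0}
cliques = [(("A", "B"), agree), (("B", "C"), agree)]

def unnormalized(assign):
    """Product of clique potentials for a full assignment {name: value}."""
    score = 1.0
    for names, phi in cliques:
        score *= phi[tuple(assign[n] for n in names)]
    return score

# Partition function Z sums the unnormalized score over all assignments.
Z = sum(unnormalized(dict(zip(variables, vals)))
        for vals in product([0, 1], repeat=len(variables)))

def prob(assign):
    return unnormalized(assign) / Z

# Agreeing assignments get higher probability than disagreeing ones.
print(prob({"A": 0, "B": 0, "C": 0}) > prob({"A": 0, "B": 1, "C": 0}))
```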
RMNs specify the cliques using the notion of a relational clique template, which specifies tuples of variables in the instantiation using a relational query language. (See [14] for details.)

For example, if we want to define cliques between the class labels of linked pages, we might define a clique template that applies to all pairs page1, page2 and link of types Webpage, Webpage and Hyperlink, respectively, such that link points from page1 to page2. We then define a potential template that will be used for all pairs of variables page1.Category and page2.Category for such page1 and page2.

Given a particular instantiation I of the schema, the RMN M produces an unrolled Markov network over the attributes of entities in I, in the obvious way. The cliques in the unrolled network are determined by the clique templates C. We have one clique for each c ∈ C(I), and all of these cliques are associated with the same clique potential φ_C.

Taskar et al. show how the parameters of an RMN over a fixed set of clique templates can be learned from data. In this case, the training data is a single instantiation I, where the same parameters are used multiple times — once for each different entity that uses a feature. A choice of clique potential parameters w specifies a particular RMN, which induces a probability distribution P_w over the unrolled Markov network.

Gradient descent over w is used to optimize the conditional likelihood of the target variables given the observed variables in the training set. The gradient involves a term which is the posterior probability of the target variables given the observed, whose computation requires that we run probabilistic inference over the entire unrolled Markov network. In relational domains, this network is typically large and densely connected, making exact inference intractable. Taskar et al. 
therefore propose the use of belief propagation [13, 17].

4 Subgraph Templates in a Link Graph

The structure of link graphs has been widely used to infer importance of documents in scientific publications [4] and hypertext (PageRank [12], Hubs and Authorities [8]). Social networks have been extensively analyzed in their own right in order to quantify trends in social interactions [16]. Link graph structure has also been used to improve document classification [7, 6, 15].

In our experiments, we found that the combination of a relational language with a probabilistic graphical model provides a very flexible framework for modeling complex patterns common in relational graphs. First, as observed by Getoor et al. [5], there are often correlations between the attributes of entities and the relations in which they participate. For example, in a social network, people with the same hobby are more likely to be friends.

We can also exploit correlations between the labels of entities and the relation type. For example, only students can be teaching assistants in a course. We can easily capture such correlations by introducing cliques that involve these attributes. Importantly, these cliques are informative even when attributes are not observed in the test data. For example, if we have evidence indicating an advisor-advisee relationship, our probability that X is a faculty member increases, and thereby our belief that X participates in a teaching assistant link with some entity Z decreases.

We also found it useful to consider richer subgraph templates over the link graph. One useful type of template is a similarity template, where objects that share a certain graph-based property are more likely to have the same label. Consider, for example, a professor X and two other entities Y and Z. 
If X's webpage mentions Y and Z in the same context, it is likely that the X-Y relation and the X-Z relation are of the same type; for example, if Y is Professor X's advisee, then probably so is Z. Our framework accommodates these patterns easily, by introducing pairwise cliques between the appropriate relation variables.

Another useful type of subgraph template involves transitivity patterns, where the presence of an A-B link and of a B-C link increases (or decreases) the likelihood of an A-C link. For example, students often assist in courses taught by their advisor. Note that this type of interaction cannot be accounted for just using pairwise cliques. By introducing cliques over triples of relations, we can capture such patterns as well. We can incorporate even more complicated patterns, but of course we are limited by the ability of belief propagation to scale up as we introduce larger cliques and tighter loops in the Markov network.

We note that our ability to model these more complex graph patterns relies on our use

[Figure 1: accuracy bar charts over the schools ber, mit, sta and their average (ave), comparing the models Flat, Triad, Section, Section & Triad, Neigh, Phased (Flat/Flat), Phased (Neigh/Flat), Phased (Neigh/Sec), Joint+Neigh, and Joint+Neigh+Sec.]

Figure 1: (a) Relation prediction with entity labels given. Relational models on average performed better than the baseline Flat model. (b) Entity label prediction. Relational model Neigh performed significantly better. (c) Relation prediction without entity labels. 
Relational models performed better most of the time, although on some schools certain models performed worse.

of an undirected Markov network as our probabilistic model. In contrast, the approach of Getoor et al. uses directed graphical models (Bayesian networks and PRMs [9]) to represent a probabilistic model of both relations and attributes. Their approach easily captures the dependence of link existence on attributes of entities. But the constraint that the probabilistic dependency graph be a directed acyclic graph makes it hard to see how we would represent the subgraph patterns described above. For example, for the transitivity pattern, we might consider simply directing the correlation edges between link existence variables arbitrarily. However, it is not clear how we would then parameterize a link existence variable for a link that is involved in multiple triangles. See [15] for further discussion.

5 Experiments on Web Data

We collected and manually labeled a new relational dataset inspired by WebKB [2]. Our dataset consists of Computer Science department webpages from 3 schools: Stanford, Berkeley, and MIT. A total of 2954 pages are labeled into one of eight categories: faculty, student, research scientist, staff, research group, research project, course and organization (organization refers to any large entity that is not a research group). Owned pages, which are owned by an entity but are not the main page for that entity, were manually assigned to that entity. The average distribution of classes across schools is: organization (9%), student (40%), research group (8%), faculty (11%), course (16%), research project (7%), research scientist (5%), and staff (3%).

We established a set of candidate links between entities based on evidence of a relation between them. One type of evidence for a relation is a hyperlink from an entity page or one of its owned pages to the page of another entity. 
A second type of evidence is a virtual link: we assigned a number of aliases to each page using the page title, the anchor text of incoming links, and email addresses of the entity involved. Mentioning an alias of a page on another page constitutes a virtual link. The resulting set of 7161 candidate links was labeled as corresponding to one of five relation types — Advisor (faculty, student), Member (research group/project, student/faculty/research scientist), Teach (faculty/research scientist/staff, course), TA (student, course), Part-Of (research group, research project) — or “none”, denoting that the link does not correspond to any of these relations.

The observed attributes for each page are the words on the page itself and the “meta-words” on the page — the words in the title, section headings, and anchors to the page from other pages. For links, the observed attributes are the anchor text, the text just before the link (hyperlink or virtual link), and the heading of the section in which the link appears.

Our task is to predict the relation type, if any, for all the candidate links. We tried two settings for our experiments: with page categories observed (in the test data) and with page categories unobserved. For all our experiments, we trained on two schools and tested on the remaining school.

Observed Entity Labels. We first present results for the setting with observed page categories. Given the page labels, we can rule out many impossible relations; the resulting label breakdown among the candidate links is: none (38%), member (34%), part-of (4%), advisor (11%), teach (9%), TA (5%).

There is a huge range of possible models that one can apply to this task. We selected a set of models that we felt represented some range of patterns that manifested in the data.

Link-Flat is our baseline model, predicting links one at a time using multinomial logistic regression. 
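As a minimal sketch of the flat baseline's form (the weights, feature vector, and relation set below are toy values, not the learned model or the paper's feature set), multinomial logistic regression scores each candidate link independently via a softmax over relation labels:

```python
import numpy as np

# Toy sketch of multinomial logistic regression for one candidate link:
# P(relation = r | x) is proportional to exp(W[r] . x).
relations = ["none", "advisor", "member", "teach", "ta"]
rng = np.random.default_rng(0)
W = rng.normal(size=(len(relations), 4))  # one weight row per relation label
x = np.array([1.0, 0.0, 1.0, 1.0])        # binary features of the link

scores = W @ x
probs = np.exp(scores - scores.max())     # shift for numerical stability
probs /= probs.sum()

# A proper distribution over relation labels; the argmax is the flat prediction.
print(relations[int(np.argmax(probs))], round(float(probs.sum()), 6))
```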
This is a strong classifier, and its performance is competitive with other classifiers (e.g., support vector machines). The features used by this model are the labels of the two linked pages and the words on the links going from one page and its owned pages to the other page. The number of features is around 1000.

The relational models try to improve upon the baseline model by modeling the interactions between relations and predicting relations jointly. The Section model introduces cliques over relations whose links appear consecutively in a section on a page. This model tries to capture the pattern that similarly related entities (e.g., advisees, members of projects) are often listed together on a webpage. This pattern is a type of similarity template, as described in Section 4. The Triad model is a type of transitivity template, as discussed in Section 4. Specifically, we introduce cliques over sets of three candidate links that form a triangle in the link graph. The Section + Triad model includes the cliques of the two models above.

As shown in Fig. 1(a), both the Section and Triad models outperform the flat model, and the combined model has an average accuracy gain of 2.26%, or a 10.5% relative reduction in error. As we only have three runs (one for each school), we cannot meaningfully analyze the statistical significance of this improvement.

As an example of the interesting inferences made by the models, we found a student-professor pair that was misclassified by the Flat model as none (there is only a single hyperlink from the student's page to the advisor's) but correctly identified by both the Section and Triad models. The Section model utilizes a paragraph on the student's webpage describing his research, with a section of links to his research groups and the link to his advisor. 
Examining the parameters of the Section model clique, we found that the model learned that it is likely for people to mention their research groups and advisors in the same section. By capturing this trend, the Section model is able to increase the confidence of the student-advisor relation. The Triad model corrects the same misclassification in a different way. Using the same example, the Triad model makes use of the information that both the student and the teacher belong to the same research group, and the student TAed a class taught by his advisor. It is important to note that none of the other relations are observed in the test data; rather, the model bootstraps its inferences.

Unobserved Entity Labels. When the labels of pages are not known during relation prediction, we cannot rule out possible relations for candidate links based on the labels of participating entities. Thus, we have many more candidate links that do not correspond to any of our relation types (e.g., links between an organization and a student). This makes the existence of relations a very low probability event, with the following breakdown among the potential relations: none (71%), member (16%), part-of (2%), advisor (5%), teach (4%), TA (2%). In addition, when we construct a Markov network in which page labels are not observed, the network is much larger and denser, making the (approximate) inference task much harder. Thus, in addition to models that try to predict page entity and relation labels simultaneously, we also tried a two-phase approach, where we first predict page categories, and then use the predicted labels as features for the model that predicts relations.

For predicting page categories, we compared two models. The Entity-Flat model is multinomial logistic regression that uses words and “meta-words” from the page and its owned pages in separate “bags” of words. The number of features is roughly 10,000. The Neighbors model is a relational model that exploits another type of similarity template: pages with similar URLs often belong to the same category or tightly linked categories (research group/project, professor/course). For each page, the two pages with URLs closest in edit distance are selected as “neighbors”, and we introduced pairwise cliques between “neighboring” pages. Fig. 1(b) shows that the Neighbors model clearly outperforms the Flat model across all schools, by an average accuracy gain of 4.9%.

[Figure 2: bar charts of average precision/recall breakeven points comparing the flat and compatibility models.]

Figure 2: (a) Average precision/recall breakeven point for 10%, 25%, and 50% observed links. (b) Average precision/recall breakeven point for each fold of school residences at 25% observed links.

Given the page categories, we can now apply the different models for link classification. Thus, the Phased (Flat/Flat) model uses the Entity-Flat model to classify the page labels, and then the Link-Flat model to classify the candidate links using the resulting entity labels. The Phased (Neighbors/Flat) model uses the Neighbors model to classify the entity labels, and then the Link-Flat model to classify the links. The Phased (Neighbors/Section) model uses the Neighbors model to classify the entity labels and then the Section model to classify the links.

We also tried two models that predict page and relation labels simultaneously. 
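The URL-based neighbor selection used by the Neighbors model can be sketched with a standard Levenshtein edit distance (the URLs below are invented for illustration; the paper does not give its exact distance variant, so plain Levenshtein is an assumption):

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution/match
        prev = cur
    return prev[-1]

# Illustrative URLs: each page's two closest URLs become its "neighbors".
urls = ["cs.edu/~ann", "cs.edu/~anna", "cs.edu/cs101", "cs.edu/cs102"]

def neighbors(u, k=2):
    others = [v for v in urls if v != u]
    return sorted(others, key=lambda v: edit_distance(u, v))[:k]

print(neighbors("cs.edu/cs101"))
```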
The Joint + Neighbors model is simply the union of the Neighbors model for page categories and the Flat model for relation labels given the page categories. The Joint + Neighbors + Section model additionally introduces the cliques that appeared in the Section model between links that appear consecutively in a section on a page. We train the joint models to predict both page and relation labels simultaneously.

As the proportion of the “none” relation is so large, we use the probability of “none” to define a precision-recall curve. If this probability is less than some threshold, we predict the most likely label other than none; otherwise, we predict the most likely label (including none). As usual, we report results at the precision-recall breakeven point on the test data. Fig. 1(c) shows the breakeven points achieved by the different models on the three schools. Relational models, both phased and joint, did better than flat models on average. However, performance varies from school to school, and for both joint and phased models, performance on one of the schools is worse than that of the flat model.

6 Experiments on Social Network Data

The second dataset we used was collected by a portal website at a large university that hosts an online community for students [1]. Among other services, it allows students to enter information about themselves, create lists of their friends and browse the social network. Personal information includes residence, gender, major and year, as well as favorite sports, music, books, social activities, etc. We focused on the task of predicting the “friendship” links between students from their personal information and a subset of their links. 
We selected students living in sixteen different residences or dorms and restricted the data to the friendship links only within each residence, eliminating inter-residence links from the data to generate independent training/test splits. Each residence has about 15–25 students, and an average student lists about 25% of his or her house-mates as friends.

We used an eight-fold train-test split, where we trained on fourteen residences and tested on two. Predicting links between two students from personal information alone is a very difficult task, so we tried a more realistic setting, where some proportion of the links is observed in the test data and can be used as evidence for predicting the remaining links. We used the following proportions of observed links in the test data: 10%, 25%, and 50%. The observed links were selected at random, and the results we report are averaged over five folds of these random selection trials.

Using just the observed portion of links, we constructed the following flat features: for each student, the proportion of students in the residence that list him/her and the proportion of students he/she lists; for each pair of students, the proportion of other students they have as common friends. The values of the proportions were discretized into four bins. These features capture some of the relational structure and dependencies between links: students who list (or are listed by) many friends in the observed portion of the links tend to have links in the unobserved portion as well. More importantly, having friends in common increases the likelihood of a link between a pair of students.

The Flat model uses logistic regression with the above features as well as personal information about each user. 
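The flat link features just described can be sketched as follows. The friendship lists are fabricated for illustration, and the equal-width binning is an assumption (the paper says only that proportions were discretized into four bins, not how):

```python
# Hypothetical sketch of the flat features: per-student listing proportions
# and, per pair, the proportion of common friends, each put into four bins.
lists = {  # student -> set of house-mates he/she lists as friends (made up)
    "a": {"b", "c"}, "b": {"a"}, "c": {"a", "b"}, "d": set(),
}
students = sorted(lists)

def bin4(p):
    """Discretize a proportion in [0, 1] into four equal-width bins (0..3)."""
    return min(int(p * 4), 3)

def pair_features(x, y):
    others = [s for s in students if s not in (x, y)]
    common = sum(1 for s in others if s in lists[x] and s in lists[y])
    out_x = len(lists[x]) / (len(students) - 1)           # proportion x lists
    in_x = sum(x in lists[s] for s in students) / (len(students) - 1)
    return (bin4(out_x), bin4(in_x), bin4(common / max(len(others), 1)))

print(pair_features("a", "b"))
```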
In addition to individual characteristics of the two people, we also introduced a feature for each match of a characteristic; for example, both people are computer science majors or both are freshmen.

The Compatibility model uses a type of similarity template, introducing cliques between each pair of links emanating from each person. Similarly to the Flat model, these cliques include a feature for each match of the characteristics of the two potential friends. This model captures the tendency of a person to have friends who share many characteristics (even though the person might not possess them). For example, a student may be friends with several CS majors, even though he is not a CS major himself. We also tried models that used transitivity templates, but the approximate inference with 3-cliques often failed to converge or produced erratic results.

Fig. 2(a) compares the average precision/recall breakeven point achieved by the different models at the three different settings of observed links. Fig. 2(b) shows the performance on each of the eight folds containing two residences each. Using a paired t-test, the Compatibility model outperforms Flat with p-values 0.0036, 0.00064 and 0.054, respectively.

7 Discussion and Conclusions

In this paper, we consider the problem of link prediction in relational domains. We focus on the task of collective link classification, where we are simultaneously trying to predict and classify an entire set of links in a link graph. We show that the use of a probabilistic model over link graphs allows us to represent and exploit interesting subgraph patterns in the link graph. Specifically, we have found two types of patterns that seem to be beneficial in several places. Similarity templates relate the classification of links or objects that share a certain graph-based property (e.g., links that share a common endpoint). 
Transitivity templates relate triples of objects and links organized in a triangle. We show that the use of these patterns significantly improves the classification accuracy over flat models.

Relational Markov networks are not the only method one might consider applying to the link prediction and classification task. We could, for example, build a link predictor that considers other links in the graph by converting graph features into flat features [11], as we did in the social network data. As our experiments show, even with these features, the collective prediction approach works better. Another approach is to use relational classifiers such as variants of inductive logic programming [10]. Generally, however, these methods have been applied to the problem of predicting or classifying a single link at a time. It is not clear how well they would extend to the task of simultaneously predicting an entire link graph. Finally, we could apply the directed PRM framework of [5]. However, as shown in [15], the discriminatively trained RMNs perform significantly better than generatively trained PRMs even on the simpler entity classification task. Furthermore, as we discussed, the PRM framework cannot represent (in any natural way) the type of subgraph patterns that seem prevalent in link graph data. Therefore, the RMN framework seems much more appropriate for this task.

Although the RMN framework worked fairly well on this task, there is significant room for improvement. One of the key problems limiting the applicability of the approach is the reliance on belief propagation, which often does not converge in more complex problems. This problem is especially acute in the link prediction problem, where the presence of all potential links leads to densely connected Markov networks with many short loops. 
This problem can be addressed with heuristics that focus the search on links that are plausible (as we did in a very simple way in the webpage experiments). A more interesting solution would be to develop a more integrated approximate inference / learning algorithm.

Our results use a set of relational patterns that we have discovered to be useful in the domains that we have considered. However, many other rich and interesting patterns are possible. Thus, in the relational setting, even more so than in simpler tasks, the issue of feature construction is critical. It is therefore important to explore the problem of automatic feature induction, as in [3].

Finally, we believe that the problem of modeling link graphs has numerous other applications, including: analyzing communities of people and hierarchical structure of organizations, identifying people or objects that play certain key roles, predicting current and future interactions, and more.

Acknowledgments. This work was supported by ONR Contract F3060-01-2-0564-P00002 under DARPA's EELD program. P. Abbeel was supported by a Siebel Grad. Fellowship.

References

[1] L. Adamic, O. Buyukkokten, and E. Adar. A social network caught in the web. http://www.hpl.hp.com/shl/papers/social/, 2002.

[2] M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery. Learning to extract symbolic knowledge from the world wide web. In Proc. AAAI, 1998.

[3] S. Della Pietra, V. Della Pietra, and J. Lafferty. Inducing features of random fields. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(4):380–393, 1997.

[4] L. Egghe and R. Rousseau. Introduction to Informetrics. Elsevier, 1990.

[5] L. Getoor, N. Friedman, D. Koller, and B. Taskar. Probabilistic models of relational structure. In Proc. ICML, 2001.

[6] L. Getoor, E. Segal, B. Taskar, and D. Koller. 
Probabilistic models of text and link structure for hypertext classification. In IJCAI Workshop on Text Learning: Beyond Supervision, 2001.

[7] R. Ghani, S. Slattery, and Y. Yang. Hypertext categorization using hyperlink patterns and meta data. In Proc. ICML, 2001.

[8] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. JACM, 46(5):604–632, 1999.

[9] D. Koller and A. Pfeffer. Probabilistic frame-based systems. In Proc. AAAI, pages 580–587, 1998.

[10] N. Lavrač and S. Džeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.

[11] J. Neville and D. Jensen. Iterative classification in relational data. In AAAI Workshop on Learning Statistical Models from Relational Data, 2000.

[12] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University, 1998.

[13] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.

[14] B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In Proc. UAI, 2002.

[15] B. Taskar, E. Segal, and D. Koller. Probabilistic classification and clustering in relational data. In Proc. IJCAI, pages 870–876, 2001.

[16] S. Wasserman and P. Pattison. Logit models and logistic regression for social networks. Psychometrika, 61(3):401–425, 1996.

[17] J. Yedidia, W. Freeman, and Y. Weiss. Generalized belief propagation. In Proc. NIPS, 2000.
", "award": [], "sourceid": 2465, "authors": [{"given_name": "Ben", "family_name": "Taskar", "institution": null}, {"given_name": "Ming-fai", "family_name": "Wong", "institution": null}, {"given_name": "Pieter", "family_name": "Abbeel", "institution": null}, {"given_name": "Daphne", "family_name": "Koller", "institution": null}]}