{"title": "The Mondrian Process", "book": "Advances in Neural Information Processing Systems", "page_first": 1377, "page_last": 1384, "abstract": "We describe a novel stochastic process that can be used to construct a multidimensional generalization of the stick-breaking process and which is related to the classic stick breaking process described by Sethuraman1994 in one dimension. We describe how the process can be applied to relational data modeling using the de Finetti representation for infinitely and partially exchangeable arrays.", "full_text": "The Mondrian Process\n\nDaniel M. Roy\n\nMassachusetts Institute of Technology\n\nYee Whye Teh\n\nGatsby Unit, University College London\n\ndroy@mit.edu\n\nywteh@gatsby.ucl.ac.uk\n\nAbstract\n\nWe describe a novel class of distributions, called Mondrian processes, which\ncan be interpreted as probability distributions over kd-tree data structures. Mon-\ndrian processes are multidimensional generalizations of Poisson processes and this\nconnection allows us to construct multidimensional generalizations of the stick-\nbreaking process described by Sethuraman (1994), recovering the Dirichlet pro-\ncess in one dimension. After introducing the Aldous-Hoover representation for\njointly and separately exchangeable arrays, we show how the process can be used\nas a nonparametric prior distribution in Bayesian models of relational data.\n\n1 Introduction\n\nRelational data are observations of relationships between sets of objects and it is therefore natural\nto consider representing relations1 as arrays of random variables, e.g., (Ri,j), where i and j index\nobjects xi \u2208 X and yj \u2208 Y . Nonrelational data sets (e.g., observations about individual objects in\nX) are simply one-dimensional arrays (Ri) from this viewpoint.\nA common Bayesian approach in the one-dimensional setting is to assume there is cluster structure\nand use a mixture model with a prior distribution over partitions of the objects in X. 
A similar\napproach for relational data would na\u00a8\u0131vely require a prior distribution on partitions of the product\nspace X \u00d7 Y = {(x, y) | x \u2208 X, y \u2208 Y }. One choice is to treat each pair (x, y) atomically,\nclustering the product space directly, e.g., by placing a Chinese restaurant process (CRP) prior on\npartitions of X \u00d7 Y . An unsatisfactory implication of this choice is that the distribution on partitions\nof (Ri,j) is exchangeable, i.e., invariant to swapping any two entries; this implies that the identity\nof objects is ignored when forming the partition, violating common sense.\nStochastic block models2 place prior distributions on partitions of X and Y separately, which can be\ninterpreted as inducing a distribution on partitions of the product space by considering the product of\nthe partitions. By arranging the rows and columns of (Ri,j) so that clustered objects have adjacent\nindices, such partitions look like regular grids (Figure 1.1). An unfortunate side effect of this form\nof prior is that the \u201cresolution\u201d needed to model \ufb01ne detail in one area of the array necessarily\ncauses other parts of the array to be dissected, even if the data suggest there is no such structure.\nThe annotated hierarchies described by Roy et al. (2007) generate random partitions which are not\nconstrained to be regular grids (Figure 1.2), but the prior is inconsistent in light of missing data.\nMotivated by the need for a consistent distribution on partitions of product spaces with more struc-\nture than classic block models, we de\ufb01ne a class of nonparametric distributions we have named\nMondrian processes after Piet Mondrian and his abstract grid-based paintings. Mondrian processes\nare random partitions on product spaces not constrained to be regular grids. 
Much like kd-trees,\nMondrian processes partition a space with nested, axis-aligned cuts; see Figure 1.3 for examples.\nWe begin by introducing the notion of partially exchangeable arrays by Aldous (1981) and Hoover\n(1979), a generalization of exchangeability on sequences appropriate for modeling relational data.\nWe then define the Mondrian process, highlight a few of its elegant properties, and describe two\nnonparametric models for relational data that use the Mondrian process as a prior on partitions.\n\n1 We consider binary relations here but the ideas generalize easily to multidimensional relations.\n2 Holland et al. (1983) introduced stochastic block models. Recent variations (Kemp et al., 2006; Xu et al.,\n2006; Roy et al., 2007) descend from Wasserman and Anderson (1987) and Nowicki and Snijders (2001).\n\n2 Exchangeable Relational Data\n\nThe notion of exchangeability3, that the probability of a sequence of data items does not depend on\nthe ordering of the items, has played a central role in hierarchical Bayesian modeling (Bernardo and\nSmith, 1994). A classic result by de Finetti (1931), later extended by Ryll-Nardzewski (1957), states\nthat if x1, x2, ... is an exchangeable sequence, then there exists a random parameter \u03b8 such that the\nsequence is conditionally iid given \u03b8:\n\np(x1, ..., xn) = \u222b p\u03b8(\u03b8) \u220f_{i=1}^{n} px(xi|\u03b8) d\u03b8    (1)\n\nThat is, exchangeable sequences arise as a mixture of iid sequences, where the mixing distribution\nis p\u03b8(\u03b8). The notion of exchangeability has been generalized to a wide variety of settings. In this\nsection we describe notions of exchangeability for relational data originally proposed by Aldous\n(1981) and Hoover (1979) in the context of exchangeable arrays. 
Kallenberg (2005) significantly\nexpanded on the concept, and Diaconis and Janson (2007) showed a strong correspondence between\nsuch exchangeable relations and a notion of limits on graph structures (Lov\u00e1sz and Szegedy, 2006).\nHere we shall only consider binary relations, that is, those involving pairs of objects. Generalizations to\nrelations with arbitrary arity can be gleaned from Kallenberg (2005). For i, j = 1, 2, ... let Ri,j\ndenote a relation between two objects xi \u2208 X and yj \u2208 Y from possibly distinct sets X and Y. We\nsay that R is separately exchangeable if its distribution is invariant to separate permutations on its\nrows and columns. That is, for each n, m \u2265 1 and each pair of permutations \u03c0 \u2208 Sn and \u03c3 \u2208 Sm,\n\np(R1:n,1:m) = p(R\u03c0(1:n),\u03c3(1:m))    (2)\n\nin MATLAB notation. Aldous (1981) and Hoover (1979) showed that separately exchangeable\nrelations can always be represented in the following way: each object i (and j) has a latent\nrepresentation \u03bei (\u03b7j) drawn iid from some distribution p\u03be (p\u03b7); independently let \u03b8 be an additional random\nparameter. Then,\n\np(R1:n,1:m) = \u222b p\u03b8(\u03b8) \u220f_{i} p\u03be(\u03bei) \u220f_{j} p\u03b7(\u03b7j) \u220f_{i,j} pR(Ri,j|\u03b8, \u03bei, \u03b7j) d\u03b8 d\u03be1:n d\u03b71:m    (3)\n\nAs opposed to (1), the variables \u03bei and \u03b7j capture additional dependencies specific to each row and\ncolumn. If the two sets of objects are in fact the same, i.e. X = Y, then the relation R is a square\narray. We say R is jointly exchangeable if it is invariant to jointly permuting rows and columns; that\nis, for each n \u2265 1 and each permutation \u03c0 \u2208 Sn we have\n\np(R1:n,1:n) = p(R\u03c0(1:n),\u03c0(1:n))    (4)\n\nSuch jointly exchangeable relations also have a form similar to (3). 
The differences are that we have\none latent variable \u03bei for each object xi, and that Ri,j, Rj,i need not be independent anymore:\n\np(R1:n,1:n) = \u222b p\u03b8(\u03b8) \u220f_{i} p\u03be(\u03bei) \u220f_{i\u2264j} pR(Ri,j, Rj,i|\u03b8, \u03bei, \u03bej) d\u03b8 d\u03be1:n    (5)\n\nIn (5) it is important that pR(s, t|\u03b8, \u03bei, \u03bej) = pR(t, s|\u03b8, \u03bej, \u03bei) to ensure joint exchangeability. The\nfirst impression from (5) is that joint exchangeability implies a more restricted functional form than\nseparate exchangeability (3). In fact, the reverse holds: (5) means that the latent representations\nof row i and column i need not be independent, and that Ri,j and Rj,i need not be conditionally\nindependent given the row and column representations, while (3) assumes independence of both.\nFor example, a symmetric relation, i.e. Ri,j = Rj,i, can only be represented using (5).\nThe above Aldous-Hoover representation serves as the theoretical foundation for hierarchical\nBayesian modeling of exchangeable relational data, just as de Finetti\u2019s representation serves as a\nfoundation for the modeling of exchangeable sequences. In Section 5, we cast the Infinite Relational\nModel (Kemp et al., 2006) and a model based on the Mondrian process into this representation.\n\nFigure 1: (1) Stochastic block models like the Infinite Relational Model (Kemp et al., 2006) induce regular\npartitions on the product space, introducing structure where the data do not support it. (2) Axis-aligned partitions,\nlike those produced by annotated hierarchies and the Mondrian process, provide (a posteriori) resolution only\nwhere it is needed. (3) Mondrian process on unit square, [0, 1]^2. (4) We can visualize the sequential hierarchical\nprocess by spreading the cuts out over time. The third dimension is \u03bb. 
(5) Mondrian process with beta L\u00e9vy\nmeasure, \u00b5(dx) = x^{\u22121}dx on [0, 1]^2. (6) 10x zoom of (5) at origin. (7) Mondrian on [\u03b5, 1]^3 with beta measure.\n\n3 The Mondrian Process\n\nThe Mondrian process can be expressed as a recursive generative process that randomly makes\naxis-aligned cuts, partitioning the underlying product space in a hierarchical fashion akin to decision\ntrees or kd-trees. The distinguishing feature of this recursive stochastic process is that it assigns\nprobabilities to the various events in such a way that it is consistent (in a sense we make precise\nlater). The implication of consistency is that we can extend the Mondrian process to infinite spaces\nand use it as a nonparametric prior for modeling exchangeable relational data.\n\n3.1 The one dimensional case\n\nThe simplest space on which to introduce the Mondrian process is the unit interval [0, 1]. Starting with an\ninitial \u201cbudget\u201d \u03bb, we make a sequence of cuts, splitting the interval into subintervals. Each cut\ncosts a random amount, eventually exhausting the budget and resulting in a finite partition m of the\nunit interval. The cost, EI, to cut an interval I is exponentially distributed with inverse mean given\nby the length of the interval. Therefore, the first cut costs E[0,1] \u223c Exp(1). Let \u03bb\u2032 = \u03bb \u2212 E[0,1].\nIf \u03bb\u2032 < 0, we make no cuts and the process returns the trivial partition m = {[0, 1]}. Otherwise,\nwe make a cut uniformly at random, splitting the unit interval into two subintervals A and B. The\nprocess recurses independently on A and B, each with budget \u03bb\u2032, producing partitions mA\nand mB, which are then combined into a partition m = mA \u222a mB of [0, 1].\nThe resulting cuts can be shown to be a Poisson (point) process. Unlike the standard description of\nthe Poisson process, the cuts in this \u201cbreak and branch\u201d process are organized in a hierarchy. As the\nPoisson process is a fundamental building block for random measures such as the Dirichlet process\n(DP), we will later exploit this relationship to build various multidimensional generalizations.\n\n3.2 Generalizations to higher dimensions and trees\n\nWe begin in two dimensions by describing the generative process for a Mondrian process m \u223c\nMP(\u03bb, (a, A), (b, B)) on the rectangle (a, A)\u00d7(b, B). Again, let \u03bb\u2032 = \u03bb \u2212 E, where E \u223c Exp(A \u2212 a + B \u2212 b)\nis drawn from an exponential distribution with rate equal to the sum of the interval lengths. If \u03bb\u2032 < 0,\nthe process halts, and returns the trivial partition {(a, A) \u00d7 (b, B)}. Otherwise, an axis-aligned\ncut is made uniformly at random along the combined lengths of (a, A) and (b, B); that is, the cut\nlies along a particular dimension with probability proportional to its length, and is drawn uniformly\nwithin that interval. W.l.o.g., a cut x \u2208 (a, A) splits the interval into (a, x) and (x, A). The process\nthen recurses, generating independent Mondrian processes with diminished rate parameter \u03bb\u2032 on\nboth sides of the cut: m< \u223c MP(\u03bb\u2032, (a, x), (b, B)) and m> \u223c MP(\u03bb\u2032, (x, A), (b, B)). The partition\non (a, A)\u00d7(b, B) is then m< \u222a m>. Like the one-dimensional special case, the \u03bb parameter controls\nthe number of cuts, with the process more likely to cut rectangles with large perimeters.\nThe process can be generalized in several ways. In higher dimensions, the cost E to make an\nadditional cut is exponentially distributed with rate given by the sum over all dimensions of the\ninterval lengths. Similarly, the cut point is chosen uniformly at random from all intervals, splitting\nonly that interval in the recursion. Like non-homogeneous Poisson processes, the cut point need not\nbe chosen uniformly at random, but can instead be chosen according to a non-atomic rate measure\n\u00b5d associated with each dimension. In this case, lengths (A \u2212 a) become measures \u00b51(a, A).\nThe process can also be generalized beyond products of intervals. The key property of intervals\nthat the Mondrian process relies upon is that any point cuts the space into one-dimensional,\nsimply-connected pieces. Trees also have this property: a cut along an edge splits a tree into two trees.\nWe denote a Mondrian process m with rate \u03bb on a product of one-dimensional, simply-connected\ndomains \u03981\u00d7\u00b7\u00b7\u00b7\u00d7\u0398D by m \u223c MP(\u03bb, \u03981, ..., \u0398D), with the dependence on \u00b51, ..., \u00b5D left implicit.\nA description of the recursive generative model for the conditional Mondrian (see Section 4) is given\nin Algorithm 1.\n\n3 In this paper we shall always mean infinite exchangeability when we state exchangeability.\n\n4 Properties of the Mondrian Process\n\nThis section describes a number of interesting properties of the Mondrian process. The most\nimportant property of the Mondrian is its self-consistency. Instead of representing a draw from a\nMondrian as an unstructured partition of \u03981 \u00d7 \u00b7\u00b7\u00b7 \u00d7 \u0398D, we will represent the whole history of the\ngenerative process. Thus a draw from the Mondrian process is either a trivial partition or a tuple\nm = \u27e8d, x, \u03bb\u2032, m<, m>\u27e9, representing a cut at x along the d\u2019th dimension \u0398d, with nested\nMondrians m< and m> on either side of the cut. Therefore, m is itself a tree of axis-aligned cuts (a kd-tree\ndata structure), with the leaves of the tree forming the partition of the original product space.\nConditional Independencies: The generative process for the Mondrian produces a tree of cuts,\nwhere each subtree is itself a draw from a Mondrian. 
The tree structure precisely reflects the conditional\nindependencies of the Mondrian; e.g., the two subtrees m< and m> are conditionally\nindependent given \u03bb\u2032, d and x at the first cut.\nConsistency: The Mondrian process satisfies an important self-consistency property: given a draw\nfrom a Mondrian on some domain, the partition on any subdomain has the same distribution as if\nwe sampled a Mondrian process directly on that subdomain.\nMore precisely, let m \u223c MP(\u03bb, \u03981, ..., \u0398D) and, for each dimension d, let \u03a6d be a connected\nsubdomain of \u0398d. The restriction \u03c1(m, \u03a61, ..., \u03a6D) of m to \u03a61 \u00d7 \u00b7\u00b7\u00b7 \u00d7 \u03a6D is the subtree of\ncuts within \u03a61 \u00d7 \u00b7\u00b7\u00b7 \u00d7 \u03a6D. We define restrictions inductively: If there are no cuts in m, i.e.\nm = \u03981\u00d7\u00b7\u00b7\u00b7\u00d7\u0398D, then \u03c1(m, \u03a61, ..., \u03a6D) is simply \u03a61\u00d7\u00b7\u00b7\u00b7\u00d7\u03a6D. Otherwise m = \u27e8d, x, \u03bb, m<, m>\u27e9\nfor some d, x, and \u03bb, and where m< and m> are the two subtrees. Let \u0398d^{<x} and \u0398d^{>x} denote the parts\nof \u0398d on either side of x, so that m< and m> are Mondrians on \u03981 \u00d7 \u00b7\u00b7\u00b7 \u00d7 \u0398d^{<x} \u00d7 \u00b7\u00b7\u00b7 \u00d7 \u0398D and\n\u03981 \u00d7 \u00b7\u00b7\u00b7 \u00d7 \u0398d^{>x} \u00d7 \u00b7\u00b7\u00b7 \u00d7 \u0398D, respectively. If x \u2209 \u03a6d this implies that \u03a6d must be on exactly one side of x\n(because \u03a6d and \u0398d are connected). W.l.o.g., assume \u03a6d \u2282 \u0398d^{<x}. In this case,\n\u03c1(m, \u03a61, ..., \u03a6D) = \u03c1(m<, \u03a61, ..., \u03a6D). If x \u2208 \u03a6d then both \u0398d^{<x} and \u0398d^{>x} overlap with \u03a6d, and\n\u03c1(m, \u03a61, ..., \u03a6D) = \u27e8d, x, \u03bb, \u03c1(m<, \u03a61, ..., \u03a6d \u2229 \u0398d^{<x}, ..., \u03a6D), \u03c1(m>, \u03a61, ..., \u03a6d \u2229 \u0398d^{>x}, ..., \u03a6D)\u27e9.\n\nFigure 2: Modeling a Mondrian with a Mondrian: A posterior sample given relational data created from an\nactual Mondrian painting. (from left) (1) Composition with Large Blue Plane, Red, Black, Yellow, and Gray\n(1921). (2) Raw relational data, randomly shuffled. These synthetic data were generated by fitting a regular\n6 \u00d7 7 point array over the painting (6 row objects, 7 column objects), and using the blocks in the painting\nto determine the block structure of these 42 relations. We then sampled 18 relational arrays with this block\nstructure. 
(3) Posterior sample of Mondrian process on unit square. The colors are for visual effect only as the\npartitions are contiguous rectangles. The small black dots are the embedding of the pairs (\u03bei, \u03b7j) into the unit\nsquare. Each point represents a relation Ri,j; each row of points represents the relations (Ri,\u00b7) for an object \u03bei, and\nsimilarly for columns. Relations in the same block are clustered together. (4) Induced partition on the (discrete)\nrelational array, matching the painting. (5) Partitioned and permuted relational data showing block structure.\n\nbreak-and-branch generative process for a Poisson process, any one dimensional slice of a Mondrian\ngives a Poisson point process.\nConditional Mondrians: Using the consistency property, we can derive the conditional distribution\nof a Mondrian m with rate \u03bb on \u03981 \u00d7 \u00b7\u00b7\u00b7 \u00d7 \u0398D given its restriction \u03c1 = \u03c1(m, \u03a61, ..., \u03a6D). To do\nso, we have to consider three possibilities: when m contains no cuts, when the first cut of m is in\n\u03c1, and when the first cut of m is above \u03c1. Fortunately, deciding among these events is easy: it\namounts to drawing an exponential sample E \u223c Exp(\u2211_{d} \u00b5d(\u0398d \\ \u03a6d)) and comparing the remaining\nbudget \u03bb \u2212 E against the diminished rate after the first cut in \u03c1. Pseudocode for generating from\na conditional Mondrian is given in Algorithm 1. When every domain of \u03c1 has zero measure, i.e.,\n\u00b5d(\u03a6d) = 0 for all d, the conditional Mondrian reduces to an unconditional Mondrian.\n\nAlgorithm 1 Conditional Mondrian m \u223c MP(\u03bb, \u03981, ..., \u0398D | \u03c1)    (\u03c1 = \u03a6d = \u2205 is unconditioned)\n1. let \u03bb\u2032 \u2190 \u03bb \u2212 E where E \u223c Exp(\u2211_{d=1}^{D} \u00b5d(\u0398d \\ \u03a6d)).\n2. if \u03c1 has no cuts then \u03bb\u2033 \u2190 0 else \u27e8d\u2032, x\u2032, \u03bb\u2033, \u03c1<, \u03c1>\u27e9 \u2190 \u03c1.\n3. if \u03bb\u2032 < \u03bb\u2033 then    (take root form of \u03c1)\n4.   if \u03c1 has no cut then\n5.     return m \u2190 \u03981 \u00d7 \u00b7\u00b7\u00b7 \u00d7 \u0398D.\n6.   else    ((d\u2032, x\u2032) is the first cut in m)\n7.     return m \u2190 \u27e8d\u2032, x\u2032, \u03bb\u2033, MP(\u03bb\u2033, \u03981, ..., \u0398d\u2032^{<x\u2032}, ..., \u0398D | \u03c1<), MP(\u03bb\u2033, \u03981, ..., \u0398d\u2032^{>x\u2032}, ..., \u0398D | \u03c1>)\u27e9.\n8. else    (\u03bb\u2033 < \u03bb\u2032 and there is a cut in m above \u03c1)\n9.   draw a cut (d, x) outside \u03c1, i.e., p(d) \u221d \u00b5d(\u0398d \\ \u03a6d), x|d \u223c \u00b5d / \u00b5d(\u0398d \\ \u03a6d); without loss of generality suppose \u03a6d \u2282 \u0398d^{<x}.\n10.  return m \u2190 \u27e8d, x, \u03bb\u2032, MP(\u03bb\u2032, \u03981, ..., \u0398d^{<x}, ..., \u0398D | \u03c1), MP(\u03bb\u2032, \u03981, ..., \u0398d^{>x}, ..., \u0398D)\u27e9.\n\nPartition Structure: The Mondrian is simple enough that we can characterize a number of its other\nproperties. As an example, the expected number of slices along each dimension of (0, A)\u00d7(0, B) is\n\u03bbA and \u03bbB, while the expected total number of partitions is (1 + \u03bbA)(1 + \u03bbB). Interestingly, this is\nalso the expected number of partitions in a biclustering model where we first have two independent\nPoisson processes with rate \u03bb partition (0, A) and (0, B), and then form the product partition of\n(0, A) \u00d7 (0, B).\n\nFigure 3: Trade and Diplomacy relations between 24 countries in 1984. Rij = 1 (black squares) implies\nthat country i imports R from country j. The colors are for visual effect only as the partitions are contiguous\nrectangles. [Panels: posterior sample of Mondrian on [0, 1]^2; crude materials; basic goods; food/animals;\nminerals/fuels; diplomats. Both axes list the 24 countries.]\n\nFigure 4: (clockwise from bottom left) (1) Nine samples from the Mondrian process on Kingman coalescents\nwith rate \u03bb = 0.25, 0.5, and 1, respectively. As the rate increases, partitions become finer. Note that partitions\nare not necessarily contiguous; we use color to identify partitions. The partition structure is related to the\nannotated hierarchies model (Roy et al., 2007).\n(2) Kingman (1982a,b) describes the relationship between\nrandom trees and the DP, which we exploit to define a nonparametric, hierarchical block model. (3) A sequence\nof cuts; each cut separates a subtree. (4) Posterior trees and Mondrian processes on a synthetic social network.\n[Panel titles: Learning the latent tree (Kingman 82); Friends?; Works With?; Gives orders to?. Axes list\nJanitor A-C, Prof. A-C, and Student A-C.]\n\n5 Relational Modeling\n\nTo illustrate how the Mondrian process can be used to model relational data, we describe two\nnonparametric block models for exchangeable relations. While we will only consider binary data and\nassume that each block is conditionally iid, the ideas can be extended to many likelihood models.\nRecall the Aldous-Hoover representation (\u03b8, \u03bei, \u03b7j, pR) for exchangeable arrays. Using a Mondrian\nprocess with beta L\u00e9vy measure \u00b5(dx) = \u03b1x^{\u22121}dx, we first sample a random partition of the unit\nsquare into blocks and assign each block a probability:\n\nM \u223c MP(\u03bb, [0, 1], [0, 1])    (slices up unit square into blocks)    (6)\n\u03c6S | M \u223c Beta(a0, a1), \u2200S \u2208 M.    (each block S gets a probability \u03c6S)    (7)\n\nThe pair (M, \u03c6) plays the role of \u03b8 in the Aldous-Hoover representation. We next sample row and\ncolumn representations (\u03bei and \u03b7j, respectively), which have a geometrical interpretation as\nx,y-coordinates (\u03bei, \u03b7j) in the unit square:\n\n\u03bei \u223c U[0, 1], i \u2208 {1, ..., n}    (shared x coordinate for each row)    (8)\n\u03b7j \u223c U[0, 1], j \u2208 {1, ..., n}.    (shared y coordinate for each column)    (9)\n\nLet Sij be the block S \u2208 M such that (\u03bei, \u03b7j) \u2208 S. We finally sample the array R of relations:\n\nRij | \u03be, \u03b7, \u03c6, M \u223c Bernoulli(\u03c6Sij), i, j \u2208 {1, ..., n}.    (Rij is true w.p. \u03c6Sij)    (10)\n\nThis model clusters relations together whose (\u03bei, \u03b7j) pairs fall in the same blocks in the Mondrian\npartition and models each cluster with a beta-binomial likelihood model. By mirroring the\nAldous-Hoover representation, we guarantee that R is exchangeable and that there is no order dependence.\nThis model is closely related to the IRM (Kemp et al., 2006) and IHRM (Xu et al., 2006), where\nrows and columns are first clustered using a CRP prior, then each relation Rij is conditionally\nindependent from others given the clusters that row i and column j belong to. In particular, if we\nreplace Eq. (6) with\n\nM \u223c MP(\u03bb, [0, 1]) \u00d7 MP(\u03bb, [0, 1]),    (product of partitions of unit intervals)    (11)\n\nthen we recover the same marginal distribution over relations as the IRM/IHRM. To see this, recall\nthat a Mondrian process in one dimension produces a partition whose cut points follow a Poisson\npoint process. Teh et al. (2007) show that the stick lengths (i.e., partitions) induced by a Poisson\npoint process on [0, 1] with the beta L\u00e9vy measure have the same distribution as those in the\nstick-breaking construction of the DP. 
Therefore, (11) is the product of two stick-breaking priors.\nIn comparison, any one dimensional slice of (6), e.g., each column or row of the relation, is marginally\ndistributed as a DP, but (6) is more flexible than the product of one-dimensional Mondrian processes.\nWe can also construct an exchangeable variant of the Annotated Hierarchies model (a hierarchical\nblock model) by moving from the unit square to a product of random trees drawn from Kingman\u2019s\ncoalescent prior (Kingman, 1982a). Let \u00b5d be Lebesgue measure.\n\nTd \u223c KC(\u03bb), \u2200d \u2208 {1, ..., D}    (for each dimension, sample a tree)    (12)\nM | T \u223c MP(2\u03b1, T1, ..., TD)    (partition the cross product of trees)    (13)\n\u03c6S | M \u223c Beta(a0, a1), \u2200S \u2208 M.    (each block S gets a probability \u03c6S)    (14)\n\nLet Sij be the subset S \u2208 M where leaves (i, j) fall in S. Then\n\nRij | \u03c6, M \u223c Bernoulli(\u03c6Sij), i, j \u2208 {1, ..., n}.    (Rij is true w.p. \u03c6Sij)    (15)\n\nFigure 4 shows some samples from this prior. Again, this model is related to the DP. Kingman shows\nthat the partition on the leaves of a coalescent tree when its edges are cut by a Poisson point process\nis the same as that of a DP (Figure 4). Therefore, the partition structure along every row and column\nis marginally the same as a DP. Both the unit square and product of random trees models give DP\ndistributed partitions on each row and column, but they have different inductive biases.\n\n6 Experiments\n\nThe first data set was synthetically created using an actual painting by Piet Mondrian, whose\ngrid-based paintings were the inspiration for the name of this process. Using the model defined by (10)\nand a uniform rate measure, we performed a Markov chain Monte Carlo (MCMC) simulation of\nthe posterior distribution over the Mondrian, \u03be\u2019s, \u03b7\u2019s, and hyperparameters. 
We employed a number\nof Metropolis-Hastings (MH) proposals that rotated, scaled, flipped, and resampled portions of the\nMondrian. It can be shown that the conditional distribution of each \u03bei and \u03b7j is piecewise constant;\ngiven the conjugacy of the beta-binomial, we can Gibbs sample the \u03be\u2019s and \u03b7\u2019s. Figure 2 shows a\nsample after 1500 iterations (starting from a random initialization) where the partition on the array\nis exactly recovered. This was a typical attractor state for random initializations. While the data\nare sufficient to recover the partition on the array, they are not sufficient to recover the underlying\nMondrian process. Whether the Mondrian process is identifiable in the limit of infinite data is an open question.\nWe next analyzed the classic Countries data set from the network analysis literature (Wasserman\nand Faust, 1994), which reports trade in 1984 between 24 countries in food and live animals; crude\nmaterials; minerals and fuels; basic manufactured goods; and exchange of diplomats. We applied\nthe model defined by (10). Figure 3 illustrates the type of structure the model uncovers during\nMCMC simulation; it has recognized several salient groups of countries acting in blocs; e.g., Japan,\nthe UK, Switzerland, Spain and China export to nearly all countries, although China behaves more\nlike the other Pacific Rim countries as an importer. The diplomats relation is nearly symmetric, but\nthe model does not represent symmetry explicitly and must redundantly learn the entire relation.\nReflecting the Mondrian about the line y = x is one way to enforce symmetry in the partition.\nIn our final experiment, we analyzed a synthetic social network consisting of nine university\nemployees: 3 janitors, 3 professors and 3 students. Given three relations (friends, works-with, and\ngives-orders-to), the maximum a posteriori Mondrian process partitions the relations into\nhomogeneous blocks. 
Tree structures around the MAP clustered the janitors, professors and students into\nthree close-knit groups, and preferred to put the janitors and students more closely together in the\ntree. Inference in this model is particularly challenging given the large space of trees and partitions.\n\n7 Discussion\n\nWhile the Mondrian process has many elegant properties, much more work is required to determine\nits usefulness for relational modeling. Just as effective inference procedures preceded the popularity\nof the Dirichlet process, a similar leap in inference sophistication will be necessary to assess the\nMondrian process on large data sets. We are currently investigating improved MCMC sampling\nschemes for the Mondrian process, as well as working to develop a combinatorial representation of\nthe distribution on partitions induced by the Mondrian process. Such a representation is of prac-\ntical interest (possibly leading to improved inference schemes) and of theoretical interest, being a\nmultidimensional generalization of Chinese restaurant processes.\nThe axis-aligned partitions of [0, 1]n produced by the Mondrian process have been studied exten-\nsively in combinatorics and computational geometry, where they are known as guillotine partitions.\nGuillotine partitions have wide ranging applications including circuit design, approximation algo-\nrithms and computer graphics. However, the question of consistent stochastic processes over guillo-\ntine partitions, i.e. the question addressed here, has not, to our knowledge, been studied before.\nAt a high level, we believe that developing nonparametric priors on complex data structures from\ncomputer science may successfully bridge the gap between old-fashioned Arti\ufb01cial Intelligence and\nmodern statistical approaches. Developing representations for these typically recursive structures\nwill require us to go beyond graphical models; stochastic lambda calculus is an appealing option.\nReferences\nD. J. Aldous. 
Representations for Partially Exchangeable Arrays of Random Variables. Journal of Multivariate Analysis, 11:581\u2013598, 1981.\nJ. M. Bernardo and A. F. M. Smith. Bayesian theory. John Wiley & Sons, 1994.\nB. de Finetti. Funzione caratteristica di un fenomeno aleatorio. Atti della R. Academia Nazionale dei Lincei, Serie 6. Memorie, Classe di Scienze Fisiche, Mathematice e Naturale, 4:251\u2013299, 1931.\nP. Diaconis and S. Janson. Graph limits and exchangeable random graphs. arXiv:0712.2749v1, 2007.\nP. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109\u2013137, 1983.\nD. Hoover. Relations on probability spaces and arrays of random variables. Technical report, Preprint, Institute for Advanced Study, Princeton, NJ, 1979.\nO. Kallenberg. Probabilistic Symmetries and Invariance Principles. Springer, 2005.\nC. Kemp, J. Tenenbaum, T. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In Proceedings of the 21st National Conference on Artificial Intelligence, 2006.\nJ. F. C. Kingman. On the genealogy of large populations. Journal of Applied Probability, 19:27\u201343, 1982a.\nJ. F. C. Kingman. The coalescent. Stochastic Processes and their Applications, 13:235\u2013248, 1982b.\nL. Lov\u00e1sz and B. Szegedy. Limits of dense graph sequences. J. Comb. Theory B, 96:933\u2013957, 2006.\nK. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96:1077\u20131087(11), 2001.\nD. M. Roy, C. Kemp, V. Mansinghka, and J. B. Tenenbaum. Learning annotated hierarchies from relational data. In Advances in Neural Information Processing Systems 19, 2007.\nC. Ryll-Nardzewski. On stationary sequences of random variables and the de Finetti\u2019s equivalence. Colloq. Math., 4:149\u2013156, 1957.\nJ. Sethuraman. A Constructive definition of Dirichlet priors. 
Statistica Sinica, 4:639\u2013650, 1994.\nY. W. Teh, D. G\u00f6r\u00fcr, and Z. Ghahramani. Stick-breaking construction for the Indian buffet process. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 11, 2007.\nS. Wasserman and C. Anderson. Stochastic a posteriori blockmodels: Construction and assessment. Social Networks, 9(1):1\u201336, 1987.\nS. Wasserman and K. Faust. Social Network Analysis: Methods and Applications, pages 64\u201365. Cambridge University Press, 1994.\nZ. Xu, V. Tresp, K. Yu, and H.-P. Kriegel. Infinite Hidden Relational Models. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, 2006.\n", "award": [], "sourceid": 849, "authors": [{"given_name": "Daniel", "family_name": "Roy", "institution": null}, {"given_name": "Yee", "family_name": "Teh", "institution": null}]}