{"title": "Rectangular Bounding Process", "book": "Advances in Neural Information Processing Systems", "page_first": 7620, "page_last": 7630, "abstract": "Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a new parsimonious partition model -- the Rectangular Bounding Process (RBP) -- to efficiently partition multi-dimensional spaces, by employing a bounding strategy to enclose data points within rectangular bounding boxes. Unlike existing approaches, the RBP possesses several attractive theoretical properties that make it a powerful nonparametric partition prior on a hypercube. In particular, the RBP is self-consistent and as such can be directly extended from a finite hypercube to infinite (unbounded) space. We apply the RBP to regression trees and relational models as a flexible partition prior. The experimental results validate the merit of the RBP {in rich yet parsimonious expressiveness} compared to the state-of-the-art methods.", "full_text": "Rectangular Bounding Process\n\nXuhui Fan\n\nSchool of Mathematics & Statistics\n\nUniversity of New South Wales\nxuhui.fan@unsw.edu.au\n\nBin Li\n\nlibin@fudan.edu.cn\n\nSchool of Computer Science\n\nFudan University\n\nScott A. Sisson\n\nSchool of Mathematics & Statistics\n\nUniversity of New South Wales\n\nscott.sisson@unsw.edu.au\n\nAbstract\n\nStochastic partition models divide a multi-dimensional space into a number of\nrectangular regions, such that the data within each region exhibit certain types of\nhomogeneity. 
Due to the nature of their partition strategy, existing partition models\nmay create many unnecessary divisions in sparse regions when trying to describe\ndata in dense regions. To avoid this problem we introduce a new parsimonious\npartition model \u2013 the Rectangular Bounding Process (RBP) \u2013 to ef\ufb01ciently partition\nmulti-dimensional spaces, by employing a bounding strategy to enclose data points\nwithin rectangular bounding boxes. Unlike existing approaches, the RBP possesses\nseveral attractive theoretical properties that make it a powerful nonparametric\npartition prior on a hypercube. In particular, the RBP is self-consistent and as such\ncan be directly extended from a \ufb01nite hypercube to in\ufb01nite (unbounded) space. We\napply the RBP to regression trees and relational models as a \ufb02exible partition prior.\nThe experimental results validate the merit of the RBP in rich yet parsimonious\nexpressiveness compared to the state-of-the-art methods.\n\n1\n\nIntroduction\n\nStochastic partition processes on a product space have found many real-world applications, such\nas regression trees [5, 18, 22], relational modeling [17, 2, 21], and community detection [26, 16].\nBy tailoring a multi-dimensional space (or multi-dimensional array) into a number of rectangular\nregions, the partition model can \ufb01t data using these \u201cblocks\u201d such that the data within each block\nexhibit certain types of homogeneity. As one can choose an arbitrarily \ufb01ne resolution of partition, the\ndata can be \ufb01tted reasonably well.\nThe cost of \ufb01ner data \ufb01tness is that the partition model may induce unnecessary dissections in sparse\nregions. 
Compared to the regular-grid partition process [17], the Mondrian process (MP) [32] is\nmore parsimonious for data \ufb01tting due to a hierarchical partition strategy; however, the strategy of\nrecursively cutting the space still cannot largely avoid unnecessary dissections in sparse regions.\nConsider e.g. a regression tree on a multi-dimensional feature space: as data usually lie in some local\nregions of the entire space, a \u201cregular-grid\u201d or \u201chierarchical\u201d (i.e. kd-tree) partition model would\ninevitably produce too many cuts in regions where data points rarely locate when it tries to \ufb01t data\nin dense regions (see illustration in the left panel of Figure 1). It is accordingly challenging for a\npartition process to balance \ufb01tness and parsimony.\nInstead of this cutting-based strategy, we propose a bounding-based partition process \u2013 the Rectangular\nBounding Process (RBP) \u2013 to alleviate the above limitation. The RBP generates rectangular bounding\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fFigure 1: [Left] (a) Regular-grid partition; (b) hierarchical partition; (c) RBP-based partition. [Right]\nGeneration of a bounding box (Step (2) of the generative process for the RBP).\n\nboxes to enclose data points in a multi-dimensional space. In this way, signi\ufb01cant regions of the\nspace can be comprehensively modelled. Each bounding box can be ef\ufb01ciently constructed by an\nouter product of a number of step functions, each of which is de\ufb01ned on a dimension of the feature\nspace, with a segment of value \u201c1\u201d in a particular interval and \u201c0\u201d otherwise. As bounding boxes\nare independently generated, the layout of a full partition can be quite \ufb02exible, allowing for a simple\ndescription of those regions with complicated patterns. 
As a result, the RBP is able to use fewer\nblocks (thereby providing a more parsimonious expression of the model) than those cutting-based\npartition models while achieving a similar modelling capability.\nThe RBP has several favourable properties that make it a powerful nonparametric partition prior:\n(1) Given a budget parameter and the domain size, the expected total volume of the generated\nbounding boxes is a constant. This is helpful to set the process hyperparameters to prefer few large\nbounding boxes or many small bounding boxes. (2) Each individual data point in the domain has\nequal probability to be covered by a bounding box. This gives an equal tendency towards all possible\ncon\ufb01gurations of the data. (3) The process is self-consistent in the sense that, when restricting a\ndomain to its sub-domain, the probability of the resulting partition on the sub-domain is the same as\nthe probability of generating the same partition on the sub-domain directly. This important property\nenables the RBP to be extendable from a hypercube to the in\ufb01nite multi-dimensional space according\nto the Kolmogorov extension theorem [6]. This is extremely useful for the domain changing problem\nsetting, such as regression problems over streaming data.\nThe RBP can be a bene\ufb01cial tool in many applications. In this paper we speci\ufb01cally investigate (1)\nregression trees, where bounding boxes can be viewed as local regions of certain labels, and (2)\nrelational modelling, where bounding boxes can be viewed as communities in a complex network.\nWe develop practical and ef\ufb01cient MCMC methods to infer the model structures. 
The experimental\nresults on a number of synthetic and real-world data sets demonstrate that the RBP can achieve\nparsimonious partitions with competitive performance compared to the state-of-the-art methods.\n\n2 Related Work\n\n2.1 Stochastic Partition Processes\n\nStochastic partition processes divide a multi-dimensional space (for continuous data) or a multi-\ndimensional array (for discrete data) into blocks such that the data within each region exhibit certain\ntypes of homogeneity. In terms of partitioning strategy, state-of-the-art stochastic partition processes\ncan be roughly categorized into regular-grid partitions and \ufb02exible axis-aligned partitions (Figure 1).\nA regular-grid stochastic partition process is constituted by D separate partition processes on each\ndimension of the D-dimensional array. The resulting orthogonal interactions between two dimensions\nproduce regular grids. Typical regular-grid partition models include the in\ufb01nite relational model\n(IRM) [17] and mixed-membership stochastic blockmodels [2]. Bayesian plaid models [24, 15, 4]\ngenerate \u201cplaid\u201d like partitions, however they neglect the dependence between latent feature values\nand more importantly, they are restricted to discrete arrays (in contrast to our continuous space\nmodel). Regular-grid partition models are widely used in real-world applications for modeling graph\ndata [14, 33].\nTo our knowledge, only the Mondrian process (MP) [32, 31] and the rectangular tiling process\n(RTP) [25] can produce \ufb02exible axis-aligned partitions on a product space. The MP recursively\ngenerates axis-aligned cuts on a unit hypercube and partitions the space in a hierarchical fashion\nknown as a kd-tree. 
Compared to the hierarchical partitioning strategy, the RTP generates a flat partition structure on a two-dimensional array by assigning each entry to an existing block or a new block in sequence, without violating the rectangular restriction on the blocks.

2.2 Bayesian Relational Models

Most stochastic partition processes are developed to target relational modelling [17, 32, 25, 36, 9, 10]. A stochastic partition process generates a partition on a multi-dimensional array, which serves as a prior for the underlying communities in the relational data (e.g. social networks in the 2-dimensional case). After generating the partition, a local model is allocated to each block (or polygon) of the partition to characterize the relation type distribution in that block. For example, the local model can be a Bernoulli distribution for link prediction in social networks or a discrete distribution for rating prediction in recommender systems. Finally, row and column indexes are sampled to locate a block in the partition and use the local model to further generate the relational data.
Compared to the existing stochastic partition processes in relational modelling, the RBP introduces a very different partition approach: the RBP adopts a bounding-based strategy while the others are based on cutting-based strategies. This unique feature enables the RBP to directly capture important communities without wasting model parameters on unimportant regions. In addition, the bounding boxes are independently generated by the RBP. This parallel strategy is much more efficient than the hierarchical strategy [32, 9] and the entry-wise growing strategy [25].

2.3 Bayesian Regression Trees

The ensemble of regression/decision trees [3, 11] is a popular tool in regression tasks due to its competitive and robust performance against other models. 
In the Bayesian framework, Bayesian additive regression trees (BART) [5] and the Mondrian forest [18, 19] are two representative methods. BART adopts an ensemble of trees to approximate the unknown function from the input space to the output label. Through prior regularization, BART can keep the effects of individual trees small and work as a "weak learner" in the boosting literature. BART has shown promise in nonparametric regression, and several variants of BART have been developed to focus on different scenarios, including heterogeneous BART [29] allowing for various observational variance in the space, parallel BART [30] enabling parallel computing for BART, and Dirichlet additive regression trees [22] imposing a sparse Dirichlet prior on the dimensions to address issues of high-dimensionality.
The Mondrian forest (MF) is built on the idea of an ensemble of Mondrian processes (kd-trees) to partition the space. The MF is distinct from BART in that the MF may be more suitable for streaming data scenarios, as the distribution of trees sampled from the MP stays invariant even if the data domain changes over time.

3 The Rectangular Bounding Process

The goal of the Rectangular Bounding Process (RBP) is to partition the space by attaching rectangular bounding boxes to significant regions, where "significance" is application dependent. For a hypercubical domain $X \subset \mathbb{R}^D$ with $L^{(d)}$ denoting the length of the $d$-th dimension of $X$, a budget parameter $\tau \in \mathbb{R}_+$ is used to control the expected number of generated bounding boxes in $X$ and a length parameter $\lambda \in \mathbb{R}_+$ is used to control the expected size of the generated bounding boxes. The generative process of the RBP is defined as follows:

1. Sample the number of bounding boxes $K_\tau \sim \mathrm{Poisson}\big(\tau \prod_{d=1}^{D} [1 + \lambda L^{(d)}]\big)$;

2. For $k = 1, \ldots, K_\tau$, $d = 1, \ldots$
$, D$, sample the initial position $s_k^{(d)}$ and the length $l_k^{(d)}$ of the $k$-th bounding box (denoted as $\Box_k$) in the $d$-th dimension:

(a) Sample the initial position $s_k^{(d)}$ as
$$s_k^{(d)} = 0 \;\text{ with probability }\; \frac{1}{1+\lambda L^{(d)}}; \qquad s_k^{(d)} \sim \mathrm{Uniform}(0, L^{(d)}] \;\text{ with probability }\; \frac{\lambda L^{(d)}}{1+\lambda L^{(d)}}.$$

(b) Sample the length $l_k^{(d)}$ as
$$l_k^{(d)} = L^{(d)} - s_k^{(d)} \;\text{ with probability }\; e^{-\lambda(L^{(d)} - s_k^{(d)})}; \qquad l_k^{(d)} \sim \text{Trun-Exp}(\lambda, L^{(d)} - s_k^{(d)}) \;\text{ with probability }\; 1 - e^{-\lambda(L^{(d)} - s_k^{(d)})}.$$

3. Sample $K_\tau$ i.i.d. time points uniformly in $(0, \tau]$ and index them to satisfy $t_1 < \ldots < t_{K_\tau}$. Set the cost of $\Box_k$ as $m_k = t_k - t_{k-1}$ ($t_0 = 0$).

Here Trun-Exp$(\lambda, L^{(d)} - s_k^{(d)})$ refers to an exponential distribution with rate $\lambda$, truncated at $L^{(d)} - s_k^{(d)}$.
The RBP is defined in a measurable space $(\Omega_X, \mathcal{B}_X)$, where $X \in \mathcal{F}(\mathbb{R}^D)$ denotes the domain and $\mathcal{F}(\mathbb{R}^D)$ denotes the collection of all finite boxes in $\mathbb{R}^D$. Each element in $\Omega_X$ denotes a partition $\boxplus_X$ of $X$, comprising a collection of rectangular bounding boxes $\{\Box_k\}_k$, where $k \in \mathbb{N}$ indexes the bounding boxes in $\boxplus_X$. A bounding box is defined by an outer product $\Box_k := \bigotimes_{d=1}^{D} u_k^{(d)}([0, L^{(d)}])$, where $u_k^{(d)}$ is a step function defined on $[0, L^{(d)}]$, taking value "1" in $[s_k^{(d)}, s_k^{(d)} + l_k^{(d)}]$ and "0" otherwise (see right panel of Figure 1).
Given a domain $X$ and hyperparameters $\tau$ and $\lambda$, a random partition sampled from the RBP can be represented as $\boxplus_X \sim \mathrm{RBP}(X, \tau, \lambda)$. We assume that the costs of bounding boxes are i.i.d. sampled from the same exponential distribution, which implies there exists a homogeneous Poisson process on the time (cost) line. 
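As a concrete illustration, the three steps above can be written out directly. The following is a minimal sketch of the generative process (the function names and structure are ours, not from a released implementation); the truncated exponential in Step 2(b) is drawn by inverse-CDF sampling.

```python
import math
import random

def sample_poisson(rate, rng):
    """Poisson sampler (Knuth's method; fine for moderate rates)."""
    thr, p, k = math.exp(-rate), 1.0, 0
    while True:
        p *= rng.random()
        if p <= thr:
            return k
        k += 1

def sample_rbp(L, tau, lam, rng=None):
    """Draw a partition from RBP(X, tau, lam) on a box domain X with side lengths L.
    Returns a list of boxes [(s_d, l_d) per dimension] and their costs m_k."""
    rng = rng or random.Random()
    # Step 1: K_tau ~ Poisson(tau * prod_d (1 + lam * L^(d)))
    K = sample_poisson(tau * math.prod(1.0 + lam * Ld for Ld in L), rng)
    boxes = []
    for _ in range(K):
        box = []
        for Ld in L:
            # Step 2(a): initial position s, with an atom at 0
            if rng.random() < 1.0 / (1.0 + lam * Ld):
                s = 0.0
            else:
                s = rng.uniform(0.0, Ld)
            # Step 2(b): length l, either reaching the boundary or Trun-Exp(lam, Ld - s)
            T = Ld - s
            if rng.random() < math.exp(-lam * T):
                l = T
            else:
                l = -math.log(1.0 - rng.random() * (1.0 - math.exp(-lam * T))) / lam
            box.append((s, l))
        boxes.append(box)
    # Step 3: costs m_k are the gaps between ordered uniform times on (0, tau]
    times = sorted(rng.uniform(0.0, tau) for _ in range(K))
    costs = [t - t0 for t0, t in zip([0.0] + times[:-1], times)]
    return boxes, costs
```

Each sampled box stays inside the domain by construction, and the costs sum to at most the budget $\tau$.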
The generating time of each bounding box is uniform in $(0, \tau]$ and the number of bounding boxes has a Poisson distribution. We represent a random partition as $\boxplus_X := \{\Box_k, m_k\}_{k=1}^{K_\tau} \in \Omega_X$.

3.1 Expected Total Volume

Proposition 1. Given a hypercubical domain $X \subset \mathbb{R}^D$ with $L^{(d)}$ denoting the length of the $d$-th dimension of $X$ and the value of $\tau$, the expected total volume of the bounding boxes (i.e. expected number of boxes $\times$ expected volume of a box) in $\boxplus_X$ sampled from a RBP is a constant $\tau \cdot \prod_{d=1}^{D} L^{(d)}$.

The expected length of the interval in $u_k^{(d)}$ with value "1" is $E(|u_k^{(d)}|) = E(l_k^{(d)}) = \frac{L^{(d)}}{1+\lambda L^{(d)}}$. According to the definition of the RBP (Poisson distribution in Step 1), we have an expected number of $\tau \cdot \prod_{d=1}^{D} [1 + \lambda L^{(d)}]$ bounding boxes in $\boxplus_X$. Thus, the expected total volume of the bounding boxes for a given budget $\tau$ and a domain $X$ is $\tau \cdot \prod_{d=1}^{D} L^{(d)}$.

Proposition 1 implies that, given $\tau$ and $X$, the RBP generates either many small-sized bounding boxes or few large-sized bounding boxes. This provides practical guidance on how to choose appropriate values of $\lambda$ and $\tau$ when implementing the RBP. Given the lengths $\{L^{(d)}\}_d$ of $X$, an estimate of the lengths of the bounding boxes can help to choose $\lambda$ (i.e. $\lambda = \frac{L^{(d)} - E(|u_k^{(d)}|)}{L^{(d)} \, E(|u_k^{(d)}|)}$). An appropriate value of $\tau$ can then be chosen to determine the expected number of bounding boxes.

3.2 Coverage Probability

Proposition 2. 
For any data point $x \in X$ (including the boundaries of $X$), the probability of $x^{(d)}$ falling in the interval $[s_k^{(d)}, s_k^{(d)} + l_k^{(d)}]$ of $u_k^{(d)}$ is a constant $\frac{1}{1+\lambda L^{(d)}}$ (and does not depend on $x^{(d)}$).

As the step functions $\{u_k^{(d)}\}_d$ for constructing the $k$-th bounding box $\Box_k$ in $\boxplus_X$ are independent, $x$ is covered by $\Box_k$ with a constant probability.
The property of constant coverage probability is particularly suitable for regression problems. Proposition 2 implies there is no biased tendency to favour different data regions in $X$: all data points have equal probability to be covered by a bounding box in a RBP partition $\boxplus_X$.
Another interesting observation follows from this property: although we have specified a direction for generating the $d$-th dimension of $\Box_k$ in the generative process (i.e. the initial position $s_k^{(d)}$), the probability of generating $u_k^{(d)}$ is the same if we reverse the direction of the $d$-th dimension. Direction is therefore only defined for notational convenience; it will not affect the results of any analysis.

3.3 Self-Consistency

The RBP has the attractive property of self-consistency. That is, when restricting an RBP$(Y, \tau, \lambda)$ on a $D$-dimensional hypercube $Y$ to a sub-domain $X$, $X \subset Y \in \mathcal{F}(\mathbb{R}^D)$, the resulting bounding boxes restricted to $X$ are distributed as if they were directly generated on $X$ through an RBP$(X, \tau, \lambda)$ (see Figure 2 for an illustration). Typical application scenarios are regression/classification tasks on streaming data, where new data points may be observed outside the current data domain $X$, say falling in $Y \setminus X$, $X \subset Y$, where $Y$ represents the data domain (minimum bounding box of all observed data) in the future (left panel of Figure 3). 
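The properties above are easy to verify numerically. Below is a small Monte Carlo sketch (our own illustrative code, not from the paper): it checks Proposition 1 in closed form, Proposition 2 by simulation at several points, and the restriction behaviour underlying the self-consistency property just introduced, in one dimension.

```python
import math
import random

def sample_interval(Ld, lam, rng):
    """One dimension of Step 2: initial position s, then length l."""
    if rng.random() < 1.0 / (1.0 + lam * Ld):
        s = 0.0
    else:
        s = rng.uniform(0.0, Ld)
    T = Ld - s
    if rng.random() < math.exp(-lam * T):
        l = T                                      # box reaches the boundary
    else:                                          # Trun-Exp(lam, T) via inverse CDF
        l = -math.log(1.0 - rng.random() * (1.0 - math.exp(-lam * T))) / lam
    return s, l

def expected_total_volume(L, tau, lam):
    """Proposition 1: E[#boxes] * E[box volume] = tau * prod_d L^(d)."""
    e_boxes = tau * math.prod(1.0 + lam * Ld for Ld in L)
    e_vol = math.prod(Ld / (1.0 + lam * Ld) for Ld in L)
    return e_boxes * e_vol

def coverage_estimate(x, Ld, lam, n, rng):
    """Proposition 2: P(x in [s, s+l]) should be 1/(1 + lam*Ld) for every x."""
    hits = 0
    for _ in range(n):
        s, l = sample_interval(Ld, lam, rng)
        hits += (s <= x <= s + l)
    return hits / n

def mean_boxes_crossing(LY, LX, tau, lam, trials, rng):
    """Restriction of RBP on Y=[0,LY] to X=[0,LX], LX <= LY: the mean number of
    boxes crossing into X should match the Poisson rate of a direct RBP on X."""
    total = 0
    for _ in range(trials):
        # K ~ Poisson(tau * (1 + lam * LY)), Knuth's method
        thr, p, K = math.exp(-tau * (1.0 + lam * LY)), 1.0, 0
        while True:
            p *= rng.random()
            if p <= thr:
                break
            K += 1
        for _ in range(K):
            s, _ = sample_interval(LY, lam, rng)
            total += (s <= LX)                     # the box [s, s+l] intersects X
    return total / trials
```

With $L^{(d)} = 1$ and $\lambda = 2$, the coverage estimate comes out near $1/3$ at the boundary point $0$ as well as at interior points, and the mean crossing count into a sub-domain matches $\tau(1 + \lambda L_X)$.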
Equipped with the self-consistency property, the distribution of the RBP partition on $X$ remains invariant as new data points arrive and expand the data domain.
The self-consistency property can be verified in three steps: (1) the distribution of the number of bounding boxes is self-consistent; (2) the position distribution of a bounding box is self-consistent; (3) the RBP is self-consistent. In the following, we use $\pi_{Y,X}$ to denote the projection that restricts $\boxplus_Y \in \Omega_Y$ to $X$ by keeping $\boxplus_Y$'s partition in $X$ unchanged and removing the rest.

Proposition 3. While restricting the RBP$(Y, \tau, \lambda)$ to $X$, $X \subset Y \in \mathcal{F}(\mathbb{R}^D)$, we have the following results:

1. The time points of bounding boxes crossing into $X$ from $Y$ follow the same Poisson process for generating the time points of bounding boxes in a RBP$(X, \tau, \lambda)$. That is, $P^{Y}_{K_\tau, \{m_k\}_{k=1}^{K_\tau}}\big(\pi^{-1}_{Y,X}\big(K^X_\tau, \{m^X_k\}_{k=1}^{K^X_\tau}\big)\big) = P^{X}_{K^X_\tau, \{m^X_k\}_{k=1}^{K^X_\tau}}\big(K^X_\tau, \{m^X_k\}_{k=1}^{K^X_\tau}\big)$.

2. The marginal probability of the pre-images of a bounding box $\Box_X$ in $Y$ (given the bounding box in $Y$ would cross into $X$) equals the probability of $\Box_X$ being directly sampled from a RBP$(X, \tau, \lambda)$. That is, $P^{Y}_{\Box}\big(\pi^{-1}_{Y,X}(\Box_X) \,\big|\, \pi_{Y,X}(\Box_Y) \neq \emptyset\big) = P^{X}_{\Box}(\Box_X)$.

3. Combining 1 and 2 leads to the self-consistency of the RBP: $P^{Y}_{\boxplus}\big(\pi^{-1}_{Y,X}(\boxplus_X)\big) = P^{X}_{\boxplus}(\boxplus_X)$.

The generative process provides a way of defining the RBP on a multidimensional finite space (hypercube). According to the Kolmogorov extension theorem [6], we can further extend the RBP to the multidimensional infinite space $\mathbb{R}^D$.

Theorem 1. 
The probability measure $P^{X}_{\boxplus}$ on the measurable space $(\Omega_X, \mathcal{B}_X)$ of the RBP, $X \in \mathcal{F}(\mathbb{R}^D)$, can be uniquely extended to $P^{\mathbb{R}^D}_{\boxplus}$ on $(\Omega_{\mathbb{R}^D}, \mathcal{B}_{\mathbb{R}^D})$ as the projective limit measurable space.

4 Application to Regression Trees

4.1 RBP-Regression Trees

We apply the RBP as a space partition prior to the regression-tree problem. Given the feature and label pairs $\{(x_n, y_n)\}_{n=1}^N$, $(x_n, y_n) \in \mathbb{R}^D \times \mathbb{R}$, the bounding boxes $\{\Box_k\}_k$ sampled from an RBP on the domain $X$ of the feature data are used for modelling the latent variables that influence the observed labels. Since the $\{\Box_k\}_k$ allow for a partition layout of overlapping bounding boxes, each single feature point can be covered by more than one bounding box, whose latent variables together form an additive effect to influence the corresponding label.
The generative process for the RBP-regression tree is as follows:
$$\{\Box_k\}_k \sim \mathrm{RBP}(X, \tau, \lambda); \quad \omega_k \sim N(\mu_\omega, \sigma_\omega^2); \quad \sigma^2 \sim \mathrm{IG}(1, 1); \quad y_n \sim N\Big(\sum_k \omega_k \cdot \mathbf{1}_{x \in \Box_k}(x_n), \; \sigma^2\Big).$$

Figure 2: Self-consistency of the Rectangular Bounding Process: $P^{Y}_{\boxplus}\big(\pi^{-1}_{Y,X}(\boxplus_X)\big) = P^{X}_{\boxplus}(\boxplus_X)$.

Figure 3: A toy regression-tree example: (left) Debt $\sim f$(Income, Expense), for some probability density function $f$. When entering November from October, more observed records exceeding Income of $10K or Expense of $5K are observed, where $X \subset Y$ denotes the data domain in October and $Y$ the data domain in November. 
(right) The RBP regression-tree predicts the original data well, particularly in dense and complex regions (top left and bottom right).

4.2 Sampling for RBP-Regression Tree

The joint probability of the label data $\{y_n\}_{n=1}^N$, the number of bounding boxes $K_\tau$, the variables related to the bounding boxes $\{\omega_k, s_k^{(1:D)}, l_k^{(1:D)}\}_{k=1}^{K_\tau}$, and the error variance $\sigma^2$ is
$$P(\{y_n\}_n, K_\tau, \{\omega_k, s_k^{(1:D)}, l_k^{(1:D)}\}_k, \sigma^2 \mid \lambda, \tau, \mu_\omega, \sigma_\omega^2, X, L^{(1:D)}) = \prod_n P(y_n \mid K_\tau, \{\omega_k, s_k^{(1:D)}, l_k^{(1:D)}\}_k, \sigma^2, X) \cdot \prod_k P(\omega_k \mid \mu_\omega, \sigma_\omega^2) \cdot \prod_{k,d} P(s_k^{(d)} \mid \lambda, L^{(d)}) \, P(l_k^{(d)} \mid s_k^{(d)}, \lambda, L^{(d)}) \cdot P(K_\tau \mid \tau, \lambda, L^{(1:D)}) \, K_\tau! \, P(\sigma^2).$$

We adopt MCMC methods to iteratively sample from the resulting posterior distribution.
Sample $K_\tau$: We use a similar strategy to [1] for updating $K_\tau$. 
We accept the addition or removal of a bounding box with an acceptance probability of $\min(1, \alpha_{\mathrm{add}})$ or $\min(1, \alpha_{\mathrm{del}})$ respectively, where
$$\alpha_{\mathrm{add}} = \frac{\prod_n P_{K_\tau+1}(y_n) \cdot \tau \lambda^* (1 - P_0)}{\prod_n P_{K_\tau}(y_n) \cdot (K_\tau + 1) P_0}, \qquad \alpha_{\mathrm{del}} = \frac{\prod_n P_{K_\tau-1}(y_n) \cdot K_\tau P_0}{\prod_n P_{K_\tau}(y_n) \cdot \tau \lambda^* (1 - P_0)},$$
$\lambda^* = \prod_d \big[1 + \lambda L^{(d)}\big]$, $P_{K_\tau}(y_n) = P(y_n \mid K_\tau, \{\omega_k, s_k^{(d)}, l_k^{(d)}\}_k, \sigma^2, \{x_n\}_n)$, and $P_0 = \frac{1}{2}$ (or $1 - P_0$) denotes the probability of proposing to add (or remove) a bounding box.
Sample $\sigma^2, \{\omega_k\}_k$: Both $\sigma^2$ and $\{\omega_k\}_k$ can be sampled through Gibbs sampling:
$$\sigma^2 \sim \mathrm{IG}\Big(1 + \frac{N}{2}, \; 1 + \frac{\sum_n \big(y_n - \sum_k \omega_k \cdot \mathbf{1}_{x \in \Box_k}(x_n)\big)^2}{2}\Big), \qquad \omega_k^* \sim N(\mu^*, (\sigma^2)^*),$$
where $\mu^* = (\sigma^2)^* \Big( \frac{\mu_\omega}{\sigma_\omega^2} + \frac{\sum_{n: x_n \in \Box_k} \big(y_n - \sum_{k' \neq k} \omega_{k'} \cdot \mathbf{1}_{x \in \Box_{k'}}(x_n)\big)}{\sigma^2} \Big)$, $(\sigma^2)^* = \Big(\frac{1}{\sigma_\omega^2} + \frac{N_k}{\sigma^2}\Big)^{-1}$, and $N_k$ is the number of data points belonging to the $k$-th bounding box.
Sample $\{s_k^{(d)}, l_k^{(d)}\}_{k,d}$: To update $u_k^{(d)}$ we use the Metropolis-Hastings algorithm, generating the proposed $s_k^{(d)}, l_k^{(d)}$ (which determine $u_k^{(d)}$) using the RBP generative process (Step 2.(a)(b)). Thus, the acceptance probability is based purely on the likelihoods for the proposed and current configurations.

4.3 Experiments

We empirically test the RBP regression tree (RBP-RT) for the regression task. 
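The Gibbs updates of Section 4.2 can be made concrete in a few lines. The sketch below is our own illustrative code under the conditionals above; `members[n]`, listing the boxes covering $x_n$, is an assumed precomputed structure, and the inverse-gamma draw uses the rate parameterization.

```python
import math
import random

def gibbs_sweep(y, members, w, mu_w, s2_w, rng):
    """One Gibbs sweep over (sigma^2, {w_k}) for the RBP regression tree.
    y[n] are labels; members[n] is the set of box indices covering x_n."""
    N, K = len(y), len(w)
    fit = [sum(w[k] for k in members[n]) for n in range(N)]   # current additive fits
    # sigma^2 | rest ~ IG(1 + N/2, 1 + sum_n (y_n - fit_n)^2 / 2)
    a = 1.0 + 0.5 * N
    b = 1.0 + 0.5 * sum((yn - fn) ** 2 for yn, fn in zip(y, fit))
    sigma2 = b / rng.gammavariate(a, 1.0)   # 1/Gamma(shape a, rate b) ~ IG(a, b)
    # w_k | rest ~ N(mu*, (sigma^2)*), using residuals that exclude box k itself
    for k in range(K):
        idx = [n for n in range(N) if k in members[n]]        # the N_k points in box k
        s2_star = 1.0 / (1.0 / s2_w + len(idx) / sigma2)
        resid = sum(y[n] - (fit[n] - w[k]) for n in idx)
        mu_star = s2_star * (mu_w / s2_w + resid / sigma2)
        new_wk = rng.gauss(mu_star, math.sqrt(s2_star))
        for n in idx:                                         # keep cached fits in sync
            fit[n] += new_wk - w[k]
        w[k] = new_wk
    return sigma2, w
```

Updating the cached fits in place after each $\omega_k$ draw keeps the sweep at one pass over the data per box, rather than recomputing all residuals from scratch.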
We compare the RBP-RT with several state-of-the-art methods: (1) a Bayesian additive regression tree (BART) [5]; (2) a Mondrian forest (MF) [18, 19]; (3) a random forest (RF) [3]; and (4) an extremely randomized tree (ERT) [12]. For BART, we implement a particle MCMC strategy to infer the structure of each tree in the model; for the MF, we directly use the existing code provided by the author1; for the RF and ERT, we use the implementations in the scikit-learn toolbox [28].

Table 1: Regression task comparison results (RMAE±std)

Data Sets           BART         ERT          RF           MF           RBP-RT
Protein             4.79 ± 0.07  3.20 ± 0.03  3.04 ± 0.03  4.50 ± 0.05  4.87 ± 0.10
Naval Plants        0.37 ± 0.10  0.35 ± 0.01  0.13 ± 0.01  0.46 ± 0.07  0.53 ± 0.17
Power Plants        5.03 ± 0.12  4.03 ± 0.08  3.65 ± 0.08  4.36 ± 0.11  4.78 ± 0.17
Concrete Data       6.21 ± 0.46  4.18 ± 0.31  3.80 ± 0.33  5.95 ± 0.56  6.34 ± 0.78
Airfoil self-noise  1.10 ± 0.21  1.06 ± 0.06  0.74 ± 0.08  1.21 ± 0.24  1.25 ± 0.33

Synthetic data: We first test the RBP-RT on a simple synthetic data set. A total of 7 bounding boxes are assigned to the unit square $[0, 1]^2$, each with its own mean intensity $\omega_k$. From each bounding box 50 ∼ 80 points are uniformly sampled, with the label data $y_i$ generated from a normal distribution, with mean the sum of the intensities of the bounding boxes covering the data point, and standard deviation $\sigma = 0.1$. In this way, a total of 400 data points are generated in $[0, 1]^2$.
To implement the RBP-RT we set the total number of iterations to 500, $\lambda = 2$ (i.e. 
the expected box length is 1/3) and $\tau = 1$ (i.e. the expected number of bounding boxes is 90). It is worth noting that 90 is a relatively small number of blocks when compared to the other state-of-the-art methods. For instance, BART usually sets the number of trees to be at least 50 and there are typically more than 16 blocks in each tree (i.e. at least 800 blocks in total). The right panel of Figure 3 shows a visualization of the data fitting result of the RBP regression-tree.

Real-world data: We select several real-world data sets to compare the RBP-RT and the other state-of-the-art methods: Protein Structure [8] (N = 45,730, D = 9), Naval Plants [7] (N = 11,934, D = 16), Power Plants [34] (N = 9,569, D = 4), Concrete [37] (N = 1,030, D = 8), and Airfoil Self-Noise [8] (N = 1,503, D = 8). Here, we first use PCA to select the 4 largest components and then normalize them so that they lie in the unit hypercube for ease of implementation. As before, we set the total number of iterations to 500 and $\lambda = 2$, and this time set $\tau = 2$ (i.e. the expected number of bounding boxes is 180).
The resulting Residual Mean Absolute Errors (RMAE) are reported in Table 1. In general, the three Bayesian tree ensembles perform worse than the random forests. This may in part be due to the maturity of development of the RF algorithm toolbox. While the RBP-RT does not perform as well as the random forests, its performance is comparable to that of BART and MF (sometimes even better), but with many fewer bounding boxes (i.e. parameters) used in the model, clearly demonstrating its parsimonious construction.

5 Application to Relational Modeling

5.1 The RBP-Relational Model

Another possible application of the RBP is in relational modeling. 
Given relational data as an asymmetric matrix $R \in \{0, 1\}^{N \times N}$, with $R_{ij}$ indicating the relation from node $i$ to node $j$, the bounding boxes $\{\Box_k\}_k$ with rates $\{\omega_k\}_k$ belonging to a partition $\boxplus$ may be used to model communities with different intensities of relations.
The generative process of an RBP relational model is as follows: (1) generate a partition $\boxplus$ on $[0, 1]^2$; (2) for $k = 1, \ldots, K_\tau$, generate rates $\omega_k \sim \mathrm{Exp}(1)$; (3) for $i, j = 1, \ldots, N$, generate the row and column coordinates $\{\xi_i\}_i, \{\eta_j\}_j$; (4) for $i, j = 1, \ldots, N$, generate relational data $R_{ij} \sim \mathrm{Bernoulli}\big(\sigma\big(\sum_{k=1}^{K_\tau} \omega_k \cdot \mathbf{1}_{(\xi, \eta) \in \Box_k}(\xi_i, \eta_j)\big)\big)$, where $\sigma(x) = \frac{1}{1 + \exp(-x)}$ is the logistic function, mapping the aggregated relation intensity from $\mathbb{R}$ to $(0, 1)$. While here we implement a RBP relational model with binary interactions (i.e. the Bernoulli likelihood), other types of relations (e.g. categorical likelihoods) can easily be accommodated.

1 https://github.com/balajiln/mondrianforest

Table 2: Relational modeling (link prediction) comparison results (AUC±std)

Data Sets  IRM          LFRM         MP-RM        BSP-RM       MTA-RM       RBP-RM
Digg       0.80 ± 0.01  0.81 ± 0.03  0.79 ± 0.02  0.82 ± 0.02  0.83 ± 0.01  0.83 ± 0.01
Flickr     0.88 ± 0.01  0.89 ± 0.01  0.88 ± 0.01  0.93 ± 0.02  0.90 ± 0.01  0.92 ± 0.01
Gplus      0.86 ± 0.01  0.86 ± 0.01  0.86 ± 0.01  0.89 ± 0.02  0.86 ± 0.01  0.88 ± 0.01
Facebook   0.87 ± 0.01  0.91 ± 0.02  0.89 ± 0.03  0.93 ± 0.02  0.91 ± 0.01  0.92 ± 0.02
Twitter    0.87 ± 0.01  0.88 ± 0.02  0.88 ± 0.06  0.90 ± 0.01  0.88 ± 0.01  0.90 ± 0.02

Together, the RBP and the mapping function $\sigma(\cdot)$ play the role of the random function $W(\cdot)$ defined in the graphon literature 
[27]. Along with the uniformly generated coordinates for each node, the RBP relational model is expected to uncover homogeneous interactions in R as compact boxes.

5.2 Sampling for the RBP Relational Model

The joint probability of the label data {R_ij}_{i,j}, the number of bounding boxes K_τ, the variables related to the bounding boxes {ω_k, s_k^(1:D), l_k^(1:D)}_{k=1}^{K_τ}, and the coordinates {ξ_n, η_n}_n for the nodes (with L^(1) = . . . = L^(D) = 1 in the RBP relational model) is

    P(R, K_τ, {ω_k, s_k^(1:D), l_k^(1:D)}_k, {ξ_n, η_n}_n | λ, τ)
      = [ ∏_{n1,n2} P(R_{n1,n2} | K_τ, {ω_k, s_k^(1:D), l_k^(1:D)}_k, ξ_{n1}, η_{n2}) ]
        · P(K_τ | τ, λ) · K_τ! · ∏_{n1} P(ξ_{n1}) · ∏_{n2} P(η_{n2}) · ∏_k P(ω_k) · ∏_{k,d} P(s_k^(d) | λ) P(l_k^(d) | s_k^(d), λ).

We adopt MCMC methods to iteratively sample from the resulting posterior distribution.
Sample K_τ: We use a similar strategy to [1] for updating K_τ. We accept the addition or removal of a bounding box with an acceptance probability of min(1, α_add) or min(1, α_del) respectively, where

    α_add = [ ∏_{n1,n2} P_{K_τ+1}(R_{n1,n2}) · τ λ* (1 − P_0) ] / [ ∏_{n1,n2} P_{K_τ}(R_{n1,n2}) · (K_τ + 1) P_0 ],
    α_del = [ ∏_{n1,n2} P_{K_τ−1}(R_{n1,n2}) · K_τ P_0 ] / [ ∏_{n1,n2} P_{K_τ}(R_{n1,n2}) · τ λ* (1 − P_0) ],

where λ* = (1 + λ)^2, P_{K_τ}(R_{n1,n2}) = P(R_{n1,n2} | K_τ, {ω_k, s_k^(1:D), l_k^(1:D)}_k, ξ_{n1}, η_{n2}), and P_0 = 1/2 (or 1 − P_0) denotes the probability of proposing to add (or remove) a bounding box.
Sample {ω_k}_k: For the k-th box, k ∈ {1, · · · , K_τ}, a new ω_k* is sampled from the proposal distribution Exp(1). We then accept ω_k* with probability min(1, α), where

    α = ∏_{n1,n2} P(R_{n1,n2} | K_τ, {ω_{k'}}_{k'≠k}, ω_k*, {s_{k'}^(1:D), l_{k'}^(1:D)}_{k'}, ξ_{n1}, η_{n2}) / P(R_{n1,n2} | K_τ, {ω_k, s_k^(1:D), l_k^(1:D)}_k, ξ_{n1}, η_{n2}).

Sample {u_k^(d)}_{k,d}: This update is the same as for the RBP regression-tree sampler (Section 4.2).
Sample {ξ_{n1}}_{n1}, {η_{n2}}_{n2}: We propose new ξ_{n1}, η_{n2} values from the uniform prior distribution. Thus, the acceptance probability is purely based on the likelihoods of the proposed and current configurations.

5.3 Experiments

We empirically test the RBP relational model (RBP-RM) for link prediction. We compare the RBP-RM with five state-of-the-art relational models: (1) IRM [17] (regular grids); (2) LFRM [24] (plaid grids); (3) the MP relational model (MP-RM) [32] (hierarchical kd-tree); (4) the BSP-tree relational model (BSP-RM) [9]; and (5) the matrix tile analysis relational model (MTA-RM) [13] (noncontiguous tiles). For inference on the IRM and LFRM we adopt collapsed Gibbs sampling algorithms; for MP-RM we use reversible-jump MCMC [35]; for BSP-RM we use particle MCMC; and for MTA-RM we implement the iterative conditional modes algorithm used in [13].
Data Sets: Five social network data sets are used: Digg, Flickr [38], Gplus [23], Facebook and Twitter [20].
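To make the RBP-RM generative process of Section 5 concrete, the following minimal sketch simulates a synthetic relational matrix. It is a simplification, not the full model: the number of boxes K is fixed and the box corners and side lengths are drawn uniformly, standing in for the actual RBP prior over (K_τ, {s_k^(1:D), l_k^(1:D)}); the function name sample_rbp_rm is our own.

```python
import numpy as np

def sample_rbp_rm(N, K, seed=0):
    """Sketch of the RBP relational-model generative process on [0, 1]^2.

    Simplifying assumption: K is fixed and box corners/lengths are uniform,
    as a stand-in for the RBP prior over the partition.
    """
    rng = np.random.default_rng(seed)
    # Boxes: lower corners s_k and side lengths l_k, kept inside [0, 1]^2.
    s = rng.uniform(size=(K, 2))
    l = rng.uniform(size=(K, 2)) * (1.0 - s)
    # (2) Box rates omega_k ~ Exp(1).
    omega = rng.exponential(1.0, size=K)
    # (3) Uniform row/column coordinates xi_i, eta_j.
    xi, eta = rng.uniform(size=N), rng.uniform(size=N)
    # (4) R_ij ~ Bernoulli(sigma(sum_k omega_k * 1[(xi_i, eta_j) in box_k])).
    row_in = (xi[:, None] >= s[:, 0]) & (xi[:, None] <= s[:, 0] + l[:, 0])   # N x K
    col_in = (eta[:, None] >= s[:, 1]) & (eta[:, None] <= s[:, 1] + l[:, 1])  # N x K
    intensity = np.einsum('ik,jk,k->ij',
                          row_in.astype(float), col_in.astype(float), omega)  # N x N
    p = 1.0 / (1.0 + np.exp(-intensity))  # logistic link sigma(x)
    R = (rng.uniform(size=(N, N)) < p).astype(int)
    return R, p

R, p = sample_rbp_rm(N=50, K=5)
```

Since the rates ω_k are nonnegative, the aggregated intensity is ≥ 0 and σ maps it into [0.5, 1); entries outside every box have link probability exactly 0.5 under this sketch.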
We extract a subset of nodes (the top 1000 active nodes based on their interactions with others) from each data set for constructing the relational data matrix.

Figure 4: Visualisation of the RBP relational model partition structure for five relational data sets: (left to right) Digg, Flickr, Gplus, Facebook and Twitter.

Experimental Setting: The hyper-parameters for each method are set as follows. In IRM, we let the concentration parameter α be sampled from a gamma Γ(1, 1) prior, and the row and column partitions be sampled from two independent Dirichlet processes; in LFRM, we let α be sampled from a gamma Γ(2, 1) prior. As the budget parameter for MP-RM and BSP-RM is hard to sample [19], we set it to 3, implying that around (3 + 1) × (3 + 1) blocks would be generated. For the parametric model MTA-RM, we simply set the number of tiles to 16. In RBP-RM, we set λ = 0.99 and τ = 3, which leads to an expectation of 12 boxes. The reported performance is averaged over 10 randomly selected hold-out test sets (Train:Test = 9:1).
Results: Table 2 reports the link prediction performance for each method and data set. We see that the RBP-RM achieves competitive performance against the other methods; even on the data sets where it does not obtain the best score, its performance is comparable to the best. The overall results validate that the RBP-RM is effective in relational modelling due to its flexible and parsimonious construction, attaching bounding boxes to dense data regions.
Figure 4 visually illustrates the RBP-RM partition patterns for each data set. As is apparent, the bounding-based RBP-RM method indeed describes dense regions of relational data matrices with relatively few bounding boxes (i.e. parameters). An interesting observation from this partition format is that the overlapping bounding boxes are very useful for describing inter-community interactions (e.g.
overlapping bounding boxes in Digg, Flickr, and Gplus) and community-in-community interactions (e.g. the upper-right corner in Flickr and Gplus). Thus, in addition to competitive and parsimonious performance, the RBP-RM also produces intuitively interpretable and meaningful partitions (Figure 4).

6 Conclusion

We have introduced a novel and parsimonious stochastic partition process, the Rectangular Bounding Process (RBP). Instead of the typical cutting-based strategy of existing partition models, we adopt a bounding-based strategy that attaches rectangular bounding boxes to dense data regions in a multi-dimensional space, thereby avoiding unnecessary dissections in sparse data regions. The RBP was demonstrated in two applications: regression trees and relational modelling. The experimental results on these two applications validate the merit of the RBP, that is, competitive performance against existing state-of-the-art methods, while using fewer bounding boxes (i.e. fewer parameters).

Acknowledgements

Xuhui Fan and Scott A. Sisson are supported by the Australian Research Council through the Australian Centre of Excellence in Mathematical and Statistical Frontiers (ACEMS, CE140100049), and Scott A. Sisson through the Discovery Project Scheme (DP160102544). Bin Li is supported by the Fudan University Startup Research Grant and Shanghai Municipal Science & Technology Commission (16JC1420401).

References

[1] Ryan P. Adams, Iain Murray, and David J.C. MacKay. Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In ICML, pages 9–16, 2009.

[2] Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. Mixed membership stochastic blockmodels. In NIPS, pages 33–40, 2009.

[3] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[4] José Caldas and Samuel Kaski. Bayesian biclustering with the plaid model.
In IEEE Workshop on Machine Learning for Signal Processing (MLSP), pages 291–296. IEEE, 2008.

[5] Hugh A. Chipman, Edward I. George, and Robert E. McCulloch. BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1):266–298, 2010.

[6] Kai-Lai Chung. A Course in Probability Theory. Academic Press, 2001.

[7] Andrea Coraddu, Luca Oneto, Alessandro Ghio, Stefano Savio, Davide Anguita, and Massimo Figari. Machine learning approaches for improving condition-based maintenance of naval propulsion plants. Proceedings of the Institution of Mechanical Engineers, Part M: Journal of Engineering for the Maritime Environment, 230(1):136–153, 2016.

[8] Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository, 2017.

[9] Xuhui Fan, Bin Li, and Scott A. Sisson. The binary space partitioning-tree process. In AISTATS, volume 84, pages 1859–1867. PMLR, 2018.

[10] Xuhui Fan, Bin Li, Yi Wang, Yang Wang, and Fang Chen. The Ostomachion Process. In AAAI, pages 1547–1553, 2016.

[11] Yoav Freund, Robert Schapire, and Naoki Abe. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771–780, 1999.

[12] Pierre Geurts, Damien Ernst, and Louis Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006.

[13] Inmar Givoni, Vincent Cheung, and Brendan Frey. Matrix tile analysis. In UAI, pages 200–207, 2006.

[14] Katsuhiko Ishiguro, Tomoharu Iwata, Naonori Ueda, and Joshua B. Tenenbaum. Dynamic infinite relational model for time-varying relational data analysis. In NIPS, pages 919–927, 2010.

[15] Katsuhiko Ishiguro, Issei Sato, Masahiro Nakano, Akisato Kimura, and Naonori Ueda. Infinite plaid models for infinite bi-clustering. In AAAI, pages 1701–1708, 2016.

[16] Brian Karrer and Mark E.J. Newman. Stochastic blockmodels and community structure in networks.
Physical Review E, 83(1):016107, 2011.

[17] Charles Kemp, Joshua B. Tenenbaum, Thomas L. Griffiths, Takeshi Yamada, and Naonori Ueda. Learning systems of concepts with an infinite relational model. In AAAI, volume 3, pages 381–388, 2006.

[18] Balaji Lakshminarayanan, Daniel M. Roy, and Yee Whye Teh. Mondrian forests: Efficient online random forests. In NIPS, pages 3140–3148, 2014.

[19] Balaji Lakshminarayanan, Daniel M. Roy, and Yee Whye Teh. Mondrian forests for large-scale regression when uncertainty matters. In AISTATS, pages 1478–1487, 2016.

[20] Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. Predicting positive and negative links in online social networks. In WWW, pages 641–650, 2010.

[21] Bin Li, Qiang Yang, and Xiangyang Xue. Transfer learning for collaborative filtering via a rating-matrix generative model. In ICML, pages 617–624, 2009.

[22] Antonio R. Linero. Bayesian regression trees for high-dimensional prediction and variable selection. Journal of the American Statistical Association, pages 1–11, 2018.

[23] Julian J. McAuley and Jure Leskovec. Learning to discover social circles in ego networks. In NIPS, pages 548–556, 2012.

[24] Kurt Miller, Michael I. Jordan, and Thomas L. Griffiths. Nonparametric latent feature models for link prediction. In NIPS, pages 1276–1284, 2009.

[25] Masahiro Nakano, Katsuhiko Ishiguro, Akisato Kimura, Takeshi Yamada, and Naonori Ueda. Rectangular tiling process. In ICML, pages 361–369, 2014.

[26] Krzysztof Nowicki and Tom A.B. Snijders. Estimation and prediction for stochastic block structures. Journal of the American Statistical Association, 96(455):1077–1087, 2001.

[27] Peter Orbanz and Daniel M. Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):437–461, 2015.

[28] F.
Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[29] Matthew Pratola, Hugh Chipman, Edward George, and Robert McCulloch. Heteroscedastic BART using multiplicative regression trees. arXiv preprint arXiv:1709.07542, 2017.

[30] Matthew T. Pratola, Hugh A. Chipman, James R. Gattiker, David M. Higdon, Robert McCulloch, and William N. Rust. Parallel Bayesian additive regression trees. Journal of Computational and Graphical Statistics, 23(3):830–852, 2014.

[31] Daniel M. Roy. Computability, Inference and Modeling in Probabilistic Programming. PhD thesis, MIT, 2011.

[32] Daniel M. Roy and Yee Whye Teh. The Mondrian process. In NIPS, pages 1377–1384, 2009.

[33] Mikkel N. Schmidt and Morten Mørup. Nonparametric Bayesian modeling of complex networks: An introduction. IEEE Signal Processing Magazine, 30(3):110–128, 2013.

[34] Pınar Tüfekci. Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods. International Journal of Electrical Power & Energy Systems, 60:126–140, 2014.

[35] Pu Wang, Kathryn B. Laskey, Carlotta Domeniconi, and Michael I. Jordan. Nonparametric Bayesian co-clustering ensembles. In SDM, pages 331–342, 2011.

[36] Yi Wang, Bin Li, Yang Wang, and Fang Chen. Metadata dependent Mondrian processes. In ICML, pages 1339–1347, 2015.

[37] I-Cheng Yeh. Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797–1808, 1998.

[38] Reza Zafarani and Huan Liu.
Social computing data repository at ASU, 2009.