{"title": "Learning Signed Determinantal Point Processes through the Principal Minor Assignment Problem", "book": "Advances in Neural Information Processing Systems", "page_first": 7365, "page_last": 7374, "abstract": "Symmetric determinantal point processes (DPP) are a class of probabilistic models that encode the random selection of items that have a repulsive behavior. They have attracted a lot of attention in machine learning, where returning diverse sets of items is sought for. Sampling and learning these symmetric DPP's is pretty well understood. In this work, we consider a new class of DPP's, which we call signed DPP's, where we break the symmetry and allow attractive behaviors. We set the ground for learning signed DPP's through a method of moments, by solving the so called principal assignment problem for a class of matrices $K$ that satisfy $K_{i,j}=\\pm K_{j,i}$, $i\\neq j$, in polynomial time.", "full_text": "Learning Signed Determinantal Point Processes\nthrough the Principal Minor Assignment Problem\n\nVictor-Emmanuel Brunel\nDepartment of Mathematics\n\nMassachusetts Institute of Technology\n\nCambridge, MA 02139\nvebrunel@mit.edu\n\nAbstract\n\nSymmetric determinantal point processes (DPP) are a class of probabilistic models\nthat encode the random selection of items that have a repulsive behavior. They\nhave attracted a lot of attention in machine learning, where returning diverse sets\nof items is sought for. Sampling and learning these symmetric DPP\u2019s is pretty\nwell understood. In this work, we consider a new class of DPP\u2019s, which we call\nsigned DPP\u2019s, where we break the symmetry and allow attractive behaviors. 
We set the ground for learning signed DPP’s through a method of moments, by solving the so-called principal minor assignment problem for a class of matrices K that satisfy K_{i,j} = ±K_{j,i}, i ≠ j, in polynomial time.\n\n1 Introduction\n\nRandom point processes on finite spaces are probability distributions that model random selections of sets of items from a finite collection. For example, the basket of a random customer in a store is a random subset of items selected from that store. In some contexts, random point processes are encoded as random binary vectors, where the 1 coordinates correspond to the selected items. A very famous subclass of random point processes, much used in statistical mechanics, is called the Ising model, where the log-likelihood function is a quadratic polynomial in the coordinates of the binary vector. More generally, Markov random fields encompass models of random point processes where stochastic dependence between the coordinates of the random vector is encoded in an undirected graph. In recent years, a different family of random point processes has attracted a lot of attention, mainly for its computational tractability: determinantal point processes (DPP’s). DPP’s were first studied and used in statistical mechanics [19]. Then, following the seminal work [15], discrete DPP’s have been used increasingly in various applications such as recommender systems [10, 11], document and timeline summarization [18, 27], image search [15, 1] and segmentation [17], audio signal processing [26], bioinformatics [5] and neuroscience [24].\n\nA DPP on a finite space is a random subset of that space whose inclusion probabilities are determined by the principal minors of a given matrix. More precisely, encode the finite space with labels [N] = {1, 2, . . . , N}, where N is the size of the space. 
A DPP is a random subset Y ⊆ [N] such that P[J ⊆ Y] = det(K_J), for all fixed J ⊆ [N], where K is an N × N matrix with real entries, called the kernel of the DPP, and K_J = (K_{i,j})_{i,j∈J} is the square submatrix of K associated with the set J.\n\nIn the applications cited above, it is assumed that K is a symmetric matrix. In that case, it is shown (e.g., see [16]) that a necessary and sufficient condition for K to be the kernel of a DPP is that all its eigenvalues are between 0 and 1. If, in addition, 1 is not an eigenvalue of K, then the DPP with kernel K is also known as an L-ensemble, where the probability mass function is proportional to the principal minors of the matrix L = K(I − K)^{−1}, where I is the N × N identity matrix. DPP’s with symmetric kernels, which we refer to as symmetric DPP’s, model repulsive interactions: Indeed, they imply a strong negative dependence between items, called negative association [7].\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nRecently, symmetric DPP’s have become popular in recommender systems, e.g., automated systems that seek good recommendations for users on online shopping websites [10]. The main idea is to model a random basket as a DPP and learn the kernel K based on previous observations. Then, for a new customer, predict which items are the most likely to be selected next, given his/her current basket, by maximizing the conditional probability P[J ∪ {i} ⊆ Y | J ⊆ Y] over all items i that are not yet in the current basket J. One very attractive feature of DPP’s is that if the final basket Y of a random user is modeled as a DPP, the latter conditional probability is tractable and can be computed in polynomial time in N. However, if the kernel K is symmetric, this procedure enforces diversity in the baskets that are modeled, because of the negative association property. 
However, in general, not all items should be modeled as repelling each other. For instance, say, on a website that sells household goods, ground coffee and coffee filters should rather be modeled as attracting each other, since a user who buys ground coffee is more likely to also buy coffee filters. In this work, we extend the class of symmetric DPP’s in order to account for possible attractive interactions, by considering nonsymmetric kernels. From the learning perspective, this extended model poses a question: How to estimate the kernel, based on past observations? In the case of symmetric kernels, this problem has been tackled in several works [12, 1, 20, 4, 9, 10, 11, 21, 8, 25]. Here, we assume that K is nonparametric, i.e., it is not parametrized by a low dimensional parameter. As explained in [8] in the symmetric case, the maximum likelihood approach requires solving a highly nonconvex optimization problem, and even though some algorithms have been proposed such as fixed point algorithms [21], Expectation-Maximization [12], MCMC [1], neither computational nor statistical guarantees are given. The method of moments proposed in [25] provides a polynomial time algorithm based on the estimation of a small number of principal minors of K, and finding a symmetric matrix K̂ whose principal minors approximately match the estimated ones. This algorithm is closely related to the principal minor assignment problem. Here, we are interested in learning a nonsymmetric kernel given available estimates of its principal minors; in order to simplify the exposition, we always assume that the available list of principal minors is exact, not approximate.\n\nIn Section 2, we recall the definition of DPP’s, we define a new class of nonsymmetric kernels, that we call signed kernels, and we characterize the set of admissible kernels under lack of symmetry. 
We pose the question of identifiability of the kernel of a signed DPP and show that this question, together with the problem of learning the kernel, is related to the principal minor assignment problem. In Section 3, we propose a solution to the principal minor assignment problem for signed kernels, which yields a polynomial time learning algorithm for the kernel of a signed DPP.\n\n2 Determinantal Point Processes\n\n2.1 Definitions\n\nDefinition 1 (Discrete Determinantal Point Process). A Determinantal Point Process (DPP) on the finite set [N] is a random subset Y ⊆ [N] for which there exists a matrix K ∈ R^{N×N} such that the following holds:\n\nP[J ⊆ Y] = det(K_J), ∀J ⊆ [N], (1)\n\nwhere K_J is the submatrix of K obtained by keeping the columns and rows of K whose indices are in J. The matrix K is called the kernel of the DPP, and we write Y ∼ DPP(K).\n\nIn short, the inclusion probabilities of a DPP are given by the principal minors of some matrix K.\n\nNote that not all matrices K ∈ R^{N×N} give rise to a DPP since, for instance, the numbers det(K_J) from (1) must all lie in [0, 1], and be nonincreasing with the set J. We call a matrix K ∈ R^{N×N} admissible if there exists a DPP with kernel K. As a simple consequence of [16], we have the following proposition where, for all J ⊆ [N], we denote by I_J the diagonal matrix whose j-th diagonal entry is 1 if j ∈ J, 0 otherwise, and by J̄ the complement of J in [N].\n\nProposition 1. A matrix K ∈ R^{N×N} is admissible if and only if (−1)^{|J|} det(K − I_J) ≥ 0, for all J ⊆ [N].\n\nProof. By [16], if Y ∼ DPP(K), then, necessarily, 0 ≤ P[Y = J] = (−1)^{N−|J|} det(K − I_{J̄}) for all J ⊆ [N]. Conversely, assume (−1)^{|J|} det(K − I_J) ≥ 0 for all J ⊆ [N]. Denote by p_J = (−1)^{|J̄|} det(K − I_{J̄}), for all J ⊆ [N]. By a standard computation, Σ_{J⊆[N]} p_J = 1. Hence, one can define a random subset Y ⊆ [N] with P[Y = J] = p_J for all J ⊆ [N]. A simple application of the inclusion-exclusion principle yields that P[J ⊆ Y] = det(K_J) for all J ⊆ [N], hence, Y ∼ DPP(K).\n\nLet K ∈ R^{N×N}. Assume that I − K is invertible and let L = K(I − K)^{−1}. Then, I + L = (I − K)^{−1} is invertible and by [16], det(L_J) / det(I + L) = (−1)^{|J̄|} det(K − I_{J̄}) for all J ⊆ [N]. Hence, K is admissible if and only if L is a P0-matrix, i.e., all its principal minors are nonnegative. If, in addition, K is invertible, then it is admissible if and only if L is a P-matrix, i.e., all its principal minors are positive, if and only if T K + (I − T)(I − K) is invertible for all diagonal matrices T with entries in [0, 1] (see [14, Theorem 3.3]). Hence, it is easy to see that any matrix K of the form D + µA, where D is a diagonal matrix with D_{i,i} ∈ [λ, 1 − λ], i = 1, . . . , N, for some λ ∈ (0, 1/2), A ∈ [−1, 1]^{N×N} and 0 ≤ µ < λ/(N − 1), is admissible.\n\nSymmetric DPP’s. Most commonly, DPP’s are defined with a real symmetric kernel K. In that case, it is well known ([16]) that admissibility is equivalent to lying in the intersection S of two copies of the cone of positive semidefinite matrices: K ⪰ 0 and I − K ⪰ 0. Such processes possess a very strong property of negative dependence: negative association. A simple observation is that if Y ∼ DPP(K) for some symmetric K ∈ S, then cov(1_{i∈Y}, 1_{j∈Y}) = −K_{i,j}^2 ≤ 0, for all i, j ∈ [N], i ≠ j. Moreover, if J, J′ are two disjoint subsets of [N], then cov(1_{J⊆Y}, 1_{J′⊆Y}) = det(K_{J∪J′}) − det(K_J) det(K_{J′}) ≤ 0. Negative association is the property that, more generally, cov(f(Y ∩ J), g(Y ∩ J′)) ≤ 0 for all disjoint subsets J, J′ ⊆ [N] and for all nondecreasing functions f, g : P([N]) → R (i.e., f(J_1) ≤ f(J_2), ∀J_1 ⊆ J_2 ⊆ [N]), where P([N]) is the power set of [N]. We refer to [6] for more details on negative association. For their computational appeal, it is very tempting to apply DPP’s in order to model interactions, e.g., as an alternative to Ising models. However, the negative association property of DPP’s with symmetric kernels is unreasonably restrictive in several contexts, for it forces repulsive interactions between items. Next, we extend the class of DPP’s with symmetric kernels in a simple way that also allows for attractive interactions.\n\nSigned DPP’s. We introduce the class T of signed kernels, i.e., matrices K ∈ R^{N×N} such that for all i, j ∈ [N] with i ≠ j, K_{j,i} = ±K_{i,j}, i.e., K_{j,i} = ε_{i,j} K_{i,j} for some ε_{i,j} ∈ {−1, 1}. We call a signed DPP any DPP with kernel K ∈ T. As a case of particular interest, one can also consider signed block DPP’s, with kernels K ∈ T, where there is a partition of [N] into pairwise disjoint, nonempty groups such that K_{j,i} = −K_{i,j} if i and j are in the same group (hence, i and j attract each other), K_{j,i} = K_{i,j} if i and j are in different groups (hence, i and j repel each other).\n\n2.2 Learning DPP’s\n\nThe main purpose of this work is to understand how to learn the kernel of a nonsymmetric DPP, given i.i.d. copies of that DPP. Namely, if Y_1, . . . , Y_n are i.i.d. with distribution DPP(K) for some unknown K ∈ T, how to estimate K from the observation of Y_1, . . . , Y_n? First comes the question of identifiability of K: two matrices K, K′ ∈ T can give rise to the same DPP. 
To be more specific, DPP(K) = DPP(K′) if and only if K and K′ have the same list of principal minors. Hence, the kernel of a DPP is not necessarily unique. It is actually easy to see that it is unique if and only if it is diagonal. A first natural question that arises in learning the kernel of a DPP is the following:\n\n“What is the collection of all matrices K ∈ T that produce a given DPP?”\n\nGiven that the kernel of Y_1 is not uniquely defined, the goal is no longer to estimate K exactly, but one possible kernel that would give rise to the same DPP as K. The route that we follow is similar to that followed by [25], which is based on a method of moments. However, lack of symmetry of K requires significantly different ideas. The idea is based on the fact that only few principal minors of K are necessary in order to completely recover K up to identifiability. Moreover, each principal minor Δ_J := det(K_J) can be estimated from the samples by Δ̂_J = n^{−1} Σ_{i=1}^{n} 1_{J⊆Y_i}. Since this last step is straightforward, we only focus on the problem of complete recovery of K, up to identifiability, given a list of few of its principal minors. In other words, we will ask the following question:\n\n“Given an available list of prescribed principal minors, how to recover a matrix K ∈ T whose principal minors are given by that list, using as few queries from that list as possible?”\n\nThis question, together with the one we asked for identifiability, is known as the principal minor assignment problem, which we state precisely in the next section.\n\n2.3 The principal minor assignment problem\n\nThe principal minor assignment problem (PMA) is a well known problem in linear algebra that consists of finding a matrix with a prescribed list of principal minors [23]. Let ℋ ⊆ C^{N×N} be a collection of matrices. Typically, ℋ is the set of Hermitian matrices, or real symmetric matrices or, in this work, ℋ = T. Given a list (a_J)_{J⊆[N], J≠∅} of 2^N − 1 complex numbers, (PMA) asks the following two questions:\n\n(PMA1) Find a matrix K ∈ ℋ such that det(K_J) = a_J, ∀J ⊆ [N], J ≠ ∅.\n\n(PMA2) Describe the set of all solutions of (PMA1).\n\nA third question, which we do not address here, is to decide whether (PMA1) has a solution. It is known that this would require the a_J’s to satisfy polynomial equations [22]. Here, we assume that a solution exists, i.e., the list (a_J)_{J⊆[N], J≠∅} is a valid list of prescribed principal minors, and we aim to answer (PMA1) efficiently, i.e., output a solution in polynomial time in the size N of the problem, and to answer (PMA2) at a purely theoretical level. In the framework of DPP’s, (PMA1) is related to the problem of estimating K by a method of moments and (PMA2) concerns the identifiability of K.\n\n3 Solving the principal minor assignment problem for nonsymmetric DPP’s\n\n3.1 Preliminaries: PMA for symmetric matrices\n\nHere, we briefly describe the PMA problem for symmetric matrices, i.e., ℋ = S, the set of real symmetric N × N matrices. This will give some intuition for the next section.\n\nFact 1. The principal minors of order one and two of a symmetric matrix completely determine its diagonal entries and the magnitudes of its off diagonal entries.\n\nThe adjacency graph G_K = ([N], E_K) of a matrix K ∈ S is the undirected graph on N vertices, where, for all i, j ∈ [N], {i, j} ∈ E_K ⟺ K_{i,j} ≠ 0. As a consequence of Fact 1, we have:\n\nFact 2. The adjacency graph of any symmetric solution of (PMA1) can be learned by querying the principal minors of order one and two. 
Moreover, any two symmetric solutions of (PMA1) have the same adjacency graph.\n\nThen, the signs of the off diagonal entries of a symmetric solution of (PMA1) should be determined using queries of higher order principal minors, and the idea is based on the next fact. For a matrix K ∈ S and a cycle C in G_K, denote by π_K(C) the product of entries of K along the cycle C, i.e., π_K(C) = Π_{{i,j}∈C: i<j} K_{i,j}.\n\nFact 3. For all matrices K ∈ S and all J ⊆ [N], det(K_J) only depends on the diagonal entries of K_J, the magnitude of its off diagonal entries and the π_K(C), for all cycles C in the subgraph of G_K where all vertices j ∉ J have been deleted.\n\nFact 3 is a simple consequence of the fundamental formula:\n\ndet(K_J) = Σ_{σ∈S_J} (−1)^σ Π_{j∈J} K_{j,σ(j)}, (2)\n\nwhere S_J is the group of permutations of J. Moreover, every permutation σ ∈ S_J can be decomposed as a product of cyclic permutations. Finally, every undirected graph has a cycle basis made of induced cycles, i.e., there is a small family B of induced cycles such that every cycle (seen as a collection of edges) in the graph can be decomposed as the symmetric difference of cycles that belong to B. Then, it is easy to see that for all cycles C in the graph G_K, π_K(C) can be written as the product of some π_K(C̃), for some cycles C̃ ∈ B and of some K_{i,j}^2’s, i ≠ j. Moreover, for all induced cycles C in G_K, π_K(C) can be determined from det(K_J), where J is the set of vertices of C. Since, by Fact 2, G_K can be learned, what remains is to find a cycle basis of G_K, made of induced cycles only, which can be performed in polynomial time (see [13, 2]) and, for each cycle C in that basis, query the corresponding principal minor of K in order to learn π_K(C). Finally, in order to determine the signs of the off diagonal entries of K, find a sign assignment that matches with the signs of the π_K(C), for C in the aforementioned basis. Finding such a sign assignment consists of solving a linear system in GF2 (see Section 1 in the Supplementary Material).\n\n3.2 PMA when ℋ = T, general case\n\nWe now turn to the case ℋ = T. First, as in the symmetric case, the diagonal entries of any matrix K ∈ T are given by its principal minors of order 1. Now, let i < j and consider the principal minor of K corresponding to J = {i, j}:\n\ndet(K_{{i,j}}) = K_{i,i} K_{j,j} − ε_{i,j} K_{i,j}^2.\n\nHence, |K_{i,j}| and ε_{i,j} can be learned from the principal minors of K corresponding to the sets {i}, {j} and {i, j}.\n\nNote that if K ∈ T, one can still define its adjacency graph G_K as in the symmetric case, since K_{i,j} ≠ 0 ⟺ K_{j,i} ≠ 0, for all i ≠ j. Recall that we identify a cycle of a graph with its edge set. For all K ∈ T and for all cycles C in G_K, let ε_K(C) = Π_{{i,j}∈C: i<j} ε_{i,j} be the product of the ε_{i,j}’s along the edges of C, where ε_{i,j} ∈ {−1, 1} is such that K_{i,j} = ε_{i,j} K_{j,i}. Note that the condition “i < j” in the definition of ε_K(C) is only to ensure no repetition in the product. Now, unlike in the symmetric case, we need to be more careful when defining π_K(C), for a cycle C of G_K, since the direction in which C is traveled matters.\n\nDefinition 2. A signed graph is an undirected graph ([N], E) where each edge is assigned a sign −1 or +1.\n\nIn the sequel, we make the adjacency graph G_K of any matrix K ∈ T signed by assigning ε_{i,j} to each edge {i, j} of the graph. As we noticed above, the signed adjacency graph of K can be learned from its principal minors of orders one and two. Unlike in the symmetric case, induced cycles might be of no help to determine the signs of the off diagonal entries of K.\n\nDefinition 3. Let G be an undirected graph and C a cycle of G. A traveling of C is an oriented cycle of G whose vertex set coincides with that of C. The set of travelings of C is denoted by T(C).\n\nFor instance, an induced cycle has exactly two travelings, corresponding to the two possible orientations of C. In Figure 1, the cycle C = 1 ↔ 2 ↔ 3 ↔ 4 ↔ 1 has six travelings: C⃗_1 = 1 → 2 → 3 → 4 → 1, C⃗_2 = 1 → 4 → 3 → 2 → 1, C⃗_3 = 1 → 2 → 4 → 3 → 1, C⃗_4 = 1 → 3 → 4 → 2 → 1, C⃗_5 = 1 → 4 → 2 → 3 → 1 and C⃗_6 = 1 → 3 → 2 → 4 → 1. Formally, while we identify a cycle with its edge set (e.g., C = {{1, 2}, {2, 3}, {3, 4}, {1, 4}}), we identify its travelings with sets of ordered pairs corresponding to their oriented edges (e.g., C⃗_1 = {(1, 2), (2, 3), (3, 4), (4, 1)}). Also, for simplicity, we always denote oriented cycles using an arrow (e.g., C⃗ as opposed to C, which would stand for an unoriented cycle).\n\nFigure 1: A signed graph\n\nDefinition 4. Let K ∈ T and C be a cycle in G_K. We denote by π_K(C) = Σ_{C⃗∈T(C)} Π_{(i,j)∈C⃗} K_{i,j}.\n\nFor example, if the graph in Figure 1 is the adjacency graph of some K ∈ T and C is the cycle C = 1 ↔ 2 ↔ 3 ↔ 4 ↔ 1, then,\n\nπ_K(C) = K_{1,2}K_{2,3}K_{3,4}K_{4,1} + K_{1,4}K_{4,3}K_{3,2}K_{2,1} + K_{1,2}K_{2,4}K_{4,3}K_{3,1} + K_{1,3}K_{3,4}K_{4,2}K_{2,1} + K_{1,4}K_{4,2}K_{2,3}K_{3,1} + K_{1,3}K_{3,2}K_{2,4}K_{4,1}\n= (1 + ε_K(C⃗_1)) K_{1,2}K_{2,3}K_{3,4}K_{4,1} + (1 + ε_K(C⃗_3)) K_{1,2}K_{2,4}K_{4,3}K_{3,1} + (1 + ε_K(C⃗_5)) K_{1,4}K_{4,2}K_{2,3}K_{3,1}\n= 2K_{1,3}K_{3,2}K_{2,4}K_{4,1},\n\nwhere the oriented cycles C⃗_1, C⃗_3 and C⃗_5 are given above, and where we use the shortcut ε_K(C⃗_j) (j = 1, 3, 5) to denote ε_K(C_j), where C_j is the unoriented version of C⃗_j.\n\nIn the same example, there are only two triangles T (i.e., cycles of size 3) that satisfy π_K(T) ≠ 0: 1 ↔ 3 ↔ 4 ↔ 1 and 2 ↔ 3 ↔ 4 ↔ 2.\n\nThe following result, yet a simple consequence of (2), is fundamental.\n\nLemma 1. For all J ⊆ [N], det(K_J) can be written as a function of the K_{i,i}, K_{i,j}^2, ε_{i,j}’s, for i, j ∈ J, i ≠ j and π_K(C)’s, for all cycles C in G_{K_J}, the subgraph of G_K where all vertices j ∉ J are removed.\n\nProof. Write a permutation σ ∈ S_J as a product of cyclic permutations σ = σ_1 ∘ σ_2 ∘ . . . ∘ σ_p. For each j = 1, . . . , p, assume that σ_j corresponds to an oriented cycle C⃗_j of G_K, otherwise the contribution of σ to the sum (2) is zero. Then, the lemma follows by grouping all permutations in the sum (2) that can be decomposed as a product of p cyclic permutations σ′_1, . . . , σ′_p where, for all j = 1, . . . , p, σ′_j has the same support as σ_j.\n\nAs a consequence, we note that unlike in the symmetric case, the signs of the off diagonal entries can no longer be determined using a cycle basis of induced cycles, since such a basis may contain only cycles which have no contribution to the principal minors of K. In the same example as above, the only induced cycles of G_K are triangles, and any cycle basis should contain at least three cycles. However, there are only four triangles in that graph and two of them have a zero contribution to the principal minors of K. Hence, in that case, it is necessary to query principal minors that do not correspond to induced cycles in order to find a solution to (PMA1).\n\nIn order to summarize, we state the following theorem.\n\nTheorem 1. Let H, K ∈ T. The following statements are equivalent.\n\n• H and K have the same list of principal minors.\n\n• H_{i,i} = K_{i,i} and |H_{i,j}| = |K_{i,j}|, for all i, j ∈ [N] with i ≠ j, H and K have the same signed adjacency graph and, for all cycles C in that graph, π_K(C) = π_H(C).\n\nTheorem 1 does not provide any insight on how to solve (PMA2) efficiently, since the number of cycles in a graph can be exponentially large in the size of the graph. A refinement of this theorem, where we would characterize a minimal set of cycles, that could be found efficiently and that would characterize the principal minors of K ∈ T (such as a basis of induced cycles, in the symmetric case), is an open problem. However, in the next section, we refine this result for a smaller class of nonsymmetric kernels.\n\n3.3 PMA when ℋ = T, dense case\n\nIn this section, we only consider matrices K ∈ T such that for all i, j ∈ [N] with i ≠ j, K_{i,j} ≠ 0. The adjacency graph of such a matrix is a signed version of the complete graph, which we denote by G_N. We also assume that for all pairwise distinct i, j, k, l ∈ [N] and all η_1, η_2, η_3 ∈ {−1, 0, 1},\n\nη_1 K_{i,j}K_{j,k}K_{k,l}K_{l,i} + η_2 K_{i,j}K_{j,l}K_{l,k}K_{k,i} + η_3 K_{i,k}K_{k,j}K_{j,l}K_{l,i} = 0 ⇒ η_1 = η_2 = η_3 = 0. (3)\n\nNote that Condition (3) only depends on the magnitudes of the entries of K. Hence, if one solution of (PMA1) satisfies (3), then all the solutions must satisfy it too. Condition (3) is not a strong condition: Indeed, any generic matrix with rank at least 4 is very likely to satisfy it.\n\nFor the sake of simplicity, we restate (PMA1) and (PMA2) in the following way. 
Let K ∈ T be a ground kernel satisfying the two conditions above (i.e., K is dense and satisfies Condition (3)), and assume that K is unknown, but its principal minors are available.\n\n(PMA’1) Find a matrix H ∈ T such that det(H_J) = det(K_J), ∀J ⊆ [N], J ≠ ∅.\n\n(PMA’2) Describe the set of all solutions of (PMA’1).\n\nMoreover, recall that we would like to find a solution to (PMA’1) that uses few queries from the available list of principal minors of K, in order to design an algorithm that is not too costly computationally.\n\nSince K is assumed to be dense, every subset J ⊆ [N] of size at least 3 is the vertex set of a cycle. Moreover, for all cycles C of G_N, π_K(C) only depends on the vertex set of C, not its edge set. Therefore, in the sequel, for the ease of notation, we denote by π_K(J) = π_K(C) for any cycle C with vertex set J.\n\nThe main result of this section is stated in the following theorem.\n\nTheorem 2. A matrix H ∈ T is a solution of (PMA’1) if and only if it satisfies the following requirements:\n\n• H_{i,i} = K_{i,i} and |H_{i,j}| = |K_{i,j}|, for all i, j ∈ [N] with i ≠ j;\n\n• H has the same signed adjacency graph as K, i.e., G_H = G_K = G_N and H_{i,j}/H_{j,i} = K_{i,j}/K_{j,i}, for all i ≠ j;\n\n• π_H(J) = π_K(J), for all J ⊆ [N] of size 3 or 4.\n\nProof sketch. Here, we only give a sketch of the proof of Theorem 2. All the details of the proof can be found in the Supplementary Material.\n\nThe left to right implication follows directly from Theorem 1, which was a consequence of the whole discussion in Section 3.2. Now, let H satisfy the four requirements; we want to prove that\n\ndet(H_J) = det(K_J), (4)\n\nfor all J ⊆ [N]. If J has size 1 or 2, (4) is straightforward, by the first three requirements. If J has size 3 or 4, it is easy to see that det(H_J) only depends on H_{i,i}, H_{i,j}^2, i, j ∈ J and π_H(S), S ⊆ J, hence, (4) is also granted. Now, let J ⊆ [N] have size at least 5. By Lemma 1, it is enough to check\n\nπ_H(S) = π_K(S), (5)\n\nfor all S ⊆ J of size at least 3.\n\nLet us introduce some new notation for the rest of this proof sketch. For all oriented cycles C⃗ in G_N, we denote by π⃗_K(C⃗) = Π_{(i,j)∈C⃗} K_{i,j} and π⃗_H(C⃗) = Π_{(i,j)∈C⃗} H_{i,j}. Let J ⊆ [N] be of size at least 3. In the sequel, for each unoriented cycle C with vertex set J, let C⃗ be any of the two possible orientations of C, chosen arbitrarily. Denote by T^+(J) the set of unoriented cycles C with vertex set J, such that ε_K(C) = +1. It is clear that\n\nπ_H(J) = 2 Σ_{C∈T^+(J)} π⃗_H(C⃗), (6)\n\nand the same holds for K. Now, let J^+ = {(i, j, k) ⊆ [N] : i ≠ j, i ≠ k, j ≠ k, ε_{i,j} ε_{j,k} ε_{i,k} = +1} be the set of positive triangles, i.e., the set of triples that define triangles in G_N that do contribute to the principal minors of K. The requirements on H ensure that H_{i,j}H_{j,k}H_{i,k} = K_{i,j}K_{j,k}K_{i,k} for all (i, j, k) ∈ J^+ and, by Condition (3), using (6), that π⃗_H(C⃗) = π⃗_K(C⃗), for all cycles C of length 4 with ε_K(C) = 1 (where, we recall that C is the unoriented version of the oriented cycle C⃗).\n\nLet p be the size of S. By (6), it is enough to check that π⃗_H(C⃗) = π⃗_K(C⃗) for all positive oriented cycles C⃗ of length p, i.e., for all oriented cycles C⃗ of length p with ε_K(C) = +1. Let us prove this statement by induction on p. If p = 3 or 4, (5) is granted by the requirement imposed on H. Let p = 5. Let C⃗ be a positive oriented cycle of length 5. Without loss of generality, let us assume that C⃗ = 1 → 2 → 3 → 4 → 5 → 1. Since it is positive, it can have either 0, 2 or 4 negative edges. Suppose it has 0 negative edges, i.e., all its edges are positive (i.e., satisfy ε_{i,j} = +1). We call a chord of the cycle C any edge between two vertices of C, that is not an edge of C. Recall that since G_H = G_K is the complete graph, all cycles have chords. If C has a positive chord, i.e., if there are two vertices i ≠ j with j ≠ i ± 1 (mod 5) and ε_{i,j} = +1, then C can be decomposed as the symmetric difference of two positive cycles C_1 and C_2, one of length 3, one of length 4, with\n\nπ⃗_H(C⃗) = π⃗_H(C⃗_1) π⃗_H(C⃗_2) / H_{i,j}^2 = π⃗_K(C⃗_1) π⃗_K(C⃗_2) / K_{i,j}^2 = π⃗_K(C⃗).\n\nIf C has no positive chord, then we show that it can be decomposed as the symmetric difference of three positive cycles, also yielding that π⃗_H(C⃗) = π⃗_K(C⃗). A similar argument is used when C⃗ has 2 or 4 negative edges.\n\nIf p ≥ 6, a similar argument is employed: By distinguishing several cases, one can show that C can always be decomposed as the symmetric difference of smaller positive cycles and use induction. □\n\nFinally, we provide an algorithm that finds a solution to (PMA’1) in polynomial time.\n\nAlgorithm 1 Find a solution H to (PMA’1)\nInput: List {a_J : J ⊆ [N]}.\nSet H_{i,i} = a_{{i}} for all i = 1, . . . , N.\nSet |H_{i,j}| = √|a_{{i}} a_{{j}} − a_{{i,j}}| for all i ≠ j.\nSet ε_{i,j} = sign(a_{{i}} a_{{j}} − a_{{i,j}}) for all i ≠ j.\nFind the set J^+ of all triples (i, j, k) of pairwise distinct indices such that ε_{i,j} ε_{i,k} ε_{j,k} = 1 and find the sign of H_{i,j}H_{j,k}H_{i,k} for all (i, j, k) ∈ J^+, using a_J, J ⊆ {i, j, k}.\nFor all S ⊆ [N] of size 4, find π_H(S) and deduce the sign of π⃗_K(C⃗), for all C⃗ ∈ T^+(S).\nFind a sign assignment for the off diagonal entries of H that matches all the signs found in the previous step, by Gaussian elimination in GF2.\n\nTheorem 3. Algorithm 1 finds a solution of (PMA’1) in polynomial time in N.\n\nProof. The fact that Algorithm 1 finds a solution of (PMA’1) is a straightforward consequence of Theorem 2. Its complexity is of the order of that of Gaussian elimination for a linear system of at most O(N^4) equations, corresponding to cycles of size at most 4 and with O(N^2) variables, corresponding to the entries of H.\n\n4 Conclusions\n\nWe have introduced signed DPP’s, which allow for both repulsive and attractive interactions. By solving the PMA problem, we have characterized identification of the kernel in the dense case (Theorem 2) and we have given an algorithm that finds a dense matrix H ∈ T with prescribed principal minors, in polynomial time in the size N of the unknown matrix. 
In practice, these principal minors are unknown, but they can be estimated from observed samples from a DPP. As long as the adjacency graph can be recovered exactly from the samples, which would be granted with high probability for a large number of observations, and if all entries of H are bounded away from zero by some known constant (that depends on N), solving the PMA problem amounts to finding the signs of the entries of H, up to identifiability, which can also be done exactly with high probability if the number of observed samples is large (see, e.g., [25]). However, extending classical symmetric DPP\u2019s to nonsymmetric kernels poses some questions: we do not know how to sample a signed DPP efficiently, since the strongly Rayleigh property is no longer valid (see [3]), and the role of the eigenvalues of the kernel is not clear (in the symmetric case, a spectral decomposition of the kernel can be used for sampling, see [16]), even though Lemma 1 in the Supplementary Material, for instance, shows that they still determine the distribution of the size of the DPP.\n\nReferences\n[1] Raja Hafiz Affandi, Emily B. Fox, Ryan P. Adams, and Benjamin Taskar. Learning the parameters of determinantal point process kernels. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 1224\u20131232, 2014.\n\n[2] Edoardo Amaldi, Claudio Iuliano, and Romeo Rizzi. Efficient deterministic algorithms for finding a minimum cycle basis in undirected graphs. In International Conference on Integer Programming and Combinatorial Optimization, pages 397\u2013410. Springer, 2010.\n\n[3] Nima Anari, Shayan Oveis Gharan, and Alireza Rezaei. Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes. 
In Conference on Learning Theory, pages 103\u2013115, 2016.\n\n[4] R\u00e9mi Bardenet and Michalis Titsias. Inference for determinantal point processes without spectral knowledge. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3393\u20133401. Curran Associates, Inc., 2015.\n\n[5] Nematollah Kayhan Batmanghelich, Gerald Quon, Alex Kulesza, Manolis Kellis, Polina Golland, and Luke Bornn. Diversifying sparsity using variational determinantal point processes. CoRR, abs/1411.6307, 2014.\n\n[6] Julius Borcea, Petter Br\u00e4nd\u00e9n, and Thomas Liggett. Negative dependence and the geometry of polynomials. Journal of the American Mathematical Society, 22(2):521\u2013567, 2009.\n\n[7] Julius Borcea, Petter Br\u00e4nd\u00e9n, and Thomas M. Liggett. Negative dependence and the geometry of polynomials. J. Amer. Math. Soc., 22(2):521\u2013567, 2009.\n\n[8] Victor-Emmanuel Brunel, Ankur Moitra, Philippe Rigollet, and John Urschel. Rates of estimation for determinantal point processes. In Conference On Learning Theory, 2017.\n\n[9] Christophe Dupuy and Francis Bach. Learning determinantal point processes in sublinear time. arXiv:1610.05925, 2016.\n\n[10] Mike Gartrell, Ulrich Paquet, and Noam Koenigstein. Bayesian low-rank determinantal point processes. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys \u201916, pages 349\u2013356, New York, NY, USA, 2016. ACM.\n\n[11] Mike Gartrell, Ulrich Paquet, and Noam Koenigstein. Low-rank factorization of determinantal point processes for recommendation. arXiv:1602.05436, 2016.\n\n[12] Jennifer Gillenwater, Alex Kulesza, Emily Fox, and Ben Taskar. Expectation-maximization for learning determinantal point processes. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS\u201914, pages 3149\u20133157, Cambridge, MA, USA, 2014. 
MIT Press.\n\n[13] Joseph Douglas Horton. A polynomial-time algorithm to find the shortest cycle basis of a graph. SIAM Journal on Computing, 16(2):358\u2013366, 1987.\n\n[14] Charles R Johnson and Michael J Tsatsomeros. Convex sets of nonsingular and p\u2013matrices. Linear and Multilinear Algebra, 38(3):233\u2013239, 1995.\n\n[15] Alex Kulesza and Ben Taskar. k-DPPs: Fixed-size determinantal point processes. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pages 1193\u20131200, 2011.\n\n[16] Alex Kulesza and Ben Taskar. Determinantal Point Processes for Machine Learning. Now Publishers Inc., Hanover, MA, USA, 2012.\n\n[17] Donghoon Lee, Geonho Cha, Ming-Hsuan Yang, and Songhwai Oh. Individualness and determinantal point processes for pedestrian detection. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI, pages 330\u2013346, 2016.\n\n[18] Hui Lin and Jeff A. Bilmes. Learning mixtures of submodular shells with application to document summarization. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, August 14-18, 2012, pages 479\u2013490, 2012.\n\n[19] Odile Macchi. The coincidence approach to stochastic point processes. Advances in Appl. Probability, 7:83\u2013122, 1975.\n\n[20] Zelda Mariet and Suvrit Sra. Fixed-point algorithms for learning determinantal point processes. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 2389\u20132397, 2015.\n\n[21] Zelda E. Mariet and Suvrit Sra. Kronecker determinantal point processes. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 2694\u20132702. Curran Associates, Inc., 2016.\n\n[22] Luke Oeding. 
Set-theoretic defining equations of the variety of principal minors of symmetric matrices. Algebra Number Theory, 5(1):75\u2013109, 2011.\n\n[23] Justin Rising, Alex Kulesza, and Ben Taskar. An efficient algorithm for the symmetric principal minor assignment problem. Linear Algebra and its Applications, 473:126\u2013144, 2015.\n\n[24] Jasper Snoek, Richard S. Zemel, and Ryan Prescott Adams. A determinantal point process latent variable model for inhibition in neural spiking data. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 1932\u20131940, 2013.\n\n[25] John Urschel, Victor-Emmanuel Brunel, Ankur Moitra, and Philippe Rigollet. Learning determinantal point processes with moments and cycles. In ICML, 2017.\n\n[26] Haotian Xu and Haotian Ou. Scalable discovery of audio fingerprint motifs in broadcast streams with determinantal point process based motif clustering. IEEE/ACM Trans. Audio, Speech & Language Processing, 24(5):978\u2013989, 2016.\n\n[27] Jin-ge Yao, Feifan Fan, Wayne Xin Zhao, Xiaojun Wan, Edward Y. Chang, and Jianguo Xiao. Tweet timeline generation with determinantal point processes. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, pages 3080\u20133086, 2016.\n", "award": [], "sourceid": 3669, "authors": [{"given_name": "Victor-Emmanuel", "family_name": "Brunel", "institution": "ENSAE"}]}