{"title": "Online Reciprocal Recommendation with Theoretical Performance Guarantees", "book": "Advances in Neural Information Processing Systems", "page_first": 8257, "page_last": 8267, "abstract": "A reciprocal recommendation problem is one where the goal of learning is not just to predict a user's preference towards a passive item (e.g., a book), but to recommend the targeted user on one side another user from the other side such that a mutual interest between the two exists. The problem thus is sharply different from the more traditional items-to-users recommendation, since a good match requires meeting the preferences of both users. We initiate a rigorous theoretical investigation of the reciprocal recommendation task in a specific framework of sequential learning. We point out general limitations, formulate reasonable assumptions enabling effective learning and, under these assumptions, we design and analyze a computationally efficient algorithm that uncovers mutual likes at a pace comparable to those achieved by a clairvoyant algorithm knowing all user preferences in advance. 
Finally, we validate our algorithm against synthetic and real-world datasets, showing improved empirical performance over simple baselines.", "full_text": "Online Reciprocal Recommendation with Theoretical Performance Guarantees

Fabio Vitale
Department of Computer Science
Sapienza University of Rome (Italy) & University of Lille (France) & INRIA Lille Nord Europe
Rome, Italy & Lille, France
fabio.vitale@inria.fr

Nikos Parotsidis
University of Rome Tor Vergata, Rome, Italy
nikos.parotsidis@uniroma2.it

Claudio Gentile
INRIA Lille & Google New York
Lille, France & New York, USA
cla.gentile@gmail.com

Abstract

A reciprocal recommendation problem is one where the goal of learning is not just to predict a user's preference towards a passive item (e.g., a book), but to recommend to the targeted user on one side another user from the other side such that a mutual interest between the two exists. The problem thus is sharply different from the more traditional items-to-users recommendation, since a good match requires meeting the preferences of both sides. We initiate a rigorous theoretical investigation of the reciprocal recommendation task in a specific framework of sequential learning. We point out general limitations, formulate reasonable assumptions enabling effective learning and, under these assumptions, we design and analyze a computationally efficient algorithm that uncovers mutual likes at a pace comparable to that achieved by a clairvoyant algorithm knowing all user preferences in advance. Finally, we validate our algorithm against synthetic and real-world datasets, showing improved empirical performance over simple baselines.

1 Introduction

Recommendation Systems are at the core of many successful online businesses, from e-commerce, to online streaming, to computational advertising, and beyond. 
These systems have been extensively investigated by both academic and industrial researchers, following the standard paradigm of items-to-users preference prediction/recommendation. In this standard paradigm, a targeted user is presented with a list of items that s/he may prefer according to a preference profile that the system has learned from both explicit user features (item data, demographic data, explicitly declared preferences, etc.) and past user activity. In more recent years, owing to hugely increasing interest in the online dating and job recommendation domains, a special kind of recommendation system called a Reciprocal Recommendation System (RRS) has gained big momentum. The reciprocal recommendation problem is sharply different from the more traditional items-to-users recommendation, since recommendations must satisfy both parties, i.e., both parties can express their likes and dislikes and a good match requires meeting the preferences of both. Examples of RRS include, for instance: online recruitment systems (e.g., LinkedIn),1 where a job seeker searches for jobs matching his/her preferences, say salary and expectations, and a recruiter seeks suitable candidates to fulfil the job requirements; heterosexual online dating systems (e.g., Tinder),2 where people have the common goal of finding a partner of the opposite gender; roommate matching systems (e.g., Badi),3 used to connect people looking for a room to those looking for a roommate; online mentoring systems; customer-to-customer marketplaces; etc.

1 https://www.linkedin.com/.
2 https://tinder.com.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

From a Machine Learning perspective, the main challenge in a RRS is thus to learn reciprocated preferences, since the goal of the system is not just to predict a user's preference towards a passive item (a book, a movie, etc.), but to recommend 
to the targeted user on one side another user from the other side such that a mutual interest exists. Importantly enough, the interaction the two involved users have with the system is often staged and unsynced. Consider, for instance, a scenario where a user, Geena, is recommended to another user, Bob. The recommendation is successful only if both Geena and Bob mutually agree that the recommendation is good. In the first stage, Bob logs into the system and Geena gets recommended to him; this is like in a standard recommendation system: Bob will give a feedback (say, positive) to the system regarding Geena. Geena may never know that she has been recommended to Bob. In a subsequent stage, some time in the future, also Geena logs in. In an attempt to find a match, the system now recommends Bob to Geena. It is only when Geena also responds positively that the reciprocal recommendation becomes successful.
The problem of reciprocal recommendation has so far been studied mainly in the Data Mining, Recommendation Systems, and Social Network Analysis literature (e.g., [7, 1, 16, 15, 11, 19, 23, 3, 17, 12, 13]), with some interesting adaptations of standard collaborative filtering approaches to user feature similarity, but it has remained largely unexplored from a theoretical standpoint. Although each application domain has its own specificity,4 in this paper we abstract such details away, and focus on the broad problem of building matches between the two parties in the reciprocal recommendation problem based on behavioral information only. In particular, we do not consider explicit user preferences (e.g., those evinced by user profiles), but only the implicit ones, i.e., those derived from past user behavior. 
The explicit-vs-implicit user feature distinction is a standard dichotomy in Recommendation System practice, and it is by now common knowledge that collaborative effects (aka implicit features) carry far more information about actual user preferences than explicit features such as, for instance, demographic metadata [18]. Similar experimental findings are also reported in the context of RRS in the online dating domain [2].
In this paper, we initiate a rigorous theoretical investigation of the reciprocal recommendation problem, and we view it as a sequential learning problem where learning proceeds in a sequence of rounds. At each round, a user from one of the two parties becomes active and, based on past feedback, the learning algorithm (called the matchmaker) is compelled to recommend one user from the other party. The broad goal of the algorithm is to uncover as many mutual interests (called matches) as possible, and to do so as quickly as possible. We formalize our learning model in Section 2. After observing that, in the absence of structural assumptions about matches, learning is virtually precluded (Section 3), we come to consider a reasonable clusterability assumption on the preferences of users on both sides. Under these assumptions, we design and analyze a computationally efficient matchmaking algorithm that leverages the correlation across matches. We show that the number of uncovered matches within T rounds is comparable (up to constant factors) to that achieved by an optimal algorithm that knows beforehand all user preferences, provided T and the total number of matches to be uncovered are not too small (Sections 3 and 4). Finally, in Section 5 we present a suite of initial experiments, where we contrast (a version of) our algorithm with noncluster-based random baselines on both synthetic and publicly available real-world benchmarks in the domain of online dating. 
Our experiments serve the twofold purpose of validating our structural assumptions on user preferences against real data, and showing the improved matchmaking performance of our algorithm, as compared to simple noncluster-based baselines.

2 Preliminaries

We first introduce our basic notation. We have a set of users V partitioned into two parties. Though a number of alternative metaphors could be adopted here, for concreteness, we call the two parties B (for "boys") and G (for "girls"). Throughout this paper, g, g′, and g′′ will be used to denote generic members of G, and b, b′, and b′′ to denote generic members of B. For simplicity, we assume the two parties B and G have the same size n. A hidden ground truth about the mutual preferences of the members of the two parties is encoded by a sign function σ : (B × G) ∪ (G × B) → {−1, +1}.

3 https://badiapp.com/en.
4 For instance, users in an online dating system have relevant visual features, and the system needs specific care in removing popular user bias, i.e., ensuring that popular users are not recommended more often than unpopular ones.

Specifically, for a pairing (b, g) ∈ B × G, the assignment σ(b, g) = +1 means that boy b likes girl g, and σ(b, g) = −1 means that boy b dislikes girl g. Likewise, given a pairing (g, b) ∈ G × B, we have σ(g, b) = +1 when girl g likes boy b, and σ(g, b) = −1 when girl g dislikes boy b. The ground truth σ therefore defines a directed bipartite signed graph collectively denoted as (⟨B, G⟩, E, σ), where E, the set of directed edges in this graph, is simply (B × G) ∪ (G × B), i.e., the set of all 2n² directed edges in this bipartite graph. A "+1" edge will sometimes be called a positive edge, while a "−1" edge will be called a negative edge. 
Any pair of directed edges (g, b) ∈ G × B and (b, g) ∈ B × G involving the same two subjects g and b is called a reciprocal pair of edges. We also say that (g, b) is reciprocal to (b, g), and vice versa. The pairing of signed edges (g, b) and (b, g) is called a match if and only if σ(b, g) = σ(g, b) = +1. The total number of matches will often be denoted by M. See Figure 1 for a pictorial illustration.

Figure 1: (a) The (complete and directed) bipartite graph (⟨B, G⟩, E, σ) with n = |B| = |G| = 4; edges are only sketched. (b) Representation of the σ function through its two pieces σ : B × G → {−1, +1} (B × G matrix on the left), and σ : G × B → {−1, +1} (G × B matrix on the right). For instance, in this graph, Boy 1 likes Girl 1 and Girl 3, and dislikes Girl 2 and Girl 4, while Girl 3 likes Boy 1, and dislikes Boys 2, 3, and 4. Out of the n² = 16 pairs of reciprocal edges, this graph admits only M = 4 matches, which are denoted by green circles on both matrices. For instance, the pairing of edges (1, 3) and (3, 1) is a match since Boy 1 likes Girl 3 and, at the same time, Girl 3 likes Boy 1. (c) The associated (undirected and bipartite) matching graph M. We have, for instance, deg_M(Girl 1) = 3, and deg_M(Boy 2) = 1.

Coarsely speaking, the goal of a learning algorithm A is to uncover in a sequential fashion as many matches as possible, as quickly as possible. More precisely, we are given a time horizon T ≤ n², e.g., T = n√n, and at each round t = 1, . . . , T:

(1B) A receives the id of a boy b chosen uniformly at random5 from B (b is meant to be the "next boy" that logs into the system);
(2B) A selects a girl g′ ∈ G to recommend to b;
(3B) b provides feedback to the learner, in that the sign σ(b, g′) of the selected boy-to-girl edge is revealed to A.

Within the same round t, the three steps described above are subsequently executed after switching the roles of G and B (and will therefore be called Steps (1G), (2G), and (3G)). Hence, each round t is made up of two halves: the first half, where a boy at random is logged into the system and the learner A is compelled to select a girl, and the second half, where a girl at random is logged in and A has to select a boy. Thus at each round t, A observes the sign of the two directed edges (b, g′) and (g, b′), where b ∈ B and g ∈ G are generated uniformly at random by the environment, and g′ and b′ are the outcome of A's recommendation effort. Notice that we assume the ground truth encoded by σ is persistent and noiseless, so that whereas the same user (boy or girl) may recur several times throughout the rounds due to their random generation, there is no point for the learner to request the sign of the same edge twice at two different rounds. The goal of algorithm A is to maximize the number of uncovered matches within the T rounds. The signs of the two reciprocal edges giving rise to a match need not be selected by A in the same round; the round where the match is uncovered is the time when the reciprocating edge is selected. E.g., if in round t1 we observe σ(b, g′) = −1, σ(g, b′) = +1, and in round t2 > t1 we observe σ(b′, g) = +1, σ(g′′, b′′) = +1, we say that the match involving b′ and g has been uncovered only in round t2. 
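The round structure just described can be simulated in a few lines. The sketch below is ours, not the paper's: the dictionary encoding of σ, the function names, and the matchmaker interface (`pick_girl`, `pick_boy` playing Steps (2B) and (2G)) are illustrative assumptions.

```python
import random

def run_protocol(sigma, n, T, pick_girl, pick_boy, seed=0):
    """Simulate T rounds of the two-sided protocol sketched above.

    sigma maps directed edges to signs: sigma[('b', b, g)] is boy b's opinion
    of girl g, sigma[('g', g, b)] is girl g's opinion of boy b, both in
    {-1, +1}.  A match is credited in the round where the reciprocating
    positive edge is revealed, as in the text.
    """
    rng = random.Random(seed)
    observed = {}  # directed edge -> revealed sign
    matches = 0

    def reveal(fwd, bwd):
        nonlocal matches
        if fwd in observed:          # no point observing the same edge twice
            return
        observed[fwd] = sigma[fwd]
        if sigma[fwd] == 1 and observed.get(bwd) == 1:
            matches += 1             # the reciprocating edge closes a match

    for _ in range(T):
        b = rng.randrange(n)                      # Step (1B): random boy logs in
        g_rec = pick_girl(b, observed)            # Step (2B): recommend a girl
        reveal(('b', b, g_rec), ('g', g_rec, b))  # Step (3B): sign revealed
        g = rng.randrange(n)                      # Steps (1G)-(3G), symmetric
        b_rec = pick_boy(g, observed)
        reveal(('g', g, b_rec), ('b', b_rec, g))
    return matches
```

With n = 1 and both signs positive, the match is credited in the second half of the first round, matching the counting rule above.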
In fact, if A has uncovered a positive edge g → b′ in (the second half of) round t1, the reciprocating positive edge (b′, g) need not be uncovered any time soon, since A has at the very least to wait until b′ logs into the system, an event which on average will occur only n rounds later.
We call matching graph, and denote by M, the bipartite and undirected graph having B ∪ G as nodes, where (b, g) ∈ B × G is an edge in M if and only if b and g determine a match in the original graph (⟨B, G⟩, E, σ). Given b ∈ B, we let N_M(b) ⊆ G be the set of matching girls for b according to σ, and deg_M(b) be the number of such girls. N_M(g) and deg_M(g) are defined symmetrically. See again Figure 1 for an example.
The performance of algorithm A is measured by the number of matches found by A within the T rounds. Specifically, if M_t(A) is the number of matches uncovered by A after t rounds of a given run, we would like to obtain lower bounds on M_T(A) that hold with high probability over the random generation of boys and girls that log into the system, as well as the internal randomization of A. To this effect, we shall repeatedly use in our statements the acronym w.h.p. to signify with probability at least 1 − O(1/n), as n → ∞. It will also be convenient to denote by E_t(A) the set of directed edges selected by A during the first t rounds, with E_0(A) = ∅. A given run of A may therefore be summarized by the sequence {E_t(A)}_{t=1}^T. Likewise, E^r_t(A) will denote the set of reciprocal (not necessarily matching) directed edges selected by A up to time t. Finally, E^r will denote the set of all |B| · |G| = n² pairs of reciprocal (not necessarily matching) edges between B and G.

5 Though different distributional assumptions could be made, for technical simplicity in this paper we decided to focus on the uniform distribution only.

We will first show (Section 3) that in the absence of further assumptions on the way the matches are located, there is not much one can do but try and simulate a random sampler. In order to further illustrate our model, the same section introduces a reference optimal behavior that assumes prior knowledge of the whole sign function σ. This will be taken as a yardstick to be contrasted with the performance of our algorithm SMILE (Section 4), which works under more specific, yet reasonable, structural assumptions on σ.

3 General limitations and optimal behavior

We now show6 that in the absence of specific assumptions on σ, the best thing to do in order to uncover matches is to reciprocate at random, no matter how big the number M of matches actually is.

Theorem 1 Given B and G such that |B| = |G| = n, and any integer m ≤ n²/2, there exists a randomized strategy for generating σ such that M = m, and the expected number of matches uncovered by any algorithm A operating on (⟨B, G⟩, E, σ) satisfies7 EM_T(A) = O((T/n²) M).

An algorithm matching the above upper bound is described next. We call this algorithm OOMM (Oblivious Online Match Maker). The main idea is to develop a strategy that is able to draw uniformly at random as many pairs of reciprocal edges as possible from E^r (recall that E^r is the set of all reciprocal edges between B and G). In particular, within the T rounds, OOMM will draw uniformly at random Θ(T)-many such pairs. The pseudocode of OOMM is given next. 
For brevity, throughout this paper an algorithm will be described only through Steps (2B) and (2G) (recall Section 2).

Algorithm 1: OOMM (Oblivious Online Match Maker)
INPUT: B and G
At each round t:
(2B) Select g′ uniformly at random from G;
(2G) B_{g,t} ← {b′′ ∈ B : (b′′, g) ∈ E_t(OOMM), (g, b′′) ∉ E_{t−1}(OOMM)};
  If B_{g,t} ≠ ∅ then select b′ uniformly at random from B_{g,t},
  else select b′ uniformly at random from B.

OOMM simply operates as follows. In Step (2B) of round t, the algorithm chooses a girl g′ uniformly at random from the whole set G. OOMM maintains over time the set B_{g,t} ⊆ B of all boys that so far gave their feedback (either positive or negative) on g, but for whom the feedback from g is not available yet. In Step (2G), if B_{g,t} is not empty, OOMM chooses a boy uniformly at random from B_{g,t}; otherwise it selects a boy uniformly at random from the whole set B.8

Note that, by design, the selection of g′ and b′ does not depend on the signs σ(b, g) or σ(g, b) collected so far. The following theorem guarantees that EM_T(OOMM) = Θ((T/n²) M), which is as if we were able to directly sample, in most of the T rounds, pairs of reciprocal edges.

Theorem 2 Given any input graph (⟨B, G⟩, E, σ), with |B| = |G| = n, if T − n = Ω(n) then E^r_T(OOMM) is selected uniformly at random (with replacement) from E^r, its size |E^r_T(OOMM)| is such that E|E^r_T(OOMM)| = Θ(T), and the expected number of matches disclosed by OOMM is such that EM_T(OOMM) = Θ((T/n²) M).

We now describe an optimal behavior (called the Omniscient Matchmaker) that assumes prior knowledge of the whole edge sign assignment σ. This optimal behavior will be taken as a reference performance for our algorithm of Section 4. 
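The OOMM pseudocode above is short enough to sketch directly in runnable form. The edge encoding (`('b', b, g)` / `('g', g, b)` keys in a set of observed edges) and the function names below are our own illustrative choices, not the paper's.

```python
import random

def oomm_pick_girl(G, rng):
    """Step (2B): recommend a girl chosen uniformly at random."""
    return rng.choice(G)

def oomm_pick_boy(g, B, observed, rng):
    """Step (2G): reciprocate toward girl g when possible.

    observed is the set of directed edges revealed so far, encoded as
    ('b', b, g) for boy-to-girl signs and ('g', g, b) for girl-to-boy ones.
    The pending set plays the role of B_{g,t}: boys who already rated g
    but have not yet been rated by g.  Note no edge *sign* is consulted,
    matching the obliviousness of OOMM.
    """
    pending = [b for b in B
               if ('b', b, g) in observed and ('g', g, b) not in observed]
    return rng.choice(pending) if pending else rng.choice(B)
```

For example, if boy 0 has rated girl 0 but not vice versa, `oomm_pick_boy(0, [0, 1], {('b', 0, 0)}, rng)` must return boy 0, closing the pending reciprocal pair.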
The Omniscient Matchmaker will also help to better clarify our learning model.

6 All proofs are provided in the appendix.
7 Recall that an upper bound on M_T(A) is a negative result here, since we aim at making M_T(A) as large as possible.
8 A boy could be selected more than once while serving a girl g during the T rounds. The optimality of OOMM (see Theorems 1 and 2) implies that this redundancy does not significantly affect OOMM's performance.

Definition 1 The Omniscient Matchmaker A* is an optimal strategy based on prior knowledge of the signs σ(b, g) and σ(g, b) for all b ∈ B and g ∈ G. Specifically, based on this information, A* maximizes the number of matches uncovered during the T rounds over all n^{2T} possible selections that can be made in Steps (2B) and (2G). We denote this optimal number of matches by M*_T = M_T(A*).

Observe that when the matching graph M is such that deg_M(u) > T/n for some user u ∈ B ∪ G, no algorithm will be able to uncover all M matches in expectation, since Steps (1B) and (1G) of our learning protocol entail that the expected number of times each user u logs into the system is equal to T/n. In fact, this holds even for the Omniscient Matchmaker A*, despite its prior knowledge of σ. For instance, when M turns out to be a random bipartite graph,9 the expected number of matches that any algorithm can achieve is always upper bounded by O((T/n²) M) (this is how Theorem 1 is proven). On the other hand, in order to have M*_T = Θ(M) as n grows large, it is sufficient that deg_M(u) ≤ T/n holds for all users u ∈ B ∪ G, even with such a random M. In order to avoid the pitfalls of M being a random bipartite graph (and hence the negative result of Theorem 1), we need to slightly depart from our general model of Section 2, and make structural assumptions on the way matches can be generated. 
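The degree condition deg_M(u) ≤ T/n above is straightforward to check once the matching graph is given. A minimal sketch under our own encoding assumptions (matches listed as (b, g) pairs; function name ours):

```python
from collections import Counter

def degree_condition_holds(match_edges, T, n):
    """Check deg_M(u) <= T/n for every user u of the matching graph M.

    match_edges lists the matches as (b, g) pairs.  Each user logs in about
    T/n times in expectation, so a user with more matches than that cannot
    reciprocate them all within the T rounds, even omnisciently.
    """
    deg = Counter()
    for b, g in match_edges:
        deg[('b', b)] += 1   # boy-side degree in M
        deg[('g', g)] += 1   # girl-side degree in M
    return all(d <= T / n for d in deg.values())
```

E.g., with n = 2, a boy involved in two matches needs T/n ≥ 2, i.e., T ≥ 4, for the condition to hold.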
The next section formulates such assumptions, and analyzes an algorithm that under these assumptions is essentially optimal in terms of the number of uncovered matches. The assumptions and the algorithm itself are then validated against simple baselines on real-world data in the domain of online dating (Section 5).

4 A model based on clusterability of received feedback

In a nutshell, our model is based on the extent to which it is possible to arrange the users in (possibly) overlapping clusters by means of the feedbacks they may potentially receive from the other party. In order to formally describe our model, it will be convenient to introduce the Boolean preference matrices B, G ∈ {0, 1}^{n×n}. These two matrices collect in their rows the ground truth contained in σ, separating the two parties B and G. Specifically, B_{i,j} = (1 + σ(b_i, g_j))/2, and G_{i,j} = (1 + σ(g_i, b_j))/2 (these are essentially the matrices exemplified in Figure 1(b), with the "−1" signs therein replaced by "0"). Then, we consider the n column vectors of B (resp. G), i.e., the whole set of feedbacks that each g ∈ G (resp. b ∈ B) may receive from members of B (resp. G), and, for a given radius ρ ≥ 0, the associated covering number of this set of Boolean vectors w.r.t. the Hamming distance. We recall that the covering number at radius ρ is the smallest number of balls of radius ≤ ρ that are needed to cover the entire set of n vectors. The smaller ρ, the higher the covering number. If the covering number stays small despite a small ρ, then our n vectors can be clustered into a small number of clusters, each one having a small (Hamming) radius.

As we mentioned in Section 3, a reasonable model for this problem is one for which our learning task can be solved in a nontrivial manner, thereby specifically avoiding the pitfalls of M being a random bipartite graph. 
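Computing covering numbers exactly is hard in general, but a single greedy pass gives an upper bound that is enough to probe how clusterable a given preference matrix is. A sketch (function name ours; a greedy pass that opens a new ball on each vector farther than ρ from all current centres, so every vector ends up within ρ of some centre):

```python
def greedy_cover_bound(vectors, rho):
    """Upper-bound the covering number at Hamming radius rho.

    vectors: equal-length 0/1 tuples (the column vectors of B or G).
    Centres are restricted to the input vectors themselves, so the count
    returned is an upper bound on the true covering number.
    """
    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))

    centres = []
    for v in vectors:
        if all(hamming(v, c) > rho for c in centres):
            centres.append(v)   # v is not covered yet: open a new ball
    return len(centres)
```

For instance, the columns {(0,0), (0,0), (1,1)} need two balls at radius 0 but a single ball at radius 2, illustrating how the bound shrinks as ρ grows.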
It is therefore worth exploring what pairs of radii and covering numbers may be associated with the two preference matrices G and B when M is indeed random bipartite. Assume M = o(n²), so as to avoid pathological cases. When M is random bipartite, one can show that we may have ρ = Ω(M/n) even when the two covering numbers are both 1. Hence, the only interesting regime is when ρ = o(M/n). Within this regime, our broad modeling assumption is that the resulting covering numbers for G and B are o(n), i.e., less than linear in n as n grows large.

Related work. The approach of clustering users according to their description/preference similarities while exploiting user feedback is similar in spirit to the two-sided clusterability assumptions investigated, e.g., in [1], which are based on a mixture of explicit and implicit (collaborative filtering-like) user features. Yet, as far as we are aware, ours is the first model that lends itself to a rigorous theoretical quantification of matchmaking performance (see Section 4.1). Moreover, in our case the user set is in general not partitioned as in previous RRS models. Each user may in fact belong to more than one cluster, which is apparently more natural for this problem.

The reader might also wonder whether the reciprocal recommendation task and the associated modeling assumptions share any similarity with the problem of (online) matrix completion/prediction. Recovering a matrix from a sample of its entries has been widely analyzed by a number of authors with different approaches, viewpoints, and assumptions, e.g., in Statistics and Optimization (e.g., [5, 14]), in Online Learning (e.g., [20, 21, 22, 9, 8, 6, 10]), and beyond. 
In fact, one may wonder if the problem of predicting the entries of matrices B and G may somehow be equivalent to the problem of disclosing matches between B and G. A closer look reveals that the two tasks are somewhat related, but not quite equivalent, since in reciprocal recommendation the task is to search for matching "ones" between the two binary matrices B and G by observing entries of the two matrices separately. In addition, because we get to see at each round the sign of two pairings (b, g′) and (g, b′), where b and g are drawn at random and b′ and g′ are selected by the matchmaker, our learning protocol is rather half-stochastic and half-active, which makes the way we gather information about matrix entries quite different from what is usually assumed in the available literature on matrix completion.

9 The matching graph M is a random bipartite graph if any edge (b, g) ∈ B × G is generated independently with the same probability p ∈ [0, 1].

4.1 An efficient algorithm

Under the above modeling assumptions, our goal is to design an efficient matchmaker. We specifically focus on the ability of our algorithm to disclose Θ(M) matches, in the regime where the optimal number of matches M*_T is also Θ(M). Recall from Section 3 that the latter assumption is needed so as to make the uncovering of Θ(M) matches possible within the T rounds. Our algorithm, called SMILE (Sampling Matching Information Leaving out Exceptions), is described as Algorithm 2. 
The algorithm depends on an input parameter S ∈ [log n, n/ log n] and, after randomly shuffling both B and G, operates in three phases: Phase 0 (described at the end), Phase I, and Phase II.

Algorithm 2: SMILE (Sampling Matching Information Leaving out Exceptions)
INPUT: B and G; parameter S > 0.
Randomly shuffle sets B and G;
Phase 0: Run OOMM to provide an estimate M̂ of M;
Phase I: (C, F) ← Cluster Estimation(⟨B, G⟩, S);
Phase II: User Matching(⟨B, G⟩, (C, F));

Due to space limitations, the actual pseudocode of Cluster Estimation() and User Matching() is presented in the appendix. What follows is an informal, yet precise, description of their functioning.

Phase I: Cluster Estimation. SMILE approximates the clustering over users by: i. asking, for each cluster representative b ∈ B, Θ(n) feedbacks (i.e., edge signs) selected at random from G (and operating symmetrically for each representative g ∈ G); ii. asking Θ(S)-many feedbacks for each remaining user, where parameter S will be set later. In doing so, SMILE will be in a position to estimate the clusters each user belongs to, that is, to estimate the matching graph M, the misprediction per user being w.h.p. of the order of (n log n)/S. The estimated M will then be used in Phase II.

A more detailed description of the Cluster Estimation procedure follows. For convenience, we focus on clustering G (hence observing feedbacks from B to G); the procedure operates in a completely symmetric way on B. Let F_g be the set of all b ∈ B who provided feedback on g ∈ G so far. Assume for the moment we have at our disposal a subset G_r ⊆ G containing one representative for each cluster, and that for each g ∈ G_r we have already observed n/2 feedbacks provided by n/2 distinct members of B, selected uniformly at random from B. 
Also, let B(g, S) be a subset of B obtained by sampling at random S′ = 2S + 4√(S log n)-many b from B. Then a Chernoff-Hoeffding bound argument shows that for any g ∈ G \ G_r and any g_r ∈ G_r we have w.h.p. |B(g, S) ∩ F_{g_r}| ≥ S. We use the above to estimate the cluster each g ∈ G \ G_r belongs to. This task can be accomplished by finding a g_r ∈ G_r who receives the same set of feedbacks as g, i.e., who belongs to the same cluster as g. Yet, in the absence of the feedback provided by all b ∈ B to both g and g_r, it is not possible to obtain this information with certainty. The algorithm simply estimates g's cluster by exploiting Step (1B) of the protocol to ask for feedback on g from S′ = S′(S) randomly selected b ∈ B, which will be seen as forming the subset B(g, S). We shall therefore assign g to the cluster represented by an arbitrary g_r ∈ G_r such that σ(b, g) = σ(b, g_r) for all b ∈ B(g, S) ∩ F_{g_r}. We proceed this way for all g ∈ G \ G_r.

We now remove the assumption on G_r. Although we initially do not have G_r, we can build, through a concentration argument, an approximate version of G_r while asking for the feedback B(g, S) on each unclustered g. The Cluster Estimation procedure does so by processing girls g sequentially, as described next. Recall that G was randomly shuffled into an ordered sequence G = {g1, g2, . . . , gn}. The algorithm maintains an index i over G that only moves forward, and collects feedback information for g_i. At any given round, G_r contains all cluster representatives found so far. Given b ∈ B that needs to be served during round t (Step (1B)), we include b in F_{g_i}. If |F_{g_i}| becomes as big as S′, then we look for g ∈ G_r so as to estimate g_i's cluster. If we succeed, index i is incremented and the algorithm will collect feedback for the next g_i during the next rounds. If we do not succeed, g_i will be included in G_r, and the algorithm will continue to collect feedback on g_i as long as |F_{g_i}| < n/2. When |F_{g_i}| ≥ n/2, index i is incremented, so as to consider the next member of G. Phase I terminates when we have estimated the cluster of each b and g that are themselves not representative of any cluster. Finally, when we have concluded with one of the two sides but not with the other (e.g., we are done with G but not with B), we continue with the unterminated side, while for the terminated one we can select members (g ∈ G in this case) in Step 2 (Step (2B) in this case) arbitrarily.

Phase II: User Matching. In Phase II, we exploit the feedback collected in Phase I so as to match as many pairs (b, g) as possible. For each user u ∈ B ∪ G selected in Step (1B) or Step (1G), we pick in Step (2G) or (2B) a user u′ from the other side such that u′ belongs to an estimated cluster which is among the set of clusters whose members are liked by u, and vice versa. When no such u′ exists, we select u′ from the other side arbitrarily.

Phase 0: Estimating M. In the appendix we show that the optimal tuning of S is to set it as a function of the number of hidden matches M, i.e., S := (n² log n)/M. Since M is unknown, we run a preliminary phase where we run OOMM (from Section 3) for a few rounds. Using Theorem 2, it is not hard to show that the number T_{M̂} of rounds taken by this preliminary phase to find an estimate M̂ of M which is w.h.p. accurate up to a constant factor satisfies T_{M̂} = Θ((n² log n)/M).

In order to quantify the performance of SMILE, it will be convenient to refer to the definition of the Boolean preference matrices B, G ∈ {0, 1}^{n×n}. For a given radius ρ ≥ 0, we denote by C^G_ρ the covering number of the n column vectors of B w.r.t. the Hamming distance. 
In a similar fashion we define C^B_ρ. Moreover, let C^G and C^B be the total number of cluster representatives for girls and boys, respectively, found by SMILE, i.e., C^G = |G_r| and C^B = |B_r| at the end of the T rounds. The following theorem shows that when the optimal number of matches M*_T is M, then so is M_T(SMILE), up to a constant factor, provided M and T are not too small.

Theorem 3. Given any input graph (⟨B, G⟩, E, σ), with |B| = |G| = n, such that M*_T = M w.h.p. as n grows large, we have

    C^G ≤ C̄^G := min{ min_{ρ≥0} (C^G_{ρ/2} + 3ρS′), n },
    C^B ≤ C̄^B := min{ min_{ρ≥0} (C^B_{ρ/2} + 3ρS′), n }.

Furthermore, when T and M are such that T = ω(n(C̄^G + C̄^B + S′)) and M = ω((n² log n)/S), then we have w.h.p. M_T(SMILE) = Θ(M).

Notice in the above theorem the role played by the upper bounds C̄^G and C̄^B. If the minimizing ρ therein gives C̄^G = C̄^B = n, we have enough degrees of freedom for M to be generated as a random bipartite graph. On the other hand, when C̄^G and C̄^B are significantly smaller than n at the minimizing ρ (which is what we expect to happen in practice), the resulting M will have a cluster structure that cannot be compatible with a random bipartite graph. This entails that, on both sides of the bipartite graph, each subject receives from the other side a set of preferences that can be collectively clustered into a relatively small number of clusters with small intracluster distance. Then the number of rounds T that SMILE takes to achieve (up to a constant factor) the same number of matches M*_T as the Omniscient Matchmaker drops significantly.
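Computing a Hamming covering number exactly is hard in general; a greedy first-fit pass is a simple proxy that only upper-bounds the true C_ρ. The sketch below is our own illustration (the paper does not prescribe this procedure, and Table 1's cluster counts are likewise described only as approximations):

```python
def hamming(u, v):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(x != y for x, y in zip(u, v))

def greedy_cover_size(vectors, radius):
    """First-fit estimate of the covering number C_rho: the number of
    centers needed so every vector is within `radius` (Hamming) of one.
    A single greedy pass yields an upper bound on the true covering number.
    """
    centers = []
    for v in vectors:
        if not any(hamming(v, c) <= radius for c in centers):
            centers.append(v)
    return len(centers)
```

Applied to the columns of B (resp. G) at radii such as 2n/log n, this gives the kind of bounded-radius cluster counts reported in Table 1.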
In particular, when S in SMILE has the form (n² log n)/M̂, where M̂ is the value returned by Phase 0, we have the following result.

Corollary 1. Given any input graph (⟨B, G⟩, E, σ), with |B| = |G| = n, such that M*_T = M w.h.p. as n grows large, and with T and M satisfying T = ω(n(C̄^G + C̄^B) + (n³ log n)/M), where C̄^G and C̄^B are the upper bounds on C^G and C^B given in Theorem 3, we have w.h.p. M_T(SMILE) = Θ(M).

In order to evaluate in detail the performance of SMILE, it is very interesting to show to what extent the conditions bounding T from below in Theorem 3 are necessary. We have the following general limitation, holding for any matchmaker A.

Theorem 4. Given B and G such that |B| = |G| = n, any integer m ∈ (n log n, n² − n log n), and any algorithm A operating on (⟨B, G⟩, E, σ), there exists a randomized strategy for generating σ such that m − n/(C^G_0 + C^B_0) − 1 < M ≤ m, and the number of rounds T needed to achieve E[M_T(A)] = Θ(M) satisfies T = Ω(n(C^G_0 + C^B_0) + M), as n → ∞.

Remark 1. One can verify that the time bound for SMILE established in Corollary 1 is nearly optimal whenever M = ω(n^{3/2} √(log n)). To see this, observe that by definition we have C̄^G ≤ C^G_0 and C̄^B ≤ C^B_0.
Now, if M = ω(n^{3/2} √(log n)), then the additive term (n³ log n)/M becomes o(M), and the condition on T in Corollary 1 simply becomes T = ω(n(C^G_0 + C^B_0 + M′)), where M′ = o(M). This has to be contrasted with the lower bound on T contained in Theorem 4.

We now explain why it is possible that, when M = ω(n^{3/2} √(log n)), the additive term (n³ log n)/M in the bound T = ω(n(C̄^G + C̄^B) + (n³ log n)/M) of Corollary 1 becomes o(M), while the first term n(C̄^G + C̄^B) can be upper bounded by n(C^G_0 + C^B_0). Since the lower bound T = Ω(n(C^G_0 + C^B_0) + M) of Theorem 4 has a linear dependence on M, it might seem surprising that the larger M is, the smaller the second term in the bound of Corollary 1 becomes. However, it is important to take into account that T in Corollary 1 must be large enough to also satisfy the condition M*_T = M. Let T* be the number of rounds T necessary to satisfy w.h.p. M*_T = M. In Corollary 1, both the conditions T ≥ T* and T = ω(n(C̄^G + C̄^B) + (n³ log n)/M) must simultaneously hold. When M is large, the number of rounds needed to satisfy the former condition becomes much larger than the one needed for the latter.

As a further insight, consider the following. We either have M = O(n(C̄^G + C̄^B)) or M = ω(n(C̄^G + C̄^B)). In the first case, the lower bound in Theorem 4 clearly becomes T = Ω(n(C^G_0 + C^B_0 + C̄^G + C̄^B)), hence not directly depending on M. In the second case, whenever M = ω(n^{3/2} √(log n)), T* is larger than n(C̄^G + C̄^B) + (n³ log n)/M since, by definition, we must have T* = Ω(M), while in this case n(C̄^G + C̄^B) + (n³ log n)/M = o(M).
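As a concrete numerical sanity check of the last claim, take the hypothetical choice M = n^{1.6}, which is ω(n^{3/2} √(log n)); then (n³ log n)/M = n^{1.4} log n, and its ratio to M, namely (log n)/n^{0.2}, shrinks as n grows:

```python
import math

# With M = n**1.6 (an illustrative choice, not from the paper), the
# additive term (n^3 log n)/M of Corollary 1 equals n^{1.4} log n,
# so the ratio to M is log(n) / n^{0.2}, which tends to 0.
ratios = []
for n in (10**3, 10**4, 10**5, 10**6):
    M = n ** 1.6
    additive = n ** 3 * math.log(n) / M
    ratios.append(additive / M)
```

The ratios decrease monotonically and drop below 1 by n = 10^6, illustrating (slowly, as the (log n)/n^{0.2} rate suggests) that the additive term is o(M) in this regime.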
In conclusion, if the number of rounds SMILE takes to uncover Θ(M) matches equals the number of rounds taken by the omniscient Matchmaker to uncover exactly M matches, then SMILE is optimal up to a constant factor, because no algorithm can outperform the omniscient Matchmaker. This provides a crucially important insight into the key factors allowing the additive term (n³ log n)/M to be o(M) in Corollary 1, and is indeed one of the keystones in the proof of Theorem 3.

We conclude this section by emphasizing that SMILE is indeed quite scalable. As proven in the appendix, an implementation of SMILE exists that leverages a combined use of suitable data structures, leading to both time and space efficiency.

Theorem 5. Let C̄^G and C̄^B be the upper bounds on C^G and C^B given in Theorem 3. Then the running time of SMILE is O(T + nS(C̄^G + C̄^B)), and the memory requirement is O(n(C̄^G + C̄^B)). Furthermore, when T = ω(n(C̄^G + C̄^B) + (n³ log n)/M), as required by Corollary 1, the amortized time per round is Θ(1) + o(C̄^G + C̄^B), which is always sublinear in n.

5 Experiments
In this section, we evaluate the performance of (a variant of) our algorithm by empirically contrasting it to simple baselines on artificial and real-world datasets from the online dating domain.
The comparison on real-world data also serves as a validation of our modeling assumptions.

Synthetic datasets (2000 boys and 2000 girls)

    Dataset        C^B    C^G    #likes   #matches    C^B / C^G at radius:
                                                      2·n/log n      n/log n       0.5·n/log n
    S-20-23         20     23    2.45M    374K          20 / 23       20 / 23       445 / 429
    S-95-100        95    100    2.46M    377K          95 / 100      95 / 100      603 / 624
    S-500-480      500    480    2.47M    380K         500 / 480     500 / 480      983 / 950
    S-2000-2000   2000   2000    2.47M    382K        2000 / 2000   2000 / 2000    2000 / 2000

Real-world datasets

    Dataset        |B|    |G|    #likes   #matches    C^B / C^G at radius:
                                                      2·n/log n      n/log n       0.5·n/log n
    RW-1007-1286  1007   1286    125K     13.9K         53 / 48      177 / 216      385 / 508
    RW-1526-2564  1526   2564    227K     19.6K         37 / 45      138 / 216      339 / 601
    RW-2265-3939  2265   3939    370K     25.0K         42 / 45      145 / 215      306 / 622

Table 1: Relevant properties of our datasets. The last six columns present an approximation to the number of clusters when we allow radius 2·n/log n, n/log n, and 0.5·n/log n between users of the same cluster.

Datasets. The relevant properties of our datasets are given in Table 1. Each of our synthetic datasets has |B| = |G| = 2000. We randomly partitioned B and G into C^B and C^G clusters, respectively. Each boy likes all the girls of a cluster C with probability 0.2, and dislikes them with probability 0.8. We do the same for the preferences from girls to boy clusters. Finally, for each preference (either positive or negative) we reverse its sign with probability 1/(2 log n) (in our case, n = 2000). Notice that in Table 1, for all four datasets we generated, the number of likes is bigger than |B|·|G|/2. As for real-world datasets, we used the one from [4], which is publicly available. This is a dataset from a Czech dating website, where 220,970 users rate each other on a scale from 1 (worst) to 10 (best). The gender of the users is not always available.
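The synthetic preference generation described above can be sketched as follows. Function name, defaults, and the seed are illustrative and not taken from the paper:

```python
import math
import random

def synthetic_preferences(n=2000, boy_clusters=20, girl_clusters=23,
                          p_like=0.2, seed=0):
    """Sketch of the synthetic generation: cluster-level likes drawn with
    probability p_like, then every individual preference sign is flipped
    with probability 1/(2 log n).  All names and defaults are illustrative.
    """
    rng = random.Random(seed)
    b_cl = [rng.randrange(boy_clusters) for _ in range(n)]
    g_cl = [rng.randrange(girl_clusters) for _ in range(n)]
    # cluster-to-cluster preferences, one table per side
    b_likes = [[rng.random() < p_like for _ in range(girl_clusters)]
               for _ in range(boy_clusters)]
    g_likes = [[rng.random() < p_like for _ in range(boy_clusters)]
               for _ in range(girl_clusters)]
    flip = 1.0 / (2.0 * math.log(n))
    # B[b][g]: does boy b like girl g?  G[g][b]: does girl g like boy b?
    B = [[b_likes[b_cl[b]][g_cl[g]] ^ (rng.random() < flip)
          for g in range(n)] for b in range(n)]
    G = [[g_likes[g_cl[g]][b_cl[b]] ^ (rng.random() < flip)
          for b in range(n)] for g in range(n)]
    return B, G
```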
To get two disjoint parties B and G, where each user rates only users from the other party, we disregarded all users whose gender is not specified. As this dataset is very sparse, we extracted dense subsets as follows. We considered as "like" any rating > 2, while all remaining ratings, including the missing ones, count as "dislikes". Next, we iteratively removed the users with the smallest number of ratings until we met some desired density level. Specifically, we executed the above process until we obtained two sets B and G such that the number of likes between the two parties is at least 2(min{|B|,|G|})^{3/2} (resulting in dataset RW-1007-1286), 1.75(min{|B|,|G|})^{3/2} (dataset RW-1526-2564), or 1.5(min{|B|,|G|})^{3/2} (dataset RW-2265-3939).

Random baselines. We included as baselines OOMM, from Section 3, and a random method that asks a user for his/her feedback on another user (of the opposite gender) picked uniformly at random. We refer to this algorithm as UROMM.

Implementation of SMILE. In the implementation of SMILE, we slightly deviated from the description in Section 4.1. One important modification is that we interleaved Phase I and Phase II. The high-level idea is to start exploiting clusters immediately once some of them are identified, without waiting to learn all of them. Additionally, we gave higher priority to exploring the reciprocal feedback of a discovered like, and avoided doing so in the case of a dislike. Finally, whenever we test whether two users belong to the same cluster, we allowed a radius equal to a (1/log n) fraction of the tested entries. The parameter S′ in SMILE has been set to S + √(S log n), with S = (n² log n)/M̂, where M̂ is the estimate from Phase 0. We call the resulting algorithm I-SMILE (Improved SMILE).

Evaluation.
To get a complete picture of the behavior of the algorithms over different time horizons, we present for each algorithm the number of discovered matches as a function of T ∈ {1, ..., 2|B||G|}. Figure 2 contains three representative cases. In all datasets we tested, I-SMILE clearly outperforms UROMM and OOMM. Our experiments confirm that SMILE (and therefore I-SMILE) quickly learns the underlying structure of the likes between users, and uses this structure to reveal the matches between them. Moreover, the variant I-SMILE that we implemented not only performs well on graphs with no underlying structure in the likes, but also discovers matches during the exploration phase while learning the clusters.

Figure 2: Empirical comparison of the three algorithms on datasets S-95-100 (left), RW-1007-1286 (middle), and RW-2265-3939 (right). Each plot gives the number of disclosed matches vs. time.

6 Conclusions and Ongoing Research
We have initiated a theoretical investigation of the problem of reciprocal recommendation in an ad hoc model of sequential learning. Under suitable clusterability assumptions, we have introduced an efficient matchmaker called SMILE, and have proven its ability to uncover matches at a speed comparable to that of the omniscient Matchmaker, so long as M and T are not too small (Theorem 3 and Corollary 1). Our theoretical findings also include a computational complexity analysis (Theorem 5), as well as limitations on the number of disclosable matches in both the general (Theorem 1) and the cluster case (Theorem 4). We complemented our results with an initial set of experiments on synthetic and real-world datasets in the online dating domain, showing encouraging evidence.

Current ongoing research includes: i. Introducing suitable noise models for the sign function σ. ii. Generalizing our learning model to nonbinary feedback preferences. iii.
Investigating algorithms whose goal is to maximize the area under the "number of matches vs. time" curve, i.e., the criterion Σ_{t∈[T]} M_t(A), rather than the one we analyzed in this paper; maximizing this criterion requires interleaving the phases where we collect matches (exploration) with the phases where we actually disclose them (exploitation). iv. More experimental comparisons on different datasets against heuristic approaches available in the literature. v. Incorporating in the protocol different login frequencies for the users.

Acknowledgements
We would like to thank the anonymous reviewers for their valuable comments and suggestions that helped improve the presentation of this paper. Special thanks to Flavio Chierichetti and Marc Tommasi for helpful discussions in early stages of our investigation. Fabio Vitale acknowledges support from the ERC Starting Grant "DMAP 680153", the Google Focused Award "ALL4AI", and grant "Dipartimenti di Eccellenza 2018-2022", awarded to the Department of Computer Science of Sapienza University.

References
[1] Joshua Akehurst, Irena Koprinska, Kalina Yacef, Luiz Augusto Pizzato, Judy Kay, and Tomasz Rej. CCR - A content-collaborative reciprocal recommender for online dating. In IJCAI Int. Jt. Conf. Artif. Intell., pages 2199–2204, 2011.

[2] Joshua Akehurst, Irena Koprinska, Kalina Yacef, Luiz Augusto Pizzato, Judy Kay, and Tomasz Rej. Explicit and Implicit User Preferences in Online Dating. New Front. Appl. Data Min., pages 15–27, 2012.

[3] Ammar Alanazi and Michael Bain. A Scalable People-to-People Hybrid Reciprocal Recommender Using Hidden Markov Models. In 2nd Int. Work. Mach. Learn. Methods Recomm.
Syst., 2016.

[4] Lukas Brozovsky and Vaclav Petricek. Recommender system for online dating service. In Proceedings of Znalosti 2007 Conference, Ostrava, 2007. VSB.

[5] Emmanuel J. Candes and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.

[6] Paul Christiano. Online local learning via semidefinite programming. In Proceedings of the Forty-sixth Annual ACM Symposium on Theory of Computing, STOC '14, pages 468–474, 2014.

[7] F. Diaz, D. Metzler, and S. Amer-Yahia. Relevance and ranking in online dating systems. In 33rd ACM Conf. on Research and Development in Information Retrieval, SIGIR '10, pages 66–73, 2010.

[8] C. Gentile, M. Herbster, and S. Pasteris. Online similarity prediction of networked data from known and unknown graphs. In Proceedings of the 26th Conference on Learning Theory (COLT 2013), 2013.

[9] E. Hazan, S. Kale, and S. Shalev-Shwartz. Near-optimal algorithms for online matrix prediction. In Proceedings of the 25th Annual Conference on Learning Theory (COLT '12), 2012.

[10] M. Herbster, S. Pasteris, and M. Pontil. Mistake bounds for binary matrix completion. In NIPS 29, pages 3954–3962, 2016.

[11] Wenxing Hong, Siting Zheng, Huan Wang, and Jianchao Shi. A job recommender system based on user clustering. Journal of Computers, 8(8):1960–1967, 2013.

[12] A. Kleinerman, A. Rosenfeld, F. Ricci, and S. Kraus. Optimally balancing receiver and recommended users' importance in reciprocal recommender systems. In Proceedings of the 12th ACM Conference on Recommender Systems, 2018.

[13] A. Kleinerman, A. Rosenfeld, and S. Kraus. Providing explanations for recommendations in reciprocal environments. In Proceedings of the 12th ACM Conference on Recommender Systems, 2018.

[14] V. Koltchinskii, K. Lounici, and A. Tsybakov.
Nuclear norm penalization and optimal rates for noisy matrix completion. arXiv:1011.6256v4, 2016.

[15] J. Kunegis, G. Gröner, and T. Gottron. Online dating recommender systems: The split-complex number approach. In 4th ACM RecSys Workshop on Recommender Systems and the Social Web, 2012.

[16] Lei Li and Tao Li. MEET: A Generalized Framework for Reciprocal Recommender Systems. In Proc. 21st ACM Int. Conf. Inf. Knowl. Manag. (CIKM '12), pages 35–44, 2012.

[17] Saket Maheshwary and Hemant Misra. Matching resumes to jobs via deep siamese network. In Companion Proceedings of The Web Conference 2018, WWW '18, pages 87–88, 2018.

[18] Istvan Pilaszy and Domonkos Tikk. Movies: Even a few ratings are more valuable than metadata. In Proceedings of the 3rd ACM Conference on Recommender Systems (RecSys), 2009.

[19] Luiz Augusto Pizzato, Tomasz Rej, Joshua Akehurst, Irena Koprinska, Kalina Yacef, and Judy Kay. Recommending people to people: the nature of reciprocal recommenders with a case study in online dating. User Model. User-adapt. Interact., 23(5):447–488, 2013.

[20] S. Shalev-Shwartz, Y. Singer, and A. Ng. Online and batch learning of pseudo-metrics. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML 2004. ACM, 2004.

[21] K. Tsuda, G. Rätsch, and M. K. Warmuth. Matrix exponentiated gradient updates for on-line learning and Bregman projections. Journal of Machine Learning Research, 6:995–1018, 2005.

[22] M. K. Warmuth. Winnowing subspaces. In Proceedings of the 24th International Conference on Machine Learning, pages 999–1006, 2007.

[23] Peng Xia, Benyuan Liu, Yizhou Sun, and Cindy Chen. Reciprocal Recommendation System for Online Dating. In Proc. 2015 IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Min. (ASONAM '15), pages 234–241.
ACM Press, 2015.