{"title": "Learning from Group Comparisons: Exploiting Higher Order Interactions", "book": "Advances in Neural Information Processing Systems", "page_first": 4981, "page_last": 4990, "abstract": "We study the problem of learning from group comparisons, with applications in predicting outcomes of sports and online games. Most of the previous works in this area focus on learning individual effects---they assume each player has an underlying score, and the ''ability'' of the team is modeled by the sum of team members' scores. Therefore, all the current approaches cannot model deeper interaction between team members: some players perform much better if they play together, and some players perform poorly together. In this paper, we propose a new model that takes the player-interaction effects into consideration. However, under certain circumstances, the total number of individuals can be very large, and number of player interactions grows quadratically, which makes learning intractable. In this case, we propose a latent factor model, and show that the sample complexity of our model is bounded under mild assumptions. 
Finally, we show that our proposed models have much better prediction power on several E-sports datasets, and furthermore can be used to reveal interesting patterns that cannot be discovered by previous methods.", "full_text": "Learning from Group Comparisons: Exploiting\n\nHigher Order Interactions\n\nYao Li\n\nDepartment of Statistics\n\nUniversity of California, Davis\n\nyaoli@ucdavis.edu\n\nMinhao Cheng\n\nDepartment of Computer Science\n\nUniversity of California, Los Angeles\n\nmhcheng@ucla.edu\n\nKevin Fujii\n\nDepartment of Statistics\n\nUniversity of California, Davis\n\nkmfujii@ucdavis.edu\n\nFushing Hsieh\n\nDepartment of Statistics\n\nUniversity of California, Davis\n\nfhsieh@ucdavis.edu\n\nCho-Jui Hsieh\n\nDepartment of Computer Science\n\nUniversity of California, Los Angeles\n\nchohsieh@cs.ucla.edu\n\nAbstract\n\nWe study the problem of learning from group comparisons, with applications in\npredicting outcomes of sports and online games. Most of the previous works in\nthis area focus on learning individual effects\u2014they assume each player has an\nunderlying score, and the \u201cability\u201d of the team is modeled by the sum of team\nmembers\u2019 scores. Therefore, current approaches cannot model deeper interaction\nbetween team members: some players perform much better if they play together,\nwhile some players perform poorly together. In this paper, we propose a new model\nthat takes the player-interaction effects into consideration. However, under certain\ncircumstances, the total number of individuals can be very large, and number\nof player interactions grows quadratically, which makes learning intractable. In\nthis case, we propose a latent factor model, and show that the sample complexity\nof our model is bounded under mild assumptions. 
Finally, we show that our proposed models have much better prediction power on several E-sports datasets, and furthermore can be used to reveal interesting patterns that cannot be discovered by previous methods.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n1 Introduction\n\nMany online games now take the form of group comparisons, and the e-sports industry is growing at an unexpected pace. For example, League of Legends (LoL) has attracted more than 11 million active players each month; Dota 2 had a grand prize of nearly 25 million dollars last year. Such a large crowd of players and matches creates many challenges: for instance, how to design a good matchmaking system that matches two teams with similar strengths, and how to form a better team composition to win the game. To answer these questions, we consider the core problem of modeling group comparisons: given the results of previous games (each game is a group comparison between two teams), how can we predict the outcome of an unseen game?\nAll the previous work in this area focuses on the individual scoring model, that is, assuming each player has an underlying score, and the \"ability\" of the team is modeled by the sum of team members\u2019 scores. Through the process, one can also rank a player by his or her score in the model. For example, [13] extends the Bradley-Terry model to the group comparison setting; [12] proposed the TrueSkill algorithm to learn individual scores using a Bayesian model, which is now used by game companies and sports analysts.\nHowever, a common weakness of previous methods is that they ignore the player-interaction effects. In a team challenge, the players work together and will always influence each other, and this player-interaction effect can significantly alter game results. To make the prediction more accurate, it is necessary to incorporate the player-interaction effects. 
On the other hand, people are also interested in the cooperation effects between players. A team coach can form a better team based on both individual abilities and cooperation abilities; game designers such as Blizzard can use the results to design their heroes. This brings us to the questions we wish to answer in this paper:\n\n\u2022 Can incorporating player-interaction effects improve the prediction accuracy of the model? How should those effects be interpreted?\n\n\u2022 If the total number of players is too large, how can our algorithm scale up while maintaining good generalization error and efficient computation time?\n\nTo answer the first question, we propose a new model that can incorporate pairwise effects, and show that the pairwise effects can be learned when there are not too many players. Player interaction cannot be fully modeled by pairwise effects; this is a first step, and investigating effects of order higher than two is left as future work. As for the second question, we propose a latent factor model to describe pairwise interactions between players, and propose an efficient stochastic gradient descent algorithm to solve it. A theoretical bound on the sample complexity is provided under mild conditions.\nIn the experimental part, we test our model on online game datasets and show that our proposed models have much higher prediction accuracy than previous individual-score based models. For example, on the Heroes of the Storm data our new models reach around 80% accuracy, while state-of-the-art models such as Trueskill achieve only about 60% accuracy.\n\n2 Problem Setting\n\nAssume there are n individuals $\\{1, \\dots, n\\}$ and T observed comparisons. Each game involves two disjoint teams $I_t^+$ and $I_t^-$, each of which is a subset of $\\{1, \\dots, n\\}$ indicating the players on the team. Without loss of generality we assume team $I_t^+$ wins the game and team $I_t^-$ loses the game. For simplicity, we assume each game has an equal number of players on each team, so there could be $N = \\binom{n}{L}$ different combinations, where $L = |I_t^+| = |I_t^-|$ is the number of players on each team. For each game, the outcome $o_t$ can be observed under two scenarios:\n\n\u2022 Measured outcomes (scores): for each comparison, the value of the score difference is observed: $o_t \\in \\mathbb{R}$ can be any real number.\n\u2022 Binary indicator outcomes (wins/losses): for each comparison, the sign of the score difference is revealed: $o_t \\in \\{+1, -1\\}$.\n\nMost problems are given in the form of the second case. However, in some cases it is possible to observe the scores, for example the score in an NBA game or the number of kills in an online matching game. Our proposed approaches work for both cases since we assume a general loss function, while in the experiments we focus on binary outcomes.\n\n3 Related Work\n\nLearning individual scores from group comparisons. Most of the previous work focuses on learning individual scores from group comparisons [13, 12]. All of them make the following assumption:\nAssumption 1. The team\u2019s score is the sum over team members\u2019 scores: $s_t^+ = \\sum_{j \\in I_t^+} w_j$, where $w_j$ is the ability of player $j$. The observed outcome is determined by $s_t^+ - s_t^-$.\n\nFor example, [13] proposed a generalized Bradley-Terry model: assume $w_j$ is the score for the $j$-th player, and\n\n$$P(I_t^+ \\text{ beats } I_t^-) = \\frac{\\exp(\\sum_{j \\in I_t^+} w_j)}{\\exp(\\sum_{j \\in I_t^+} w_j) + \\exp(\\sum_{j \\in I_t^-} w_j)},$$\n\nthen the MLE of the underlying scores $w \\in \\mathbb{R}^n$ is $\\hat{w} = \\arg\\min_w -\\sum_{t=1}^T \\log P(I_t^+ \\text{ beats } I_t^-)$. Trueskill [12] is a Bayesian approach for learning the scores using a similar generating model, and is used in most real world online game matching systems. Here we also consider a simple but effective individual-score based method:\n\n$$\\min_{w \\in \\mathbb{R}^n} \\sum_{t=1}^T \\ell(w^T x_t, o_t) + R(w). \\qquad (1)$$\n\n$w \\in \\mathbb{R}^n$ is the individual score vector we want to learn. $x_t \\in \\mathbb{R}^n$ is the indicator vector, where $(x_t)_j = 1$ if $j \\in I_t^+$, $(x_t)_j = -1$ if $j \\in I_t^-$, and $(x_t)_j = 0$ for all other elements. Although we cannot find this simple model in the literature, in practice we found that it often outperforms Trueskill and the Bradley-Terry model when $\\ell(\\cdot,\\cdot)$ is logistic loss and $R(w)$ is L2 regularization, so we also include this model in our comparisons in the experimental part. For all the individual score models, it is not hard to observe that they require at least O(n) games in order to recover n individual scores with small error.\nFactorization machine. Factorization machines were introduced by Rendle [17]. They hold great promise in applications with sparse predictors, especially when pairwise interaction of variables is useful and linear complexity with polynomial results is desired. For example, [16] introduced a factorized sparse model to identify high-order feature interactions in linear and logistic regression models. In this paper, we propose a factorization model to help scale up when the number of players (n) is large.\nOther related work. Ranking individuals from pairwise comparisons has been extensively studied. The famous Elo system [11] has been used for a long time for chess and other sports ratings. [19, 10] also proposed different approaches with theoretical guarantees. [3, 4] recently provided a novel view of the ranking problem by modeling intransitivity in pairwise comparisons (intransitivity means a > b, b > c, c > a). 
However, all these papers consider pairwise comparisons, while we consider the problem of group comparisons in this paper.\nAnother recent line of research studies how to improve the ranking algorithm by exploiting feature information. [20] discussed a Bradley-Terry model with features. [21] applied a factorization model to incorporate feature information. [5] proposed a simple method to combine feature-based and comparison-based approaches and demonstrated that the use of features can reduce the complexity in theory. We do not consider features in this paper. However, since most online game matching data has features associated with each game, exploring this area is left as future work.\n\n4 Exploiting higher order information\n\nNone of the current approaches can model the pairwise relationships of team members: some players perform much better if they play together, and some players perform poorly together. We propose the following methods to model pairwise interactions.\nBasic Model for Higher Order Interactions. We assume each player has an individual score $w_{jj}$, and each pair of players has a pairwise score $w_{jq}$. A team\u2019s ability is modeled by\n\n$$s_t^+ = \\sum_{j,q \\in I_t^+} w_{jq}.$$\n\nOur goal is to learn the model so that the score $s_t^+$ is larger than $s_t^-$ for each game. Assume $e_t^+, e_t^- \\in \\{0,1\\}^n$ are the indicator vectors for $I_t^+$ and $I_t^-$ respectively, where $(e_t^+)_j = 1$ if $j \\in I_t^+$ and $(e_t^-)_j = 1$ if $j \\in I_t^-$. Then the objective function can be written as\n\n$$\\text{Basic HOI:} \\quad \\min_{W \\in \\mathbb{R}^{n \\times n}} \\sum_{t=1}^T \\ell\\big((e_t^+)^T W (e_t^+) - (e_t^-)^T W (e_t^-),\\, o_t\\big) + \\lambda \\|W\\|_F^2. \\qquad (2)$$\n\n$W = (w_{jq}) \\in \\mathbb{R}^{n \\times n}$ is the score matrix of players, where the diagonal element $w_{jj}$ corresponds to the ability score of player $j$ and $w_{jq}$ corresponds to the pairwise score of players $(j, q)$. One way to solve (2) is by transforming it to classical empirical risk minimization. 
Since\n\n$$(e_t^+)^T W (e_t^+) = \\mathrm{tr}(W e_t^+ (e_t^+)^T) = \\mathrm{vec}(W)^T \\mathrm{vec}(e_t^+ (e_t^+)^T),$$\n\nproblem (2) is equivalent to\n\n$$\\min_{w \\in \\mathbb{R}^{n^2}} \\sum_{t=1}^T \\ell(w^T x_t, o_t) + \\lambda \\|w\\|_2^2, \\qquad (3)$$\n\nwhere $w = \\mathrm{vec}(W)$ and $x_t = \\mathrm{vec}(e_t^+ (e_t^+)^T - e_t^- (e_t^-)^T)$. After this reformulation, (3) can be solved by standard SVM or logistic regression packages when $\\ell(\\cdot,\\cdot)$ is hinge loss or logistic loss.\nIndeed, this model is quite flexible and can be extended to extract higher order interactions, such as interactions among any 3 players, or 4 players. The only problem is that the number of parameters becomes very large when higher order information is used.\nDifficulty in scaling to large problems. Our basic model is quite effective when the number of players n is small (see our experimental results). However, in many real world problems n is very large. For example, even a small online game would have tens of thousands of players, and popular games such as LoL or Heroes of the Storm typically have millions of players. Unfortunately, our basic model cannot scale to large n for the following two reasons:\n\n\u2022 In terms of sample complexity, (2) has $n^2$ parameters. Based on standard statistical learning theory, it requires at least $O(n^2)$ observed samples to recover the underlying scores. Even for 10,000 players, (2) will require 100 million games to get a good estimate.\n\u2022 In terms of computing, (2) requires $O(n^2)$ memory to store the W matrix, which is typically dense unless further structural assumptions are made. 
Therefore, a standard solver will be hard to scale beyond 30,000 players.\n\n4.1 Factorization Model for Higher Order Interactions (Factorization HOI)\n\nTo overcome the large n problem, we propose the following Factorization HOI model, which assumes a team\u2019s score can be written as\n\n$$s_t^+ = \\sum_{j \\in I_t^+} w_j + \\sum_{j \\in I_t^+} \\sum_{q \\in I_t^+} v_j^T v_q.$$\n\nThe model parameters to be estimated are $w \\in \\mathbb{R}^n$ and $V \\in \\mathbb{R}^{k \\times n}$, where each $v_j$ is the $j$-th column of $V$. The hyper-parameter $k$ defines the dimensionality of the factorization.\nIn this model, we capture the individual strength by $w_j$, and each pairwise strength is modeled by $w_{jq} \\approx v_j^T v_q$. This assumption is the key point which allows high quality and efficient parameter estimation of higher order interactions. An intuitive explanation of this model is that each player is associated with k latent features, and the interaction between two players is modeled by the interaction of these latent features.\nTo estimate the parameters for Factorization HOI, we solve the following optimization problem:\n\n$$\\arg\\min_{w \\in \\mathbb{R}^n,\\, V \\in \\mathbb{R}^{k \\times n}} \\sum_{t=1}^T \\ell(s_t^+ - s_t^-, o_t) + \\frac{\\lambda_w}{2} \\|w\\|_2^2 + \\lambda_V \\|V\\|_F^2$$\n\n$$= \\arg\\min_{w \\in \\mathbb{R}^n,\\, V \\in \\mathbb{R}^{k \\times n}} \\sum_{t=1}^T \\ell\\Big(w^T (e_t^+ - e_t^-) + \\sum_{j,q \\in I_t^+} v_j^T v_q - \\sum_{j,q \\in I_t^-} v_j^T v_q,\\, o_t\\Big) + \\frac{\\lambda_w}{2} \\|w\\|_2^2 + \\lambda_V \\|V\\|_F^2, \\qquad (4)$$\n\nwhere $\\lambda_w$ and $\\lambda_V$ are the regularization parameters.\nEfficient Solver. To solve (4), we propose the following algorithm that alternately updates w and V. When V is fixed, the problem becomes a standard empirical risk minimization (similar to (1)), which can be solved by standard packages for linear SVM or logistic regression. 
When w is fixed, we use stochastic gradient descent (SGD) to solve the following subproblem with respect to V:\n\n$$\\arg\\min_{V \\in \\mathbb{R}^{k \\times n}} \\sum_{t=1}^T \\Big( \\ell\\Big(r_t + \\sum_{j,q \\in I_t^+} v_j^T v_q - \\sum_{j,q \\in I_t^-} v_j^T v_q,\\, o_t\\Big) + \\sum_{j \\in I_t^+ \\cup I_t^-} \\frac{\\lambda_V}{d_j} \\|v_j\\|_2^2 \\Big), \\qquad (5)$$\n\nwhere $d_j = |\\{t : j \\in I_t^+ \\cup I_t^-\\}|$ is the number of games involving player $j$ and $r_t = w^T (e_t^+ - e_t^-)$. The SGD update is then\n\n$$v_j \\leftarrow v_j - 2\\eta \\Big( \\ell'(s_t^+ - s_t^-) \\sum_{q \\in I_t^+} v_q + \\frac{\\lambda_V}{d_j} v_j \\Big) \\quad \\text{if } j \\in I_t^+,$$\n\n$$v_j \\leftarrow v_j - 2\\eta \\Big( -\\ell'(s_t^+ - s_t^-) \\sum_{q \\in I_t^-} v_q + \\frac{\\lambda_V}{d_j} v_j \\Big) \\quad \\text{if } j \\in I_t^-.$$\n\nEach SGD step only costs O(kL) time by pre-computing $\\sum_{q \\in I_t^+} v_q$ and $\\sum_{q \\in I_t^-} v_q$, so Factorization HOI can scale to very large datasets.\n\n4.2 Sample Complexity Analysis. How many games do we need?\n\nTo derive the theoretical guarantee, we first re-formulate (4). In this model, we can rewrite\n\n$$s_t^+ = w^T e_t^+ + (e_t^+)^T (V^T V) e_t^+.$$\n\nTherefore, by letting $M = V^T V$ and using the fact that $\\|M\\|_* = \\min_{V : M = V^T V} \\|V\\|_F^2$, the Factorization HOI (4) can be converted to the following nuclear norm regularization problem:\n\n$$\\min_{w, M} \\sum_{t=1}^T \\ell(f_{w,M}(e_t^+, e_t^-), o_t) + \\frac{\\lambda_w}{2} \\|w\\|_2^2 + \\lambda_V \\|M\\|_*,$$\n\nwhere $f_{w,M}(e^+, e^-) := w^T (e^+ - e^-) + (e^+)^T M e^+ - (e^-)^T M e^-$. We then derive the guarantee for the following equivalent hard-constraint form:\n\n$$\\min_{w, M} \\frac{1}{T} \\sum_{t \\in \\Omega} \\ell(f_{w,M}(e_t^+, e_t^-), o_t) \\quad \\text{s.t. } \\|w\\|_2 \\le \\mathcal{W},\\ \\|M\\|_* \\le \\mathcal{M}, \\qquad (6)$$\n\nwhere $\\Omega$ is the set of observed group comparisons (there can be repeated pairs in $\\Omega$). Assume $e_t^+, e_t^-$ are sampled from $E$, defined as the set of all n-dimensional 0/1 vectors with L ones, where L is the number of players on each team. Both of them are sampled from a fixed distribution, under the sampling with replacement model. 
Our goal is to bound the expected error de\ufb01ned by\n\nR(f ) := E\uf8ff1sgn(f (e+\n\nt , et )) 6= sgn(ot).\n\nMore speci\ufb01cally, we want to study the sample complexity of our model: how many samples do we\nneed for our model to achieve small prediction error? We will show that the number of samples is\nproportional to the nuclear norm (M) and the two norm (w) of the underlying solution, which can\nbe small in many realistic scenarios. The sample complexity analysis is based on problem (6), but\nsolving it is slow (due to the need of SVD). In practice, we solve problem (4) for large-scale problem.\nNote that this is a generalized low-rank model, so based on [9], solving (4) with gradient descent\ncould converge to global minimum under certain assumptions. All the detailed proofs are included in\nthe appendix.\nWe will need the notation of expected and empirical `-risk:\n\nExpected `-risk: R`(f ) = E[`(f (e+\n\nt , et ), ot)]\n\nEmpirical `-risk: \u02c6R`(f ) =\n\n1\nT\n\n`(f (e+\n\nt , et ), ot).\n\nTXt=1\n\nLet the set of feasible w, M de\ufb01ned as \u21e5= {(w, M )|kwk2 \uf8ff w and\nkMk\u21e4 \uf8ffM} and F\u21e5 = {fw,M | (w, M ) 2 \u21e5}. We then have the following lemma:\n\n5\n\n\fLemma 1. Let ` be a loss function with Lipschitz constant L` bounded by B with respect to its \ufb01rst\nargument, and be a constant where 0 << 1. Then with probability at least 1 , the expected\n`-risk is upper bounded by:\n\n4wr L\n\nT\n\n+ 8L`MLr log(2n)\n\nT\n\n,s 144c3L`BpL(w + pnLM)pN\n\nT\n\nR`(f )\uf8ff \u02c6R`(f )+min8<:\nBs log 1\n\n2T\n\n,\n\n\n\n+\n\n9=;\n\nfor all f 2F \u21e5, where T is number of games and c3 is a universal constant. 
For other constants\nplease see Section 2 for details.\nLemma 1 states that the expected loss will be close to empirical loss if w and M are small, and the\nbound is proportional to w,M and inverse proportional to pT .\nt , st are generated from some underlying\nNow we discuss the recovery guarantee if the score s+\nmodel following s+\nMjq with kwk \uf8ff w and kMk\u21e4 \uf8ffM , and\nassume the assumptions in Lemma 1 are also satis\ufb01ed. We then have the following two theorems:\nTheorem 1. (Guarantee for score difference case). Let 2 (0, 1) be a constant. Suppose the\nfollowing assumptions hold:\n\nwj +Pj2I +\n\nt = Pj2I +\n\nt Pq2I +\n\nt\n\nt\n\n\u2022 T clean comparisons1 ot = s+\n\u2022 The convex surrogate loss functions ` is bounded for each ot, with `(z, z) = 0.\n\nt st are observed.\n\nwith probability at least 1 , the optimal f\u21e4 from problem (6) satis\ufb01es:\n\nR(f\u21e4) \uf8ff min8<:\n\nO w\npT\n\n+ Mr log(2n)\n\nT ! , O0@s (w + pnLM)pN\n\nT\n\n+ O0@s log 1\nT 1A ,\n\n\n\n1A9=;\n\nt st )), we have the\nWhen we can only observe the winning/losing game results (ot = sgn(s+\nfollowing guarantee.\nTheorem 2. (Guarantee for binary result case). Let 2 (0, 1) be a constant. Suppose the following\nassumptions hold:\n\n\u2022 T clean comparisons ot = sgn(s+\n\u2022 The convex surrogate loss functions ` is bounded for each ot.\n\nt st ) are observed.\n\nWith probability at least 1 , the optimal f\u21e4 from problem (6) satis\ufb01es:\n\nR(f\u21e4)\uf8ffO\u21e3 \u02c6R`(f\u21e4)R\u21e4`\u2318+min(O w\nT !,\n+Mr log(2n)\n1A9=;\nO0@s (w+pnLM)pN\n+O0@s log 1\nT 1A\n\npT\n\nT\n\n\n\nwhere R\u21e4` = inf f R`(f ).\nIn Theorem 2, the term \u02c6R`(f\u21e4) R\u21e4` may not be zero but will be small, depending on how we\nde\ufb01ne loss. In summary, after observing T samples, the expected error will be O(min(w + M, (w +\nM)1/2N 1/4))/pT ) in Theorem 1. 
The second term has less dependency on w and M, but will be\nlarge for large L (number of players per team), since N = O(nL). However, we take the minimum\nfor these two terms, so in either cases the sample complexity will be small if the nuclear norm M\nand two norm of w are small. We have the same conclusion for binary (+1/1) observations when\n\u02c6R`(f\u21e4) R\u21e4` = O(\u270f).\n\n1\u201cclean comparison\u201d means that the observed outcomes are noiseless.\n\n6\n\n\fFigure 1: Projection of interaction features for each hero (vi) to 2-D space. Colors represents the of\ufb01cial\ncategorization for these heroes. This low-dimensional representation reveals some interesting patterns for\npairwise interactions between heroes in Heroes of the Storm.\n\nAll the previous discussion is based on the assumption that we can observe clean comparisons.\nHowever, in practice, we usually observe noisy comparisons. We use a standard \"\ufb02ip sign model\"[19],\nwhere each comparison result is independently \ufb02ipped (\u02dcot = sgn(ot)) with probability \u21e2f 2 [0, 0.5),\nwhere \u02dcot is the observed \ufb02ipped result. The following theorem shows that with noisy comparisons,\nwe just need slightly more samples, depending on the noise level.\nTheorem 3. (Guarantee for noisy comparisons). Let each ot is now observed under the \"\ufb02ip sign\nmodel\" with \u21e2f 2 [0, 0.5). Then by solving Factorization HOI with squared loss,\n\nmin8<:\n\nO 1\n\n1 2\u21e2f\n\n(\n\nw\npT\n\n+ Mr log(2n)\n\nT\n\n)! 
, O0@s (w + pnLM)pN\n\n(1 \u21e2f )T\n\n+ O0@s log( 1\nT 1A\n\n )\n\n1A9=;\n\ncomparisons suf\ufb01ce to guarantee an \u270f-accurate result.\nTheorem 3 demonstrates that in noisy comparison case, Factorization HOI can achieve \u270faccurate\nresult with the same order of sample complexity as in clean comparison cases, but with a extra price,\nwhich is a\n\nfactor.\n\nor\n\n1\n\n1\u21e2f\n\n1p1\u21e2f\n\n5 Experimental Results\n\nWe include the following algorithms in our experiments:\n\nmates the pairwise interaction by a factor form.\n\n\u2022 Basic HOI: the proposed basic model using pairwise information with squared hinge loss\n(see eq (2)).\n\u2022 Factorization HOI: the proposed model in eq (4) with squared hinge loss, which approxi-\n\u2022 Trueskill [12]: the state-of-the-art algorithm used in all the online game matching engines.\nSince it is an online algorithm, we test the performance after running 1 epoch and 10 epochs.\nWe do not observe any accuracy gain after 10 epochs.\n\u2022 Bradley-Terry model [13]: the generalized Bradley-Terry model for group comparison data.\n\u2022 Logistic Regression (LR): another baseline for individual score model (1) using logistic loss.\nWe have not seen this algorithm in the literature, but we found this simple approach works\nquite well so we include it into comparison.\n\nDatasets and parameter settings. We consider datasets from two online games: Heroes of the\nStorm (HotS) and Dota 2. Both of them are Multiplayer Online Battle Arena (MOBA) games. In\neach game, two \ufb01ve-player teams \ufb01ght with each other on a map. Each player can choose one of the\n\n7\n\n\fheroes (characters), and each hero has different abilities. There are totally around 60 heroes in HotS\nand 100 heroes in Dota 2. For each dataset we consider two tasks: (1) we consider each hero as an\nindividual so that each game we get the group comparisons between 5 heroes versus another 5 heroes.\nAnd the goal is to predict the outcome of the games. 
Since there are only around 100 heroes, the\nparameter space will not be too large even for learning n2 pairwise interactions. (2) We also consider\neach player as an individual, so that each group comparison is between 5 players versus another 5\nplayers. In this case, there can be tens of thousands of players, so the parameter space is huge.\nWe collect the following three sets of data. For HotS tournament matches, we download all matching\nrecords provided by Hotslog2 for the years of 2015 and 2016. For HotS public game data, we crawl the\nmatching history of Master players in Hotslog. There are three game modes for public games\u2014quick\nmatch, team league, and hero league. Here we only consider the hero league data since it is closer to\nthe of\ufb01cial tournament games. For Dota 2, we download the recent data from OpenDota 3. We focus\non a set of \u201cnotable players\u201d (de\ufb01ned by the website), and get all their matching data in public games.\nFor each dataset, we have two different views, taking heroes as individuals (n) or taking players as\nindividuals (n). So we have 6 datasets in total, as listed in Table 1.\nFor each dataset, we randomly divided the games into 80% for training and 20% for testing. For all\nthe methods, we cross validate on the training set to choose the best parameter, and then use the best\nparameter to train a \ufb01nal model, which is then evaluated on the testing set. For our model, determining\nthe values of k is a trade-off between the model ef\ufb01ciency and accuracy. In our experiments, we\nchoose k by cross validation. Accuracy is evaluated by number of correct predicted games divided by\nthe total number of testing games. The results are presented in Table 2. Note that Basic HOI will\ngenerate n2 parameters, so it runs out of memory for some datasets. 
We have the following findings:\n\u2022 Our proposed algorithms, Basic HOI and Factorization HOI, are always better than individual models, which indicates that higher order information is useful for modeling group comparisons. Moreover, we observe that higher order information is particularly useful for tournament data (HotS tournament), which makes sense because tournament players are more advanced and have better teamwork. The outcome of a professional game is often determined by good use of a \u201ccombo\u201d.4\n\u2022 For hero data, since the number of individuals is small, Basic HOI is able to learn a good model for all the individual scores and thus slightly outperforms Factorization HOI. However, when the number of individuals grows to thousands (e.g., the two HotS player datasets), Basic HOI has too many parameters to learn and suffers from over-fitting, so its accuracy is lower than that of Factorization HOI. Furthermore, Factorization HOI is able to scale to a large number of individuals (e.g., 30,000 players in Dota 2), while Basic HOI will run out of memory since it requires O(n^2) memory.\n\nFinally, in addition to better prediction accuracy, our model reveals interesting patterns that cannot be discovered by individual scores. First, we extract the top-5 and bottom-5 hero pairs for HotS Tournament data (see Table 3). Among them, one of the top-5 pairs, (Rehgar, Illidan), is a famous strong combination recognized by professional players, while most of the bottom-5 pairs are clearly not good since they are heroes with repeated functions. Our results can thus guide players and professional coaches in selecting heroes. For example, Illidan works well with Rehgar, but is very bad with Thrall. We also extract the top-5 and bottom-5 pairs based on Bradley-Terry and Trueskill (see Table 4). 
It is clear that the top-5/bottom-5 pairs based on Bradley-Terry and Trueskill are totally different from the pairs obtained by our method, which shows that our method can capture interaction effects that are not explored well by the previous methods. In addition, we project the learned latent factors v_i in the Factorization HOI model to a 2D space by PCA in Figure 1. These vectors characterize the pairwise interaction between heroes. We observe many interesting patterns. For example, most of the siege heroes are in the bottom-right area of the space; Murky and Leoric are in the top-left corner, where they have similar behavior (this is actually an important combination that helped the C9 team win the 2015 Heroes of the Storm championship). Illidan is in the far bottom-left corner, which means it is very good with other heroes in the third quadrant, but very bad with the heroes in the first quadrant.\n\n2 https://www.hotslogs.com/Default\n3 https://www.opendota.com\n4 In games, a \u201ccombo\u201d indicates a set of actions performed in sequence that yield a significant benefit or advantage. A \u201ccombo\u201d usually requires very precise timing, so it is more commonly used by advanced players.\n\nTable 1: Dataset Statistics\n\nDataset | Number of Games (T) | Number of Individuals (n)\nHotS Tournament (Hero) | 9,610 | 54\nHotS Tournament (Player) | 9,610 | 3,470\nHotS Public (Hero) | 139,462 | 62\nHotS Public (Player) | 139,462 | 7,251\nDota 2 (Hero) | 46,459 | 113\nDota 2 (Player) | 46,459 | 30,452\n\nTable 2: Performance of the proposed algorithm and other algorithms. The numbers are prediction accuracy (%), and \u201coom\u201d indicates out of memory here.5 (H) denotes the hero view and (P) the player view of each dataset.\n\nDataset | LR | Trueskill (1) | Trueskill (10) | Bradley-Terry | Basic HOI | Factorization HOI\nHotS Tournament (H) | 59.73 | 62.90 | 58.48 | 59.52 | 80.59 | 77.84\nHotS Tournament (P) | 83.45 | 80.02 | 84.50 | 84.18 | 83.89 | 85.17\nHotS Public (H) | 54.34 | 53.36 | 53.06 | 53.50 | 54.45 | 54.75\nHotS Public (P) | 54.01 | 53.64 | 53.87 | 53.92 | 53.39 | 55.76\nDota 2 (H) | 61.64 | 52.50 | 52.61 | 61.37 | 65.34 | 63.72\nDota 2 (P) | 65.98 | 62.16 | 64.26 | 62.72 | oom | 68.25\n\nTable 3: Top-5 and bottom-5 hero pairs learned by our model on Heroes of the Storm tournament data.\n\nTop 5 pairs: (Lunara, Leoric); (Kerrigan, Sylvanas); (Rehgar, Illidan); (Chen, Jaina); (Thrall, Valla)\nBottom 5 pairs: (Raynor, Zeratul); (Illidan, Thrall); (Sonya, Zeratul); (Muradin, Lt. Morales); (Anub\u2019arak, Illidan)\n\nTable 4: Top-5 and bottom-5 pairs for Trueskill and the Bradley-Terry method (BTL)\n\nTop 5 pairs (Trueskill): (Auriel, Kerrigan); (Auriel, Tracer); (Auriel, Rexxar); (Auriel, The Lost Vikings); (Auriel, Kerrigan)\nBottom 5 pairs (Trueskill): (Chromie, Sgt. Hammer); (Chromie, The Butcher); (Chromie, Valla); (Chromie, Gazlowe); (Chromie, Tychus)\nTop 5 pairs (BTL): (Auriel, Medivh); (Auriel, The Lost Vikings); (Auriel, Rehgar); (Auriel, Kerrigan); (Auriel, Brightwing)\nBottom 5 pairs (BTL): (Chromie, Sgt. Hammer); (Chromie, Gazlowe); (Chromie, The Butcher); (Chromie, Tychus); (Chromie, Artanis)\n\n6 Conclusions\n\nPrevious models for group comparisons are all based on individual score models. In this paper, we develop novel algorithms to utilize higher order interactions between players. 
The proposed\nalgorithm achieves much higher accuracy than existing methods, indicating that modeling higher\norder interaction is crucial for mining group comparison data.\n\n7 Acknowledgement\n\nThe paper is partially supported by the support of NSF via IIS-1719097, Intel Faculty Award, Google\nCloud and Nvidia.\n\n5In online games, \u201coom\u201d often stands for out-of-mana.\n\n9\n\n\fReferences\n[1] P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classi\ufb01cation, and risk bounds. Journal of the\n\nAmerican Statistical Association, 101(473):138\u2013156, 2006.\n\n[2] P. L. Bartlett and S. Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results.\n\nJournal of Machine Learning Research, 3(Nov):463\u2013482, 2002.\n\n[3] S. Chen and T. Joachims. Modeling intransitivity in matchup and comparison data. In Proceedings of the\n\nNinth ACM International Conference on Web Search and Data Mining, pages 227\u2013236. ACM, 2016.\n\n[4] S. Chen and T. Joachims. Predicting matchups and preferences in context. 2016.\n\n[5] K.-Y. Chiang, C.-J. Hsieh, and I. Dhillon. Rank aggregation and prediction with item features. In Arti\ufb01cial\n\nIntelligence and Statistics, pages 748\u2013756, 2017.\n\n[6] K. Fujii, F. Hsieh, B. Beisner, and B. McCowan. Computing power structures in directed biosocial networks\n\ni: \ufb02ow percolation and imputed conductance. Technical report, 2017.\n\n[7] H. Fushing and M. P. McAssey. Time, temperature, and data cloud geometry. Physical Review E,\n\n82(6):061110, 2010.\n\n[8] H. Fushing, M. P. McAssey, and B. McCowan. Computing a ranking network with con\ufb01dence bounds from\na graph-based beta random \ufb01eld. In Proc. R. Soc. A, volume 467, pages 3590\u20133612. The Royal Society,\n2011.\n\n[9] R. Ge, J. D. Lee, and T. Ma. Matrix completion has no spurious local minimum. In Advances in Neural\n\nInformation Processing Systems, pages 2973\u20132981, 2016.\n\n[10] D. F. Gleich and L.-h. Lim. 
Rank aggregation via nuclear norm minimization. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 60\u201368. ACM, 2011.\n\n[11] M. E. Glickman. A comprehensive guide to chess ratings. American Chess Journal, 1995.\n\n[12] R. Herbrich, T. Minka, and T. Graepel. Trueskill: A Bayesian skill rating system. In NIPS, 2006.\n\n[13] T.-K. Huang, C.-J. Lin, and R. C. Weng. Ranking individuals by group comparisons. JMLR, 2008.\n\n[14] S. M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Advances in neural information processing systems, pages 793\u2013800, 2009.\n\n[15] N. Natarajan, I. S. Dhillon, P. K. Ravikumar, and A. Tewari. Learning with noisy labels. In Advances in neural information processing systems, pages 1196\u20131204, 2013.\n\n[16] S. Purushotham, M. R. Min, C.-C. J. Kuo, and R. Ostroff. Factorized sparse learning models with interpretable high order feature interactions. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 552\u2013561. ACM, 2014.\n\n[17] S. Rendle. Factorization machines. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages 995\u20131000. IEEE, 2010.\n\n[18] O. Shamir and S. Shalev-Shwartz. Matrix completion with the trace norm: learning, bounding, and transducing. Journal of Machine Learning Research, 15(1):3401\u20133423, 2014.\n\n[19] F. Wauthier, M. Jordan, and N. Jojic. Efficient ranking from pairwise comparisons. In International Conference on Machine Learning, pages 109\u2013117, 2013.\n\n[20] C. Xiao and M. M\u00fcller. Factorization ranking model for move prediction in the game of Go. In AAAI, 2016.\n\n[21] L. Zhang, J. Wu, Z.-C. Wang, and C.-J. Wang. A factor-based model for context-sensitive skill rating systems. 
In Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on, volume 2, pages 249\u2013255. IEEE, 2010.", "award": [], "sourceid": 2404, "authors": [{"given_name": "Yao", "family_name": "Li", "institution": "University of California, Davis"}, {"given_name": "Minhao", "family_name": "Cheng", "institution": "University of California, Davis"}, {"given_name": "Kevin", "family_name": "Fujii", "institution": "UC Davis Department of Statistics"}, {"given_name": "Fushing", "family_name": "Hsieh", "institution": "UC Davis Department of Statistics"}, {"given_name": "Cho-Jui", "family_name": "Hsieh", "institution": "UCLA, Google Research"}]}