{"title": "Unorganized Malicious Attacks Detection", "book": "Advances in Neural Information Processing Systems", "page_first": 6976, "page_last": 6985, "abstract": "Recommender systems have attracted much attention during the past decade. Many attack detection algorithms have been developed for better recommendations, mostly focusing on shilling attacks, where an attack organizer produces a large number of user profiles by the same strategy to promote or demote an item. This work considers another different attack style: unorganized malicious attacks, where attackers individually utilize a small number of user profiles to attack different items without organizer. This attack style occurs in many real applications, yet relevant study remains open. We formulate the unorganized malicious attacks detection as a matrix completion problem, and propose the Unorganized Malicious Attacks detection (UMA) algorithm, based on the alternating splitting augmented Lagrangian method. We verify, both theoretically and empirically, the effectiveness of the proposed approach.", "full_text": "Unorganized Malicious Attacks Detection\n\nMing Pang Wei Gao Min Tao\nZhi-Hua Zhou\nNational Key Laboratory for Novel Software Technology,\n\nNanjing University, Nanjing, 210023, China\n\n{pangm, gaow, zhouzh}@lamda.nju.edu.cn\n\ntaom@nju.edu.cn\n\nAbstract\n\nRecommender systems have attracted much attention during the past decade. Many\nattack detection algorithms have been developed for better recommendations,\nmostly focusing on shilling attacks, where an attack organizer produces a large\nnumber of user pro\ufb01les by the same strategy to promote or demote an item. This\nwork considers another different attack style: unorganized malicious attacks, where\nattackers individually utilize a small number of user pro\ufb01les to attack different\nitems without organizer. This attack style occurs in many real applications, yet\nrelevant study remains open. 
We formulate the unorganized malicious attacks\ndetection as a matrix completion problem, and propose the Unorganized Malicious\nAttacks detection (UMA) algorithm, based on the alternating splitting augmented\nLagrangian method. We verify, both theoretically and empirically, the effectiveness\nof the proposed approach.\n\n1\n\nIntroduction\n\nOnline activities have been an essential part in our daily life as the \ufb02ourish of Internet, and it is\nimportant to recommend suitable products effectively as the number of users and items increases\ndrastically. Various collaborative \ufb01ltering techniques have been developed in diverse systems to help\ncustomers choose their favorite products in a set of items [5, 18, 28]. However, most collaborative\n\ufb01ltering approaches are vulnerable to spammers and manipulations of ratings [13, 19], and attackers\ncould bias systems by inserting fake rating scores into the user-item rating matrix. Some attackers try\nto increase the popularity of their own items (push attack) while the others intend to decrease the\npopularity of their competitors\u2019 items (nuke attack).\nDetecting attacks from online rating systems is crucial to recommendations. Most attack detection\nstudies focus on shilling attacks [13], where all the attack pro\ufb01les are produced by the same strategy\nto promote or demote a particular item. For example, an attack organizer may produce hundreds of\nfake user pro\ufb01les with one strategy where each fake user pro\ufb01le gives high scores to the most popular\nmovies and low scores to the target movie. Relevant studies have shown good detection performance\non diverse shilling attack strategies [16, 19, 23].\nPractical mechanisms have been developed to prevent shilling attacks. 
For example, lots of online\nsites require real names and phone numbers for user registration; CAPTCHA is used to determine\nwhether the response is generated by a robot; customers are allowed to rate a product after purchasing\nthis product on the shopping website. These mechanisms produce high cost for conducting traditional\nshilling attacks; for example, small online sellers in e-commerce like Amazon have insuf\ufb01cient\ncapacity to produce hundreds of fake rating pro\ufb01les to conduct a shilling attack.\nIn this paper, we introduce another different attack model named unorganized malicious attacks,\nwhere attackers individually use a small number of user pro\ufb01les to attack their own targets without\norganizer. This attack happens in many real applications: online sellers on Amazon may produce\na few fake customer pro\ufb01les to demote their competitors\u2019 high-quality products; writers may hire\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fseveral users to give high scores to promote their own books. Actually, recommender systems may\nbe seriously in\ufb02uenced by small amounts of unorganized malicious attacks, e.g., the \ufb01rst maliciously\nbad rating can decrease the sales of one seller by 13% [20]. So far as we know, the detection of\nunorganized malicious attacks has rarely been studied, and existing attack detection approaches do\nnot work well on this kind of attack [26].\nWe formulate the unorganized malicious attacks detection as a variant of matrix completion problem.\nLet X denote the ground-truth rating matrix without attacks and noises, and assume that the matrix\nis low-rank since the users\u2019 preferences are affected by several factors [31]. Let Y be the sparse\nmalicious-attack matrix, and Z denotes a small perturbation noise matrix. 
What we can observe is a\nmatrix M such that M = X + Y + Z.\nWe propose the Unorganized Malicious Attacks detection (UMA) algorithm, which can be viewed\nas an extension of alternating splitting augmented Lagrangian method. Theoretically, we show\nthat the low-rank rating matrix X and the sparse matrix Y can be recovered under some classical\nmatrix-completion assumptions, and we present the global convergence of UMA with a worst-case\nO(1/t) convergence rate. Finally, empirical studies are provided to verify the effectiveness of our\nproposed algorithm in comparison with the state-of-the-art methods for attack detection.\nThe rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces\nthe framework of unorganized malicious attacks detection. Section 4 proposes the UMA algorithm.\nSection 5 shows the theoretical justi\ufb01cation. Section 6 reports the experimental results. Section 7\nconcludes this work.\n\n2 Related Work\n\nCollaborative \ufb01ltering has been one of the most successful techniques to build recommender systems.\nThe core assumption of collaborative \ufb01ltering is that if users have expressed similar interests in the\npast, they will share common interest in the future [12]. Signi\ufb01cant progress about collaborative\n\ufb01ltering has been made [5, 18, 28, 31]. There are two main categories of conventional collaborative\n\ufb01ltering (based on the user-item rating matrix) which are memory-based and model-based algorithms.\nCollaborative \ufb01ltering schemes are vulnerable to attacks [1, 13], and increasing attention has been\npaid to attack detection. Researchers have proposed several methods which mainly focus on shilling\nattacks where the attack organizer produces a large number of user pro\ufb01les by the same strategy\nto promote or demote a particular item. 
These methods mainly contain statistical, classi\ufb01cation,\nclustering and data reduction-based methods [13].\nStatistical methods are used to detect anomalies with suspicious ratings. Hurley et al. [16] proposed\nthe Neyman-Pearson statistical attack detection method to distinguish malicious users from normal\nusers, and Li and Luo [17] introduced the probabilistic Bayesian network models. Based on attributes\nderived from user pro\ufb01les, classi\ufb01cation methods detect attacks by kNN, SVM, etc. [14, 24]. Bhaumik\net al. [3] presented the unsupervised clustering algorithm based on several classi\ufb01cation attributes\n[7], and they apply k-means clustering based on these attributes and classify users in the smallest\ncluster as malicious users. Variable selection method treats users as variables and calculates their\ncovariance matrix [22]. Users with the smallest coef\ufb01cient in the \ufb01rst l principal components of\nthe covariance matrix are classi\ufb01ed as malicious users. Ling et al. [19] utilized a low-rank matrix\nfactorization method to predict the users\u2019 ratings. Users\u2019 reputation is computed according to the\npredicted ratings and low-reputed users are classi\ufb01ed as malicious users.\nThese methods make detection by \ufb01nding the common characteristics of the attack pro\ufb01les that\ndiffer from the normal pro\ufb01les. Therefore, they have a common assumption that the attack pro\ufb01les\nare produced by the same attack strategy. However, this assumption does not hold for unorganized\nmalicious attacks, where different attackers use different strategies to attack their own targets.\nRecovering low-dimensional structure from a corrupted matrix is related to robust PCA [4, 9, 33].\nHowever, robust PCA focuses on recovering low-rank part X from complete or incomplete matrix,\nand the target is different from attacks detection (which is our task). 
Our work considers the specific properties of malicious attacks to distinguish the attack matrix Y from the small perturbation noise term Z. In this way, our method can not only recover the low-rank part X, but also distinguish Y from the noise term Z, which leads to better performance.\n\n3 The Formulation\n\nThis section introduces notations and the problem formulation. We introduce the general form of an attack profile, give a detailed comparison between unorganized malicious attacks and shilling attacks, and then formulate the corresponding detection problem.\n\n3.1 Notations\n\nWe begin with some notations used in this paper. Let ‖X‖, ‖X‖_F and ‖X‖_* denote the operator norm, Frobenius norm and nuclear norm of matrix X, respectively. Let ‖X‖_1 and ‖X‖_∞ be the ℓ_1 and ℓ_∞ norms of matrix X, respectively. Further, we define the Euclidean inner product between two matrices as ⟨X, Y⟩ := trace(XY⊤), where Y⊤ denotes the transpose of Y. We have ‖X‖_F^2 = ⟨X, X⟩.\nLet P_Ω denote a linear transformation operator over the matrix space, and we also denote by P_Ω the linear space of matrices supported on Ω when it is clear from the context. Then, P_Ω⊤ represents the space of matrices supported on Ω^c. For an integer m, let [m] := {1, 2, . . . , m}.\n\nFigure 1: General form of an attack profile.\n\n3.2 Problem Formulation\n\nBhaumik et al. [2] introduced the general form of an attack profile, as shown in Figure 1. The attack profile contains four parts. The single target item i_t is given a malicious rating, i.e., a high rating in a push attack or a low rating in a nuke attack. The selected items I_S are a group of items selected for special treatment during the attack. 
The filler items I_F are selected randomly to complete the attack profile. The null part I_∅ contains the rest of the items, which receive no ratings. Functions θ, ζ and Υ determine how to assign ratings to the items in I_S, the items in I_F and the target item i_t, respectively. Three basic attack strategies are listed as follows.\n\n• Random attack: I_S is empty; I_F is selected randomly, and function ζ assigns ratings to I_F by generating random ratings centered around the overall average rating in the database.\n• Average attack: I_S is empty; I_F is selected randomly, and function ζ assigns ratings to I_F by generating random ratings centered around the average rating of each item.\n• Bandwagon attack: I_S is selected from the popular items and function θ assigns high ratings to I_S. The filler items I_F are handled similarly to the random attack.\n\nA shilling attack chooses one attack strategy (e.g., the average attack strategy), and fixes the target item i_t, the numbers of rated items k and l, and the rating functions. This makes the generated attack profiles share some common characteristics within one shilling attack. Besides, a large number of attack profiles are required in the basic setting of shilling attacks.\nHowever, unorganized malicious attacks allow the concurrence of various attack strategies, and the number of rated items, the target item and the rating functions can all differ. Each attacker produces a small number of attack profiles with their own strategies and preferences [26].\nLet U_[m] = {U_1, U_2, . . . , U_m} and I_[n] = {I_1, I_2, . . . , I_n} denote m users and n items, respectively. Let X ∈ ℝ^{m×n} be the ground-truth rating matrix. X_ij denotes the score that user U_i gives to item I_j without any attack or noise, i.e., X_ij reflects the ground-truth feeling of user U_i about item I_j. Suppose that the score range is [−R, R], so that −R ≤ X_ij ≤ R. 
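To make the three strategies concrete, the sketch below generates a single fake user profile. It is an illustration under simplified assumptions (one filler budget for both I_S and I_F, Gaussian jitter around the rating centers, scores already shifted to [−2, 2]), not the paper's exact protocol; the function and parameter names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_attack_profile(ratings, target, strategy="random",
                        filler_ratio=0.01, push=True, r_max=2.0):
    """Generate one attack profile (a fake user's rating row).

    `ratings` is an (m, n) matrix with np.nan marking unrated entries,
    and `target` is the index of the attacked item.
    """
    m, n = ratings.shape
    profile = np.full(n, np.nan)
    n_filler = max(1, int(filler_ratio * n))
    if strategy == "bandwagon":
        # selected items I_S: the most-rated (popular) items get high scores
        popularity = np.sum(~np.isnan(ratings), axis=0)
        selected = np.argsort(-popularity)[:n_filler]
        profile[selected] = r_max
        filler_pool = np.setdiff1d(np.arange(n), np.append(selected, target))
    else:
        filler_pool = np.setdiff1d(np.arange(n), [target])
    filler = rng.choice(filler_pool, size=n_filler, replace=False)
    if strategy == "average":
        # zeta: centered around each filler item's own average rating
        centers = np.nanmean(ratings, axis=0)[filler]
    else:
        # random/bandwagon: centered around the overall average rating
        centers = np.full(n_filler, np.nanmean(ratings))
    profile[filler] = np.clip(centers + rng.normal(0, 0.5, n_filler),
                              -r_max, r_max)
    # Upsilon: the target item gets the extreme score (push or nuke)
    profile[target] = r_max if push else -r_max
    return profile
```

A shilling attack would call this repeatedly with one fixed strategy and target; unorganized malicious attackers would each call it a few times with their own strategy and target.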
In this work, we assume that X is a low-rank matrix, as in classical matrix completion [30] and collaborative filtering [31]. The intuition is that the users' preferences may be influenced by a few factors.\nThe ground-truth matrix X may be corrupted by a system noise matrix Z. For example, if X_ij = 4.8 for some i ∈ [m], then it is acceptable that user U_i gives item I_j score 5 or 4.6. In this paper, we consider independent Gaussian noise, i.e., Z = (Z_ij)_{m×n} where each element Z_ij is drawn i.i.d. from the Gaussian distribution N(0, σ) with parameter σ.\nLet M be the observed rating matrix. We define unorganized malicious attacks formally as follows: for every j ∈ [n], we have |U^j| < γ with U^j = {U_i | i ∈ [m] and |M_ij − X_ij| ≥ ε}. The parameter ε distinguishes malicious users from normal ones, and the parameter γ limits the number of user profiles attacking any one item. Intuitively, unorganized malicious attacks capture attackers who individually use a small number of user profiles to attack their own targets; multiple independent shilling attacks can be regarded as an example of unorganized malicious attacks if each shilling attack contains a small number of attack profiles.\nIt is necessary to distinguish unorganized malicious attacks from noise. 
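Stated as code, this definition can be checked directly when the ground truth X is known; a minimal sketch (the helper names `attack_mask` and `is_unorganized` are ours, and eps, gamma play the roles of ε and γ above):

```python
import numpy as np

def attack_mask(M, X, eps):
    """Entries where the observed rating deviates from the ground truth
    by at least eps are treated as malicious: |M_ij - X_ij| >= eps."""
    return np.abs(M - X) >= eps

def is_unorganized(M, X, eps, gamma):
    """Check |U^j| < gamma for every item j, i.e., no item is attacked
    by gamma or more user profiles."""
    per_item_attackers = attack_mask(M, X, eps).sum(axis=0)
    return bool(np.all(per_item_attackers < gamma))
```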
Generally speaking, user U_i gives item I_j a normal score if |M_ij − X_ij| is very small, while user U_i attacks item I_j if |M_ij − X_ij| ≥ ε. For example, if the ground-truth score of item I_j is 4.8 for user U_i, then user U_i makes a noisy rating by giving I_j score 5, yet makes an attack by giving I_j score −3. Therefore, we assume that ‖Z‖_F ≤ δ, where δ is a small parameter.\nLet Y = M − X − Z = (Y_ij)_{m×n} be the malicious-attack matrix. Then, Y_ij = 0 if user U_i does not attack item I_j; otherwise |Y_ij| ≥ ε. We assume that Y is a sparse matrix, the intuition being that malicious ratings form a small fraction of all ratings. Notice that we cannot directly recover X and Y from M because such recovery is an NP-hard problem [9]. We consider the optimization problem as follows:\n\nmin_{X,Y,Z} ‖X‖_* + τ‖Y‖_1 − α⟨M, Y⟩ + (κ/2)‖Y‖_F^2\ns.t. X + Y + Z = M, ‖Z‖_F ≤ δ.   (1)\n\nHere ‖X‖_* acts as a convex surrogate of the rank function to pursue the low-rank part, and ‖Y‖_1 is used to induce the sparse attack part. The term ⟨M, Y⟩ is introduced to better distinguish Y and Z: the malicious rating bias Y_ij and the observed rating M_ij have the same sign, i.e., M_ij Y_ij > 0, while each entry in Z is small and Z_ij M_ij can be either positive or negative. We have Y_ij < 0 and M_ij < 0 if it is a nuke attack, and Y_ij > 0 and M_ij > 0 if it is a push attack, so the term ⟨M, Y⟩ distinguishes Y from Z. ‖Y‖_F^2 is another, strongly convex regularizer for Y; this term also guarantees the optimal solution. τ, α and κ are tradeoff parameters.\nIn many real applications, we cannot observe the full matrix M; only partial entries can be observed. 
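For reference, the objective of Eq. (1) can be evaluated with standard linear-algebra routines; a small sketch (parameter values in the usage are arbitrary placeholders, not the paper's settings):

```python
import numpy as np

def objective(X, Y, M, tau, alpha, kappa):
    """Objective of Eq. (1):
    ||X||_* + tau*||Y||_1 - alpha*<M, Y> + (kappa/2)*||Y||_F^2."""
    nuclear = np.linalg.norm(X, ord="nuc")  # sum of singular values
    l1 = np.abs(Y).sum()
    inner = np.sum(M * Y)                   # <M, Y> = trace(M Y^T)
    fro2 = np.sum(Y * Y)
    return nuclear + tau * l1 - alpha * inner + 0.5 * kappa * fro2
```

The constraint X + Y + Z = M (with ‖Z‖_F ≤ δ) is handled separately by the algorithm in Section 4 rather than inside this objective.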
Let Ω ⊂ [m] × [n] be the set of observed entries. We define an orthogonal projection P_Ω onto the linear space of matrices supported on Ω ⊂ [m] × [n], i.e.,\n\n(P_Ω M)_ij = M_ij for (i, j) ∈ Ω, and (P_Ω M)_ij = 0 otherwise.\n\nThe optimization framework for unorganized malicious attack detection can be formulated as follows:\n\nmin_{X,Y,Z} ‖X‖_* + τ‖Y‖_1 − α⟨M̄, Y⟩ + (κ/2)‖Y‖_F^2\ns.t. X + Y + Z = M̄, Z ∈ B, B := {Z | ‖P_Ω(Z)‖_F ≤ δ},   (2)\n\nwhere κ > 0 and M̄ := P_Ω(M). This formulation degenerates into robust PCA as κ → 0 and α → 0. There have been many studies focusing on recovering the low-rank part X from a complete or incomplete matrix [9, 11, 21, 27], while we distinguish the sparse attack term Y from the small perturbation term Z. ⟨M̄, Y⟩ is added to find the nonzero entries of Y, and this yields better detection performance.\n\n4 The Proposed Approach\n\nIn this section, we propose an alternating splitting augmented Lagrangian method to solve the optimization problem (2), with a global convergence guarantee.\n\nAlgorithm 1 The UMA Algorithm\nInput: matrix M and parameters τ, α, β, δ and κ.\nOutput: Label vector [y_1, . . . , y_m] where y_i = 1 if user U_i is a malicious user; otherwise y_i = 0.\nInitialize: Y^0 = X^0 = Λ^0 = 0, y_i = 0 (i = 1, . . . , m), k = 0\nProcess:\n1: while not converged do\n2:   Compute Z^{k+1}, X^{k+1} and Y^{k+1} by Eq. (4), (5) and (6), respectively.\n3:   Update the Lagrange multiplier: Λ^{k+1} = Λ^k − β(X^{k+1} + Y^{k+1} + Z^{k+1} − M̄).\n4:   k = k + 1.\n5: end while\n6: if max(|Y_{i,:}|) > 0, then y_i = 1 (i = 1, . . . , m).\n\nThe separable structure emerging in the objective function and constraints in Eq. 
(2) motivates us to derive an efficient algorithm by splitting the optimization problem. However, it is rather difficult to optimize this problem with theoretical guarantees, because the optimization involves three blocks of variables. It is well known that the direct extension of the alternating direction method of multipliers may not be convergent for solving Eq. (2), a three-block convex minimization problem [10, 15, 32].\nWe propose an alternating splitting augmented Lagrangian method that decomposes the optimization of Eq. (2) into three sub-optimizations, solving for Z^{k+1}, X^{k+1} and Y^{k+1} separately. We will provide a global convergence guarantee with a worst-case O(1/t) convergence rate in Section 5.\nWe first form the augmented Lagrangian function of Eq. (2) as\n\nL_A(X, Y, Z, Λ, β) := ‖X‖_* + τ‖Y‖_1 − α⟨M̄, Y⟩ + (κ/2)‖Y‖_F^2 − ⟨Λ, L⟩ + (β/2)‖L‖_F^2,   (3)\n\nwhere L = X + Y + Z − M̄ and β is a positive constant.\nGiven (X^k, Y^k, Λ^k), we update Z^{k+1} with the closed-form solution\n\nZ^{k+1}_ij = min{1, δ/‖P_Ω N‖_F} N_ij if (i, j) ∈ Ω, and Z^{k+1}_ij = N_ij otherwise,   (4)\n\nwhere N = (1/β)Λ^k + M̄ − X^k − Y^k. Lemma 2 gives the closed-form solution of X^{k+1} as\n\nX^{k+1} = D_{1/β}(M̄ + (1/β)Λ^k − Y^k − Z^{k+1}),   (5)\n\nwhere the nuclear-norm-involved shrinkage operator D_{1/β} is defined in Lemma 2. Further, we update Y^{k+1}, and Lemma 1 gives the closed-form solution\n\nY^{k+1} = S_{τυ}(υβ(((α + β)/β)M̄ + (1/β)Λ^k − Z^{k+1} − X^{k+1})),   (6)\n\nwhere υ = 1/(β + κ) and the shrinkage operator S_{τυ} is defined in Lemma 1. 
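The per-iteration updates (4)-(6) translate directly into code. Below is a minimal numpy sketch of one UMA iteration, our own illustration rather than the authors' implementation, with `soft_threshold` and `svt` playing the roles of the operators S and D from Lemmas 1 and 2:

```python
import numpy as np

def soft_threshold(T, tau):
    # S_tau(T): elementwise shrinkage (Lemma 1)
    return np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)

def svt(Y, mu):
    # D_mu(Y): singular value thresholding (Lemma 2)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(soft_threshold(s, mu)) @ Vt

def uma_step(X, Y, Lam, M_bar, omega, tau, alpha, beta, delta, kappa):
    """One UMA iteration, Eqs. (4)-(6) plus the multiplier update;
    omega is a boolean mask of observed entries, M_bar = P_Omega(M)."""
    # Z-update, Eq. (4): scale the observed part of N into the ball
    # ||P_Omega(Z)||_F <= delta, leave the unobserved part as-is
    N = Lam / beta + M_bar - X - Y
    scale = min(1.0, delta / max(np.linalg.norm(N[omega]), 1e-12))
    Z = np.where(omega, scale * N, N)
    # X-update, Eq. (5): singular value thresholding with mu = 1/beta
    X_new = svt(M_bar + Lam / beta - Y - Z, 1.0 / beta)
    # Y-update, Eq. (6): shrinkage with upsilon = 1/(beta + kappa)
    ups = 1.0 / (beta + kappa)
    G = ups * beta * ((alpha + beta) / beta * M_bar + Lam / beta - Z - X_new)
    Y_new = soft_threshold(G, tau * ups)
    # multiplier update (step 3 of Algorithm 1)
    Lam_new = Lam - beta * (X_new + Y_new + Z - M_bar)
    return Z, X_new, Y_new, Lam_new
```

Iterating `uma_step` until the residual X + Y + Z − M̄ is small reproduces the loop of Algorithm 1; users whose row of Y contains a nonzero entry are then flagged as malicious.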
Finally, we update\n\nΛ^{k+1} = Λ^k − β(X^{k+1} + Y^{k+1} + Z^{k+1} − M̄).\n\nThe pseudocode of the UMA algorithm is given in Algorithm 1.\n\n5 Theoretical Analysis\n\nThis section presents our main theoretical results; the detailed proofs and analysis are given in the supplementary material due to the page limitation. We begin with two lemmas that are helpful for the derivation of our proposed algorithm.\n\nLemma 1 [6] For τ > 0 and T ∈ ℝ^{m×n}, the closed-form solution of min_Y τ‖Y‖_1 + ‖Y − T‖_F^2/2 is the matrix S_τ(T) with (S_τ(T))_ij = max{|T_ij| − τ, 0} · sgn(T_ij), where sgn(·) denotes the sign function.\n\nLemma 2 [8] For µ > 0 and Y ∈ ℝ^{m×n} with rank r, the closed-form solution of min_X µ‖X‖_* + ‖X − Y‖_F^2/2 is given by D_µ(Y) = S diag(S_µ(Σ))D⊤, where Y = SΣD⊤ denotes the singular value decomposition of Y, and S_µ(Σ) is defined in Lemma 1.\n\nWe now present the theoretical guarantee that UMA can recover the low-rank component X_0 and the sparse component Y_0. For simplicity, our theoretical analysis focuses on square matrices; it is easy to generalize our results to general rectangular matrices.\nLet X_0 = SΣD⊤ = Σ_{i=1}^r σ_i s_i d_i⊤ be the singular value decomposition of X_0 ∈ ℝ^{n×n}, where r is the rank of matrix X_0, σ_1, . . . , σ_r are the positive singular values, and S = [s_1, . . . , s_r] and D = [d_1, . . . , d_r] are the left- and right-singular matrices, respectively. For µ > 0, we assume\n\nmax_i ‖S⊤e_i‖^2 ≤ µr/n,  max_i ‖D⊤e_i‖^2 ≤ µr/n,  ‖SD⊤‖_∞^2 ≤ µr/n^2.   (7)\n\nTheorem 1 Suppose that X_0 satisfies the incoherence condition given by Eq. 
(7), and Ω is uniformly distributed among all sets of size ω ≥ n^2/10. We assume that each entry is corrupted independently with probability q. Let X and Y be the solution of the optimization problem given by Eq. (2) with parameters τ = O(1/√n), κ = O(1/√n) and α = O(1/n). For some constant c > 0 and sufficiently large n, the following holds with probability at least 1 − cn^{−10}:\n\n‖X_0 − X‖_F ≤ δ and ‖Y_0 − Y‖_F ≤ δ\n\nif rank(X_0) ≤ ρ_r n/(µ log^2 n) and q ≤ q_s, where ρ_r and q_s are positive constants.\nWe now prove the global convergence of UMA with a worst-case O(1/t) convergence rate measured by iteration complexity. Let U = (Z; X; Y) and W = (Z; X; Y; Λ). We also define\n\nθ(U) = ‖X‖_* + τ‖Y‖_1 − α⟨M, Y⟩ + (κ/2)‖Y‖_F^2 and U_t^{k+1} = (1/t) Σ_{k=1}^t U^{k+1}.\n\nIt follows from Corollaries 28.2.2 and 28.3.1 of [29] that the solution set of Eq. (2) is non-empty. Then, let W* = ((Z*)⊤, (X*)⊤, (Y*)⊤, (Λ*)⊤)⊤ be a saddle point of Eq. (2), and define U* = ((Z*)⊤, (X*)⊤, (Y*)⊤)⊤.\n\nTheorem 2 For t iterations generated by UMA with β ∈ (0, (√33 − 5)κ/2),\n1) We have ‖X_t^{k+1} + Y_t^{k+1} + Z_t^{k+1} − P_Ω M‖^2 ≤ c̄_1/t^2 for some constant c̄_1 > 0.\n2) We have |θ(U_t^{k+1}) − θ(U*)| ≤ c̄_2/t for some constant c̄_2 > 0.\n\n6 Experiments\n\nIn this section, we compare our proposed UMA with the state-of-the-art approaches for attack detection. 
We consider three common evaluation metrics for attack detection as in [13]:\n\nPrecision = TP/(TP + FP), Recall = TP/(TP + FN), F1 = 2 × Precision × Recall/(Precision + Recall),\n\nwhere TP is the number of attack profiles correctly detected as attacks, FP is the number of normal profiles that are misclassified, and FN is the number of attack profiles that are misclassified.\n\n6.1 Datasets\n\nWe first conduct our experiments on the commonly used datasets MovieLens100K and MovieLens1M, released by GroupLens [25]. These datasets are collected from a non-commercial recommender system, so it is likely that the users in these datasets are non-spam users. We take the users already in the datasets as normal users. The rating scores range from 1 to 5, and we preprocess the data by subtracting 3, mapping the scores to the range [−2, 2]. Dataset MovieLens100K contains 100000 ratings of 943 users over 1682 movies, and dataset MovieLens1M contains 1000209 ratings of 6040 users over 3706 movies. We describe how to add attack profiles in Section 6.3.\nWe also collect a real dataset Douban10K1 with attack profiles from the Douban website, where registered users record ratings of various films, books, clothes, etc. We gather 12095 ratings of 213 users over 155 items. The rating scores range from 1 to 5, and we preprocess the data by subtracting 3, mapping the scores to the range [−2, 2]. Among the 213 user profiles, 35 are attack profiles.\n\n1http://www.douban.com/.\n\nTable 1: Detection precision, recall and F1 on MovieLens100K and MovieLens1M. Here unorganized malicious attacks are based on a combination of traditional strategies.\n\nMethod | MovieLens100K Precision / Recall / F1 | MovieLens1M Precision / Recall / F1\nUMA | 0.934±0.003 / 0.883±0.019 / 0.908±0.011 | 0.739±0.009 / 0.785±0.023 / 0.761±0.016\nRPCA | 0.908±0.010 / 0.422±0.048 / 0.575±0.047 | 0.342±0.003 / 0.558±0.028 / 0.424±0.009\nN-P | 0.774±0.015 / 0.641±0.046 / 0.701±0.032 | 0.711±0.007 / 0.478±0.018 / 0.572±0.014\nk-means | 0.723±0.171 / 0.224±0.067 / 0.341±0.092 | 0.000±0.000 / 0.000±0.000 / 0.000±0.000\nPCAVarSel | 0.774±0.009 / 0.587±0.024 / 0.668±0.019 | 0.278±0.007 / 0.622±0.022 / 0.384±0.011\nMF-based | 0.911±0.009 / 0.814±0.008 / 0.860±0.009 | 0.407±0.005 / 0.365±0.004 / 0.385±0.005\n\nTable 2: Detection precision, recall and F1 on MovieLens100K and MovieLens1M. Here unorganized malicious attacks additionally consider the hiring of existing users, on top of the combination of traditional strategies.\n\nMethod | MovieLens100K Precision / Recall / F1 | MovieLens1M Precision / Recall / F1\nUMA | 0.929±0.013 / 0.865±0.032 / 0.896±0.022 | 0.857±0.005 / 0.733±0.003 / 0.790±0.002\nRPCA | 0.797±0.046 / 0.659±0.097 / 0.721±0.097 | 0.635±0.012 / 0.391±0.022 / 0.484±0.015\nN-P | 0.244±0.124 / 0.145±0.089 / 0.172±0.084 | 0.273±0.020 / 0.099±0.031 / 0.144±0.035\nk-means | 0.767±0.029 / 0.234±0.042 / 0.357±0.051 | 0.396±0.026 / 0.300±0.039 / 0.341±0.035\nPCAVarSel | 0.481±0.027 / 0.168±0.017 / 0.248±0.023 | 0.120±0.006 / 0.225±0.012 / 0.157±0.008\nMF-based | 0.556±0.023 / 0.496±0.021 / 0.524±0.022 | 0.294±0.012 / 0.264±0.010 / 0.278±0.011\n\n6.2 Comparison Methods and Implementation Details\n\nWe compare UMA with the state-of-the-art approaches for attack detection and 
robust PCA:\n\n• N-P: A statistical algorithm based on the Neyman-Pearson statistics [16].\n• k-means: A clustering algorithm based on classification attributes [3].\n• PCAVarSel: A PCA-based variable selection algorithm [22].\n• MF-based: A reputation estimation algorithm based on low-rank matrix factorization [19].\n• RPCA: A low-rank matrix recovery method that accounts for sparse noise [9].\n\nIn the experiments, we set τ = 10/√m, α = 10/m and δ = √mn/200. A rating can be viewed as a malicious rating if it deviates from the ground-truth rating by more than 3, since the scale of ratings is from −2 to 2. We set the parameter β = τ/3 according to Eq. (6), where the entries of Y will be nullified if they are smaller than the threshold. We set κ = τ under the convergence condition β ∈ (0, (√33 − 5)κ/2) as in Theorem 2. For the baseline methods, we take the results reported in [26] for comparison.\n\n6.3 Comparison Results\n\nIn the first experiment, we add attack profiles into the datasets MovieLens100K and MovieLens1M by a combination of several traditional attack strategies. These traditional attack strategies include the average attack strategy, random attack strategy and bandwagon attack strategy, discussed in Section 3.2. Specifically, each attacker randomly chooses one strategy to produce the user rating profiles and promotes one item randomly selected from the items with average rating lower than 0. In line with the setting of previous attack detection works, we set the filler ratio (percentage of rated items among all items) to 0.01, and the filler items are drawn from the top 10% most popular items. We set the spam ratio (number of attack profiles/number of all user profiles) to 0.2. The experiment is repeated 10 times, and the average performance is reported. 
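For completeness, the Precision/Recall/F1 protocol used throughout this section can be computed from binary user labels as follows (a routine sketch, not tied to any particular detector):

```python
import numpy as np

def detection_metrics(pred, truth):
    """Precision, recall and F1 from binary label vectors,
    where 1 marks an attack profile and 0 a normal profile."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    tp = int(np.sum((pred == 1) & (truth == 1)))
    fp = int(np.sum((pred == 1) & (truth == 0)))
    fn = int(np.sum((pred == 0) & (truth == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```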
Table 1 shows the experimental results on datasets MovieLens100K and MovieLens1M under attack profiles generated by a combination of traditional strategies.\nThe second experiment studies a more general case of unorganized malicious attacks. We consider that attackers can hire existing users to attack their targets, in addition to the profile injection attacks mentioned above. We set the spam ratio to 0.2, where 25% of the attack profiles are produced as in the first experiment, and 75% of the attack profiles come from existing users by randomly changing the rating of one item lower than 0 to +2. In this case, attacks are more difficult to detect, because the attack profiles are more similar to normal user profiles. The experiment is repeated 10 times, and Table 2 reports the comparison results on MovieLens100K and MovieLens1M.\n\nTable 3: Detection precision, recall and F1 on dataset Douban10K.\n\nMethod | UMA | RPCA | N-P | k-means | PCAVarSel | MF-based\nPrecision | 0.800 | 0.535 | 0.250 | 0.321 | 0.240 | 0.767\nRecall | 0.914 | 0.472 | 0.200 | 0.514 | 0.343 | 0.657\nF1 | 0.853 | 0.502 | 0.222 | 0.396 | 0.282 | 0.708\n\nTable 3 shows the experiments on dataset Douban10K. The experimental results in Tables 1, 2 and 3 show that our proposed algorithm UMA achieves the best performance on all the datasets under all three measures: Precision, Recall and F1.\nTraditional attack detection approaches perform ineffectively on unorganized malicious attack detection, because the success of those methods depends on the properties of shilling attacks; e.g., the k-means method and the N-P method work well if the attack profiles are similar in the view of classification attributes or latent categories, and the PCAVarSel method achieves good performance only if attack profiles have more common unrated items than normal profiles. 
In summary, these methods detect attacks by identifying common characteristics of attack profiles, which do not hold in unorganized malicious attacks. The RPCA and MF-based methods try to recover the ground-truth rating matrix from the observed rating matrix, but they can hardly separate the sparse attack matrix from the noise matrix and tend to suffer from low precision, especially on the large-scale and heavily sparse dataset MovieLens1M.

We also compare UMA with the other approaches by varying the spam ratio (# attack profiles / # all user profiles) from 2% to 20%, since different systems may contain different spam ratios. As shown in Figure 2, UMA is robust and achieves the best performance across spam ratios, whereas the comparison methods (except RPCA) perform worse at small spam ratios; e.g., the N-P approach detects almost nothing. Although RPCA is as stable as UMA across spam ratios, there is a performance gap between RPCA and UMA which widens as the dataset gets larger and sparser, from MovieLens100K to MovieLens1M.

Figure 2: Detection precision and recall on MovieLens100K under unorganized malicious attacks. The spam ratio (# attack profiles / # all user profiles) varies from 0.02 to 0.2.

7 Conclusion

Attack detection plays an important role in improving the quality of recommendation. Most previous methods focus on shilling attacks, and the key idea for detecting such attacks is to find the common characteristics of attack profiles produced by the same attack strategy. This paper considers unorganized malicious attacks, produced by multiple attack strategies against different targets. We formulate unorganized malicious attacks detection as a variant of the matrix completion problem, propose the UMA algorithm, and prove its recovery guarantee and global convergence.
Experiments show that UMA achieves significantly better performance than the state-of-the-art attack detection methods.

Acknowledgments This research was supported by the National Key R&D Program of China (2018YFB1004300), NSFC (61333014, 61503179), JiangsuSF (BK20150586), the Collaborative Innovation Center of Novel Software Technology and Industrialization, and the Fundamental Research Funds for the Central Universities.

References

[1] C. Aggarwal. Recommender Systems. Springer, 2016.

[2] R. Bhaumik, C. Williams, B. Mobasher, and R. Burke. Securing collaborative filtering against malicious attacks through anomaly detection. In Proceedings of the 4th Workshop on Intelligent Techniques for Web Personalization, 2006.

[3] R. Bhaumik, B. Mobasher, and R. D. Burke. A clustering approach to unsupervised attack detection in collaborative recommender systems. In Proceedings of the 7th International Conference on Data Mining, pages 181–187, 2011.

[4] T. Bouwmans, A. Sobral, S. Javed, S. K. Jung, and E.-H. Zahzah. Decomposition into low-rank plus additive matrices for background/foreground separation: A review for a comparative evaluation with a large-scale dataset. Computer Science Review, 23:1–71, 2017.

[5] G. Bresler, G. Chen, and D. Shah. A latent source model for online collaborative filtering. In Proceedings of the 28th Advances in Neural Information Processing Systems, pages 3347–3355, 2014.

[6] A. M. Bruckstein, D. L. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1):34–81, 2009.

[7] K. Bryan, M. O'Mahony, and P. Cunningham. Unsupervised retrieval of attack profiles in collaborative recommender systems. In Proceedings of the 2nd ACM Conference on Recommender Systems, pages 155–162, 2008.

[8] J.-F. Cai, E. J. Candès, and Z.-W. Shen.
A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.

[9] E. J. Candès, X. D. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3):1–37, 2011.

[10] C.-H. Chen, B.-S. He, Y.-Y. Ye, and X.-M. Yuan. The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Mathematical Programming, 155(1-2):57–79, 2016.

[11] J.-S. Feng, H. Xu, and S.-C. Yan. Online robust PCA via stochastic optimization. In Proceedings of the 27th Advances in Neural Information Processing Systems, pages 404–412, 2013.

[12] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61–70, 1992.

[13] I. Gunes, C. Kaleli, A. Bilge, and H. Polat. Shilling attacks against recommender systems: a comprehensive survey. Artificial Intelligence Review, 42(4):767–799, 2014.

[14] F.-M. He, X.-R. Wang, and B.-X. Liu. Attack detection by rough set theory in recommendation system. In Proceedings of the 6th International Conference on Granular Computing, pages 692–695, 2010.

[15] B.-S. He, M. Tao, and X.-M. Yuan. A splitting method for separable convex programming. IMA Journal of Numerical Analysis, 35(1):394–426, 2015.

[16] N. J. Hurley, Z. P. Cheng, and M. Zhang. Statistical attack detection. In Proceedings of the 3rd ACM Conference on Recommender Systems, pages 149–156, 2009.

[17] C. Li and Z.-G. Luo. Detection of shilling attacks in collaborative filtering recommender systems. In Proceedings of the 2nd International Conference of Soft Computing and Pattern Recognition, pages 190–193, 2011.

[18] B. Li, Q. Yang, and X.-Y. Xue. Transfer learning for collaborative filtering via a rating-matrix generative model.
In Proceedings of the 26th International Conference on Machine Learning, pages 617–624, 2009.

[19] G. Ling, I. King, and M. R. Lyu. A unified framework for reputation estimation in online rating systems. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, pages 2670–2676, 2013.

[20] M. Luca. Reviews, reputation, and revenue: The case of Yelp.com. 2016.

[21] L. W. Mackey, M. I. Jordan, and A. Talwalkar. Divide-and-conquer matrix factorization. In Proceedings of the 25th Advances in Neural Information Processing Systems, pages 1134–1142, 2011.

[22] B. Mehta and W. Nejdl. Unsupervised strategies for shilling detection and robust collaborative filtering. User Modeling and User-Adapted Interaction, 19(1-2):65–97, 2009.

[23] B. Mehta. Unsupervised shilling detection for collaborative filtering. In Proceedings of the 22nd International Conference on Artificial Intelligence, pages 1402–1407, 2007.

[24] B. Mobasher, R. Burke, R. Bhaumik, and J. J. Sandvig. Attacks and remedies in collaborative recommendation. Intelligent Systems, 22(3):56–63, 2009.

[25] GroupLens Research. MovieLens datasets. http://grouplens.org/datasets/movielens/.

[26] M. Pang and Z.-H. Zhou. Unorganized malicious attacks detection (in Chinese). SCIENTIA SINICA Informationis, 48(2):177–186, 2018.

[27] Y.-G. Peng, A. Ganesh, J. Wright, W.-L. Xu, and Y. Ma. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. Pattern Analysis and Machine Intelligence, 34(11):2233–2246, 2012.

[28] N. Rao, H.-F. Yu, P. Ravikumar, and I. Dhillon. Collaborative filtering with graph information: Consistency and scalable methods. In Proceedings of the 29th Advances in Neural Information Processing Systems, pages 2098–2106, 2015.

[29] R. T. Rockafellar. Convex Analysis. Princeton University Press, 2015.

[30] R. Salakhutdinov and A. Mnih.
Probabilistic matrix factorization. In Proceedings of the 22nd Advances in Neural Information Processing Systems, pages 1257–1264, 2008.

[31] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning, pages 791–798, 2007.

[32] M. Tao and X.-M. Yuan. Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM Journal on Optimization, 21(1):57–81, 2011.

[33] X.-Y. Yi, D. Park, Y.-D. Chen, and C. Caramanis. Fast algorithms for robust PCA via gradient descent. In Proceedings of the 30th Advances in Neural Information Processing Systems, pages 4152–4160, 2016.