{"title": "Semiparametric Principal Component Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 171, "page_last": 179, "abstract": null, "full_text": "Semiparametric Principal Component Analysis\n\nFang Han\nDepartment of Biostatistics\nJohns Hopkins University\nBaltimore, MD 21210\nfhan@jhsph.edu\n\nHan Liu\nDepartment of Operations Research and Financial Engineering\nPrinceton University, NJ 08544\nhanliu@princeton.edu\n\nAbstract\n\nWe propose two new principal component analysis methods based on a semiparametric model: Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA and Copula PCA estimate the leading eigenvectors of the correlation and covariance matrices, respectively, of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).\n\n1 Introduction\n\nPrincipal Component Analysis (PCA) is introduced as follows. 
Given a random vector X ∈ R^d with covariance matrix Σ and n independent observations of X, PCA reduces the dimension of the data by projecting the data onto a linear subspace spanned by the k leading eigenvectors of Σ, such that the principal modes of variation are preserved. In practice, Σ is unknown and replaced by the sample covariance S. By spectral decomposition, Σ = ∑_{j=1}^d ω_j u_j u_j^T with eigenvalues ω_1 ≥ ... ≥ ω_d and the corresponding orthonormal eigenvectors u_1, ..., u_d. PCA aims at recovering the first k eigenvectors u_1, ..., u_k.\n\nAlthough the PCA method as a procedure is model free, its theoretical and empirical performance relies on the distribution of the data. With regard to the empirical concern, PCA’s geometric intuition comes from the major axes of the contours of constant probability of the Gaussian [10]. [5] show that if X is multivariate Gaussian, then the distribution is centered about the principal component axes and is therefore “self-consistent” [8]. We refer to [10] for more good properties that PCA enjoys under the Gaussian model, which we wish to preserve while designing its generalization.\n\nWith regard to the theoretical concern, firstly, PCA generally fails to be consistent in the high dimensional setting. Given û_1, the dominant eigenvector of S, [9] show that the angle between û_1 and u_1 will not converge to 0, i.e., lim inf_{n→∞} E∠(û_1, u_1) > 0, where we denote by ∠(û_1, u_1) the angle between the estimated and the true leading eigenvectors. This key observation motivates regularizing Σ, resulting in a series of methods with different formulations and algorithms. 
The statistical model is generally further specified such that u_1 is sparse, namely supp(u_1) := {j : u_{1j} ≠ 0} and card(supp(u_1)) = s < n. The resulting estimator ũ_1 is:\n\nũ_1 = arg max_{v ∈ R^d} v^T S v subject to ‖v‖_2 = 1, card(supp(v)) ≤ s. (1.1)\n\nTo solve Equation (1.1), a variety of algorithms have been proposed: greedy algorithms [3], lasso-type methods including SCoTLASS [11], SPCA [25] and sPCA-rSVD [19], a number of power methods [12, 23, 16], the biconvex algorithm PMD [21] and the semidefinite relaxation DSPCA [4]. Secondly, it is realized that the distribution from which the data are drawn needs to be specified, such that the estimator ũ_1 converges to u_1 at a fast rate. [9, 1, 16, 18, 20] all establish their results under a strong Gaussian or sub-Gaussian assumption in order to obtain a fast rate under certain conditions.\n\nIn this paper, we first explore the use of PCA conducted on the correlation matrix Σ^0 instead of the covariance matrix Σ, and then propose a high dimensional semiparametric scale-invariant principal component analysis method, named Copula Component Analysis (COCA). In this paper, the population version of the scale-invariant PCA is built as the estimator of the leading eigenvector of the population correlation matrix Σ^0. Secondly, to handle non-Gaussian data, we generalize the distribution family from the Gaussian to the larger Nonparanormal family [15]. A random variable X = (X_1, ..., X_d)^T belongs to a Nonparanormal family if and only if there exists a set of univariate monotone functions {f^0_j}_{j=1}^d such that (f^0_1(X_1), ..., f^0_d(X_d))^T is multivariate Gaussian. The Nonparanormal can have arbitrary continuous marginal distributions and can be far away from the sub-Gaussian family. 
Thirdly, to estimate Σ^0 robustly and efficiently, instead of estimating the normal score transformation functions {f̂^0_j}_{j=1}^d as [15] did, realizing that {f^0_j}_{j=1}^d preserve the ranks of the data, we utilize the nonparametric correlation coefficient estimator, Spearman’s rho, to estimate Σ^0. [14, 22] prove that the corresponding estimators converge to Σ^0 at a parametric rate.\n\nIn theory, we analyze the general case in which X follows the Nonparanormal and θ_1 is weakly sparse, where θ_1 is the leading eigenvector of Σ^0. We obtain the estimation consistency of the COCA estimator of θ_1 using the Spearman’s rho correlation coefficient matrix. We prove that the estimation consistency rates are close to the parametric rate under the Gaussian assumption and that feature selection consistency can be achieved when d is nearly exponential in the sample size. In this paper, we also propose a scale-variant PCA procedure, named Copula PCA. Copula PCA estimates the leading eigenvector of the latent covariance matrix Σ. To estimate the leading eigenvectors of Σ, instead of Σ^0, at a fast rate, we prove that extra conditions are required on the transformation functions.\n\n2 Background\n\nWe start with notation: Let M = [M_{jk}] ∈ R^{d×d} and v = (v_1, ..., v_d)^T ∈ R^d. Let v’s subvector with entries indexed by I be denoted by v_I, and M’s submatrix with rows indexed by I and columns indexed by J be denoted by M_{IJ}. Let M_{I·} and M_{·J} be the submatrix of M with rows in I and all columns, and the submatrix of M with columns in J and all rows. For 0 < q ≤ ∞, we define the ℓ_q and ℓ_∞ vector norms as ‖v‖_q := (∑_{i=1}^d |v_i|^q)^{1/q} and ‖v‖_∞ := max_{1≤i≤d} |v_i|, and ‖v‖_0 := card(supp(v)) · ‖v‖_2. We define the matrix ℓ_max norm as the elementwise maximum value: ‖M‖_max := max{|M_{ij}|}, and the ℓ_∞ norm as ‖M‖_∞ := max_{1≤i≤d} ∑_{j=1}^d |M_{ij}|. Let Λ_j(M) be the j-th largest eigenvalue of M. 
In particular, Λ_min(M) := Λ_d(M) and Λ_max(M) := Λ_1(M) are the smallest and largest eigenvalues of M. The vectorized matrix of M, denoted by vec(M), is defined as vec(M) := (M_{·1}^T, ..., M_{·d}^T)^T. Let S^{d−1} := {v ∈ R^d : ‖v‖_2 = 1} be the d-dimensional ℓ_2 sphere. For any two vectors a, b ∈ R^d and any two square matrices A, B ∈ R^{d×d}, denote the inner product of a and b, and of A and B, by ⟨a, b⟩ := a^T b and ⟨A, B⟩ := Tr(A^T B).\n\n2.1 The Models of the PCA and Scale-invariant PCA\n\nLet Σ^0 be the correlation matrix of Σ. By spectral decomposition, Σ = ∑_{j=1}^d ω_j u_j u_j^T and Σ^0 = ∑_{j=1}^d λ_j θ_j θ_j^T. Here ω_1 ≥ ω_2 ≥ ... ≥ ω_d > 0 and λ_1 ≥ λ_2 ≥ ... ≥ λ_d > 0 are the eigenvalues of Σ and Σ^0, with u_1, ..., u_d and θ_1, ..., θ_d the corresponding orthonormal eigenvectors. The next proposition claims that the estimators {û_1, ..., û_d} and {θ̂_1, ..., θ̂_d}, the eigenvectors of the sample covariance and correlation matrices S and S^0, are the MLEs of {u_1, ..., u_d} and {θ_1, ..., θ_d}:\n\nProposition 2.1. Let x_1, ..., x_n ∼ N(µ, Σ) and Σ^0 be the correlation matrix of Σ. Then the estimators of PCA, {û_1, ..., û_d}, and the estimators of the scale-invariant PCA, {θ̂_1, ..., θ̂_d}, are the MLEs of {u_1, ..., u_d} and {θ_1, ..., θ_d}.\n\nProof. Use Theorem 11.3.1 in [2] and the functional invariance property of the MLE.\n\nProposition 2.2. For any 1 ≤ i ≤ d, we have supp(u_i) = supp(θ_i) and sign(u_{ij}) = sign(θ_{ij}), ∀ 1 ≤ j ≤ d.\n\nProof. For 1 ≤ i ≤ d, u_i = (θ_{i1}/σ_1, θ_{i2}/σ_2, ..., θ_{id}/σ_d), where (σ_1^2, ..., σ_d^2)^T := diag(Σ).\n\n
It is easy to observe that the scale-invariant PCA is a safe procedure for dimension reduction when variables are measured on different scales. Although there seems to be no theoretical advantage of the scale-invariant PCA over PCA under the Gaussian model, in this paper we will show that under the more general Nonparanormal (or Gaussian Copula) model, the scale-invariant PCA requires much weaker conditions for the estimator to achieve good theoretical performance.\n\n2.2 The Nonparanormal\n\nWe first introduce two definitions of the Nonparanormal, separately defined in [15] and [14].\n\nDefinition 2.1 [15]. A random variable X = (X_1, ..., X_d)^T with population marginal means and standard deviations µ = (µ_1, ..., µ_d)^T and σ = (σ_1, ..., σ_d)^T is said to follow a Nonparanormal distribution NPN_d(µ, Σ, f) if and only if there exists a set of univariate monotone transformations f = {f_j}_{j=1}^d such that f(X) = (f_1(X_1), ..., f_d(X_d))^T ∼ N(µ, Σ), and σ_j^2 = Σ_{jj}, j = 1, ..., d.\n\nDefinition 2.2 [14]. Let f^0 = {f^0_j}_{j=1}^d be a set of monotone univariate functions and Σ^0 ∈ R^{d×d} be a positive definite correlation matrix with diag(Σ^0) = 1. We say that a d-dimensional random variable X = (X_1, ..., X_d)^T follows a Nonparanormal distribution, i.e., X ∼ NPN_d(Σ^0, f^0), if f^0(X) := (f^0_1(X_1), ..., f^0_d(X_d))^T ∼ N(0, Σ^0).\n\nThe following lemma proves that the two definitions of the Nonparanormal are equivalent.\n\nLemma 2.1. A random variable X ∼ NPN_d(Σ^0, f^0) if and only if there exist µ = (µ_1, ..., µ_d)^T, Σ = [Σ_{jk}] ∈ R^{d×d} such that for any 1 ≤ j, k ≤ d, E(X_j) = µ_j, Var(X_j) = Σ_{jj} and Σ^0_{jk} = Σ_{jk}/√(Σ_{jj} · Σ_{kk}), and a set of monotone univariate functions f = {f_j}_{j=1}^d such that X ∼ NPN_d(µ, Σ, f).\n\nProof. 
Use the connection f_j(x) = µ_j + σ_j f^0_j(x), for j ∈ {1, 2, ..., d}.\n\nLemma 2.1 guarantees that the Nonparanormal is defined properly. Definition 2.2 is more appealing because it emphasizes the correlation and hence matches the spirit of the Copula. However, Definition 2.1 enjoys notational simplicity in analyzing the Copula-based LDA and PCA approaches.\n\n2.3 Spearman’s rho Correlation and Covariance Matrices\n\nGiven n data points x_1, ..., x_n ∈ R^d, where x_i = (x_{i1}, ..., x_{id})^T, we denote by µ̂_j := (1/n) ∑_{i=1}^n x_{ij} and σ̂_j := √((1/n) ∑_{i=1}^n (x_{ij} − µ̂_j)^2) the marginal sample means and standard deviations. Because the Nonparanormal distribution preserves the ranks of the data, it is natural to use the nonparametric rank-based correlation coefficient estimator, Spearman’s rho, to estimate the latent correlation. In detail, let r_{ij} be the rank of x_{ij} among x_{1j}, ..., x_{nj} and r̄_j := (1/n) ∑_{i=1}^n r_{ij} = (n+1)/2. We consider the following statistic:\n\nρ̂_{jk} = ∑_{i=1}^n (r_{ij} − r̄_j)(r_{ik} − r̄_k) / √(∑_{i=1}^n (r_{ij} − r̄_j)^2 · ∑_{i=1}^n (r_{ik} − r̄_k)^2),\n\nand the correlation matrix estimator R̂_{jk} = 2 sin((π/6) ρ̂_{jk}). Lemma 2.2, from [14], claims that the estimation can reach the parametric rate.\n\nLemma 2.2 ([14]). When x_1, ..., x_n ∼i.i.d. NPN_d(Σ^0, f^0), for any n ≥ 21/log d + 2,\n\nP(‖R̂ − Σ^0‖_max ≤ 8π √(log d / n)) ≥ 1 − 2/d^2. (2.1)\n\nWe denote by R̂ := [R̂_{jk}] the Spearman’s rho correlation coefficient matrix. 
In the following, let Ŝ := [Ŝ_{jk}] = [σ̂_j σ̂_k R̂_{jk}] be the Spearman’s rho covariance matrix.\n\n3 Methods\n\nIn Figure 1, we randomly generate 10,000 samples from three different types of Nonparanormal distributions. We suppose that X ∼ NPN_2(Σ^0, f^0). Here we set Σ^0 = [1 0.5; 0.5 1] (rows separated by the semicolon) and the transformation functions as follows: (A) f^0_1(x) = x^3 and f^0_2(x) = x^{1/3}; (B) f^0_1(x) = sign(x)x^2 and f^0_2(x) = x^3; (C) f^0_1(x) = f^0_2(x) = Φ^{−1}(x). It can be observed that there does not exist a nice geometric explanation now. For example, researchers might wish to conduct PCA separately on different clusters in (A) and (B). For (C), the data look very noisy and a nice major axis might be considered not to exist.\n\nHowever, under the Nonparanormal model and realizing that there is a latent Gaussian distribution behind the data, the geometric intuition of PCA naturally comes back. In the next section, we will present the models of COCA and Copula PCA motivated by this observation.\n\nFigure 1: Scatter plots of three Nonparanormals, X ∼ NPN_2(Σ^0, f^0). 
Here Σ^0_{12} = 0.5 and the transformation functions have the following form: (A) f^0_1(x) = x^3 and f^0_2(x) = x^{1/3}; (B) f^0_1(x) = sign(x)x^2 and f^0_2(x) = x^3; (C) f^0_1(x) = f^0_2(x) = Φ^{−1}(x).\n\n3.1 COCA Model\n\nWe first present the model of the Copula Component Analysis (COCA) method, where the idea of scale-invariant PCA is exploited and we wish to estimate the leading eigenvector of the latent correlation matrix. In particular, the following model M^0(q, R_q, Σ^0, f^0) is considered:\n\nM^0(q, R_q, Σ^0, f^0): x_1, ..., x_n ∼i.i.d. NPN_d(Σ^0, f^0), θ_1 ∈ S^{d−1} ∩ B_q(R_q), (3.1)\n\nwhere θ_1 is the leading eigenvector of the latent correlation matrix Σ^0 that we are interested in estimating, 0 ≤ q ≤ 1, and the ℓ_q ball B_q(R_q) is defined as:\n\nwhen q = 0, B_0(R_0) := {v ∈ R^d : card(supp(v)) ≤ R_0}; (3.2)\nwhen 0 < q ≤ 1, B_q(R_q) := {v ∈ R^d : ‖v‖_q^q ≤ R_q}. (3.3)\n\nInspired by the model M^0(q, R_q, Σ^0, f^0), we consider the following COCA estimator θ̃_1, which maximizes the following objective under the constraint that θ̃_1 ∈ B_q(R_q) for some 0 ≤ q ≤ 1:\n\nθ̃_1 = arg max_{v ∈ R^d} v^T R̂ v, subject to v ∈ S^{d−1} ∩ B_q(R_q). (3.4)\n\nHere R̂ is the estimated Spearman’s rho correlation coefficient matrix. The corresponding COCA estimator θ̃_1 can be considered as a nonlinear dimension reduction procedure and has the potential to gain more flexibility compared with classical PCA. In Section 4 we will establish the theoretical results on the COCA estimator and will show that it can estimate the latent true dominant eigenvector θ_1 at a fast rate and can achieve feature selection consistency.\n\n3.1.1 Copula PCA Model\n\nIn contrast, we provide another model inspired by the classical PCA method, where we wish to estimate the leading eigenvector of the latent covariance matrix. In particular, the following model M(q, R_q, Σ, f) is considered:\n\nM(q, R_q, Σ, f): x_1, ..., x_n ∼i.i.d. NPN_d(0, Σ, f), u_1 ∈ S^{d−1} ∩ B_q(R_q), (3.5)\n\nwhere u_1 is the leading eigenvector of the covariance matrix Σ, which is what we are interested in estimating. 
The corresponding Copula PCA estimator is:\n\nũ_1 = arg max_{v ∈ R^d} v^T Ŝ v, subject to v ∈ S^{d−1} ∩ B_q(R_q), (3.6)\n\nwhere Ŝ is the Spearman’s rho covariance matrix. This procedure is named Copula PCA. In Section 4, we will show that Copula PCA requires a much stronger condition than COCA to make ũ_1 converge to u_1 at a fast rate.\n\n3.2 Algorithms\n\nIn this section we provide three sparse PCA algorithms, where the Spearman’s rho correlation and covariance matrices R̂ and Ŝ can be directly plugged in to obtain sparse estimators.\n\nPenalized Matrix Decomposition (PMD) is proposed by [21]. The main idea of the PMD is a biconvex optimization algorithm for the following problem: arg max_{u,v} u^T Γ̂ v, subject to ‖u‖_2^2 ≤ 1, ‖v‖_2^2 ≤ 1, ‖u‖_1 ≤ δ, ‖v‖_1 ≤ δ. The COCA with PMD and Copula PCA with PMD are listed in the following: (1) Input: A symmetric matrix Γ̂. Initialize v ∈ S^{d−1}; (2) Iterate until convergence: (a) u ← arg max_{u ∈ R^d} u^T Γ̂ v subject to ‖u‖_1 ≤ δ and ‖u‖_2^2 ≤ 1; (b) v ← arg max_{v ∈ R^d} u^T Γ̂ v subject to ‖v‖_1 ≤ δ and ‖v‖_2^2 ≤ 1; (3) Output: v. Here Γ̂ is either R̂ or Ŝ, corresponding to the COCA with PMD and Copula PCA with PMD. δ is the tuning parameter. [21] suggest using the first leading eigenvector of Γ̂ as the initial value of v. The PMD can be considered as a solver to Equation (3.4) and Equation (3.6) with q = 1.\n\nThe SPCA algorithm is proposed by [25]. The main idea of the SPCA algorithm is to exploit a regression approach to PCA and then utilize the lasso and elastic net [24] to calculate a sparse estimator of the leading eigenvector. 
The COCA with SPCA and Copula PCA with SPCA are listed as follows: (1) Input: A symmetric matrix Γ̂. Initialize u ∈ S^{d−1}; (2) Iterate until convergence: (a) v ← arg min_{v ∈ R^d} (u − v)^T Γ̂ (u − v) + δ_1 ‖v‖_2^2 + δ_2 ‖v‖_1; (b) u ← Γ̂v/‖Γ̂v‖_2; (3) Output: v/‖v‖_2. Here Γ̂ is either R̂ or Ŝ, corresponding to the COCA with SPCA and Copula PCA with SPCA. δ_1 ∈ R and δ_2 ∈ R are two tuning parameters. [25] suggest using the first leading eigenvector of Γ̂ as the initial value of v. The SPCA can be considered as a solver to Equations (3.4) and (3.6) with q = 1.\n\nThe Truncated Power method (TPower) is proposed by [23]. The main idea is to utilize the power method, but truncate the vector to an ℓ_0 ball in each iteration. In fact, TPower can be generalized to a family of algorithms to solve Equation (3.4) when 0 ≤ q ≤ 1. We name it the ℓ_q Constraint Truncated Power Method (qTPM). In particular, when q = 0, the qTPM algorithm coincides with the method of [23]. The TPower can be considered as a general solver to Equation (3.4) and Equation (3.6) with q ∈ [0, 1]. In detail, we utilize the classical power method, but in each iteration t we project the intermediate vector x_t onto the intersection of the d-dimensional sphere S^{d−1} and the ℓ_q ball with radius R_q^{1/q}. Detailed algorithms are presented in the long version of this paper [6].\n\n4 Theoretical Properties\n\nIn this section we provide the theoretical properties of the COCA and Copula PCA methods. In particular, we are interested in the high dimensional case when d > n.\n\n4.1 Rank-based Correlation and Covariance Matrices Estimation\n\nThis section is devoted to the statement of our result quantifying the convergence rates of R̂ to Σ^0 and Ŝ to Σ. 
In particular, we establish the results on the ℓ_max convergence rates of the Spearman’s rho correlation and covariance matrices to Σ^0 and Σ. For COCA, Lemma 2.2 is enough. For Copula PCA, however, we still need to quantify the convergence rate of Ŝ to Σ.\n\nDefinition 4.1 (Subgaussian Transformation Function Class). Let Z ∈ R be a random variable following the standard Gaussian distribution. The Subgaussian Transformation Function Class TF(K) is defined as the set of functions {g_0 : R → R} which satisfy: E|g_0(Z)|^m ≤ (m!/2) K^m, ∀ m ∈ Z^+.\n\nHere it is easy to see that for any function g_0 : R → R, if there exists a constant L < ∞ such that g_0(z) ≤ L or g_0′(z) ≤ L or g_0″(z) ≤ L, ∀ z ∈ R, then g_0 ∈ TF(K) for some constant K.\n\nThen we have the following result, which states that Σ can also be recovered at the parametric rate.\n\nLemma 4.1. When x_1, ..., x_n ∼i.i.d. NPN_d(µ, Σ, f), 0 < 1/c_0 < min_j{σ_j} < max_j{σ_j} < c_0 < ∞ for some constant c_0, and g := {g_j = f_j^{−1}}_{j=1}^d satisfies g_j^2 ∈ TF(K) for all j = 1, ..., d, where K < ∞ is some constant, we have for any 1 ≤ j, k ≤ d and any n ≥ 21/log d + 2,\n\nP(|Ŝ_{jk} − Σ_{jk}| > t) ≤ 2 exp(−c_1 n t^2), (4.1)\n\nwhere c_1 is a constant depending only on the choice of K.\n\nRemark 4.1. Lemma 4.1 claims that, under certain constraints on the transformation functions, the latent covariance matrix Σ can be recovered using the Spearman’s rho covariance matrix. However, in this case, the marginal distributions of the Nonparanormal are required to be sub-Gaussian and cannot be arbitrarily continuous. 
This makes Copula PCA a less favored method.\n\n4.2 COCA and Copula PCA\n\nThis section is devoted to the statement of our main result on the upper bound of the estimation error of the COCA estimator and the Copula PCA estimator.\n\nTheorem 4.1 (Upper bound for the COCA). Let θ̃_1 be the global solution to Equation (3.4) and suppose that the Model M^0(q, R_q, Σ^0, f^0) holds. For any two vectors v_1 ∈ S^{d−1} and v_2 ∈ S^{d−1}, let |sin ∠(v_1, v_2)| = √(1 − (v_1^T v_2)^2). Then we have, for any n ≥ 21/log d + 2,\n\nP(sin^2 ∠(θ̃_1, θ_1) ≤ γ_q R_q^2 (64π^2/(λ_1 − λ_2)^2 · log d / n)^{(2−q)/2}) ≥ 1 − 1/d^2, (4.2)\n\nwhere γ_q = 2 · I(q = 1) + 4 · I(q = 0) + (1 + √3)^2 · I(0 < q < 1).\n\nProof. The key idea of the proof is to utilize the ℓ_max norm convergence result of R̂ to Σ^0. Detailed proofs are presented in the long version of this paper [6].\n\nGenerally, when R_q and λ_1, λ_2 do not scale with (n, d), the rate is O_P((log d / n)^{1−q/2}), which is the parametric rate that [16, 20, 18] obtain. When (n, d) goes to infinity, the two dominant eigenvalues λ_1 and λ_2 will typically go to infinity and will at least be bounded away from zero. Hence, our rate shown in Equation (4.2) is better than the seemingly more state-of-the-art rate: γ_q R_q^2 (64π^2 λ_1^2/(λ_1 − λ_2)^2 · log d / n)^{(2−q)/2}.\n\nCOCA is significantly different from the results of [20] and [18] in the sense that: (1) In theory, the Nonparanormal family can have arbitrary continuous marginal distributions, where a fast rate cannot be obtained using the techniques built for either Gaussian or sub-Gaussian distributions; (2) In methodology, we utilize the Spearman’s rho correlation coefficient matrix R̂ to estimate Σ^0, instead of using the sample correlation matrix S^0. This procedure has been shown to lose little in rate and will be much more robust under the Nonparanormal model. Given Theorem 4.1, we can immediately obtain a feature selection consistency result.\n\nCorollary 4.1 (Feature Selection Consistency of the COCA). Let θ̃_1 be the global solution to Equation (3.4) and suppose that the Model M^0(0, R_0, Σ^0, f^0) holds. Let Θ^0 := supp(θ_1) and Θ̂^0 := supp(θ̃_1). 
If we further have min_{j ∈ Θ^0} |θ_{1j}| ≥ (16 √2 R_0 π/(λ_1 − λ_2)) √(log d / n), then for any n ≥ 21/log d + 2, P(Θ̂^0 = Θ^0) ≥ 1 − 1/d^2.\n\nSimilarly, we can give an upper bound for the estimation rate of Copula PCA to the true leading eigenvector u_1 of the latent covariance matrix Σ. The next theorem provides the detailed result.\n\nTheorem 4.2 (Upper bound for Copula PCA). Let ũ_1 be the global solution to Equation (3.6) and suppose that the Model M(q, R_q, Σ, f) holds. 
If g := {g_j = f_j^{−1}}_{j=1}^d satisfies g_j^2 ∈ TF(K) for all 1 ≤ j ≤ d, and 0 < 1/c_0 < min_j{σ_j} < max_j{σ_j} < c_0 < ∞, then we have, for any n ≥ 21/log d + 2,\n\nP(sin^2 ∠(ũ_1, u_1) ≤ γ_q R_q^2 (4/(c_1(ω_1 − ω_2)^2) · log d / n)^{(2−q)/2}) ≥ 1 − 1/d^2,\n\nwhere γ_q = 2 · I(q = 1) + 4 · I(q = 0) + (1 + √3)^2 · I(0 < q < 1) and c_1 is a constant defined in Equation (4.1), depending only on K.\n\nCorollary 4.2 (Feature Selection Consistency of the Copula PCA). Let ũ_1 be the global solution to Equation (3.6) and suppose that the Model M(0, R_0, Σ, f) holds. Let Θ := supp(u_1) and Θ̂ := supp(ũ_1). If g := {g_j = f_j^{−1}}_{j=1}^d satisfies g_j^2 ∈ TF(K) for all 1 ≤ j ≤ d, and 0 < 1/c_0 < min_j{σ_j} < max_j{σ_j} < c_0 < ∞, and we further have min_{j ∈ Θ} |u_{1j}| ≥ (4 √2 R_0/(√c_1 (ω_1 − ω_2))) √(log d / n), then for any n ≥ 21/log d + 2, P(Θ̂ = Θ) ≥ 1 − 1/d^2.\n\n5 Experiments\n\nIn this section we investigate the empirical usefulness of the COCA method. Three sparse PCA algorithms are considered: PMD proposed by [21], SPCA proposed by [25] and the Truncated Power method (TPower) proposed by [23]. The following three methods are considered: (1) Pearson: the classic high dimensional PCA using the Pearson sample correlation matrix; (2) Spearman: COCA using the Spearman’s rho correlation coefficient matrix; (3) Oracle: the classic high dimensional PCA using the Pearson sample correlation matrix of the data from the latent Gaussian (perfect, without contamination).\n\n5.1 Numerical Simulations\n\nIn the simulation study we randomly sample n data points x_1, ..., x_n from the Nonparanormal distribution X ∼ NPN_d(Σ^0, f^0). Here we consider the setup of d = 100. 
We follow the same generating scheme as in [19, 23] and [7]. A covariance matrix Σ is first synthesized through the eigenvalue decomposition, where the first two eigenvalues are given and the corresponding eigenvectors are pre-specified to be sparse. In detail, we suppose that the first two dominant eigenvectors of Σ, u_1 and u_2, are sparse in the sense that only the first s = 10 entries of u_1 and the second s = 10 entries of u_2 are nonzero and set to be 1/√10. We set ω_1 = 5, ω_2 = 2, ω_3 = ... = ω_d = 1. The remaining eigenvectors are chosen arbitrarily. The correlation matrix Σ^0 is accordingly generated from Σ, with λ_1 = 4, λ_2 = 2.5, λ_3, ..., λ_d ≤ 1 and the two dominant eigenvectors sparse. To sample data from the Nonparanormal, we also need the transformation functions f^0 = {f^0_j}_{j=1}^d. Here two types of transformation functions are considered: (1) Linear transformation (or no transformation): f^0_linear = {h_0, h_0, ..., h_0}, where h_0(x) := x; (2) Nonlinear transformation: there exist five univariate monotone functions h_1, h_2, ..., h_5 : R → R and f^0_nonlinear = {h_1, h_2, h_3, h_4, h_5, h_1, h_2, h_3, h_4, h_5, ...}, where\n\nh_1^{−1}(x) := x,\nh_2^{−1}(x) := sign(x)|x|^{1/2} / √(∫ |t| φ(t) dt),\nh_3^{−1}(x) := x^3 / √(∫ t^6 φ(t) dt),\nh_4^{−1}(x) := (Φ(x) − ∫ Φ(t) φ(t) dt) / √(∫ (Φ(y) − ∫ Φ(t) φ(t) dt)^2 φ(y) dy),\nh_5^{−1}(x) := (exp(x) − ∫ exp(t) φ(t) dt) / √(∫ (exp(y) − ∫ exp(t) φ(t) dt)^2 φ(y) dy).\n\nHere φ and Φ are defined to be the probability density and cumulative distribution functions of the standard Gaussian. h_1, ..., h_5 are defined such that for any Z ∼ N(0, 1), E(h_j^{−1}(Z)) = 0 and Var(h_j^{−1}(Z)) = 1 ∀ j ∈ {1, ..., 5}. 
We then generate n = 100, 200 or 500 data points from:\n\n[Scheme 1] X ∼ NPN_d(Σ^0, f^0_linear), where f^0_linear = {h_0, h_0, ..., h_0} and Σ^0 is defined as above.\n[Scheme 2] X ∼ NPN_d(Σ^0, f^0_nonlinear), where f^0_nonlinear = {h_1, h_2, h_3, h_4, h_5, ...}.\n\nTo evaluate the robustness of different methods, we adopt a data contamination procedure similar to that in [14]. Let r ∈ [0, 1) represent the proportion of samples being contaminated. For each dimension, we randomly select ⌊nr⌋ entries and replace them with either 5 or −5 with equal probability. The final data matrix we obtain is X ∈ R^{n×d}. The PMD, SPCA and TPower algorithms are then employed on X to compute the estimated leading eigenvector θ̃_1.\n\nUnder Scheme 1 and Scheme 2 with different levels of contamination (r = 0 or 0.05), we repeatedly generate the data matrix X 1,000 times and compute the averaged False Positive Rates and False Negative Rates using a path of tuning parameters δ. The feature selection performances of different methods are then evaluated. The corresponding ROC curves are presented in Figure 2. More quantitative results are provided in the long version of this paper [6]. It can be observed that when r = 0 and X is exactly Gaussian, Pearson, Spearman and Oracle can all recover the sparsity pattern perfectly. However, when r > 0, the performance of Pearson decreases significantly, while Spearman is still very close to the Oracle. In Scheme 2, even when r = 0, Pearson cannot recover the support set of θ_1, while Spearman can still recover the sparsity pattern almost perfectly. 
When r > 0, the performance of Spearman is still very close to the Oracle.\n\nFigure 2: ROC curves for the PMD, SPCA and Truncated Power method (the left two, the middle two, the right two) with linear (no) and nonlinear transformation (top, bottom) and data contamination at different levels (r = 0, 0.05). Here n = 100 and d = 100.\n\n5.2 Large-scale Genomic Data Analysis\n\nIn this section we investigate the performance of Spearman compared with Pearson using one of the largest microarray datasets [17]. In summary, we collect in all 13,182 publicly available microarray samples from Affymetrix’s HGU133a platform. The raw data contain 20,248 probes and 13,182 samples belonging to 2,711 tissue types (e.g., lung cancers, prostate cancer, brain tumor, etc.). There are at most 1,599 samples and at least 1 sample belonging to each tissue type. We merge the probes corresponding to the same gene. There remain 12,713 genes and 13,182 samples. This dataset is non-Gaussian (see the long version of this paper [6]). The main purpose of this experiment is to compare the performance of COCA with the classical high dimensional PCA. 
We utilize the Truncated Power method proposed by [23] to obtain the sparse estimated dominant eigenvectors.\n\n[Figure 2 panels: PMD, SPCA and TPower; x-axis FPR, y-axis TPR; curves for Pearson, Spearman and Oracle.]\n\nWe adopt the same data preprocessing as in [14]. In particular, we first remove the batch effect by applying the surrogate variable analysis proposed by [13]. We then extract the top 2,000 genes with the highest marginal standard deviations. There are, accordingly, 2,000 genes left and the data matrix we focus on is 2,000 × 13,182. We then explore several tissue types with the largest sample sizes: (1) Breast tumor, 1,599 samples; (2) B cell lymphoma, 213 samples; (3) Prostate tumor, 148 samples; (4) Wilms tumor, 143 samples.\n\nFigure 3: The scatter plots of the first two principal components of the dataset. Spearman and Pearson are compared (top to bottom). B cell lymphoma, breast tumor, prostate tumor and Wilms tumor are explored (from left to right). 
Each black point represents a sample and each red\npoint represents a sample belonging to the corresponding tissue type.\nFor each tissue type listed above, we apply the COCA (Spearman) and the classic high dimensional\nPCA (Pearson) on the data belonging to this specific tissue type and obtain the first two dominant\nsparse eigenvectors. Here we set R0 = 100 for both eigenvectors. For COCA, we do a normal score\ntransformation on the original dataset. We subsequently project the whole dataset to the first two\nprincipal components using the obtained eigenvectors. The corresponding 2-dimensional visualization is\nillustrated in Figure 3. In Figure 3 each black point represents a sample and each red point represents\na sample belonging to the corresponding tissue type. It can be observed that, in the 2D plots learnt by\nthe COCA, the red points are on average denser and closer to the border of the sample\ncluster. The first phenomenon indicates that the COCA has the potential to preserve more common\ninformation shared by samples from the same tissue type. The second phenomenon indicates that\nthe COCA has the potential to differentiate samples from different tissue types more efficiently.\n6 Discussion and Comparison with Related Work\nA similar principal component analysis procedure is proposed by [7], in which they advocate the\nuse of the transformed Kendall\u2019s tau correlation matrix (instead of the Spearman\u2019s rho correlation\nmatrix as in the current paper) for estimating the sparse leading eigenvectors. Though both papers\nwork on principal component analysis, the core ideas are quite different. Firstly, the analysis\nin [7] is based on a different distribution family called transelliptical, while COCA and Copula\nPCA are based on the Nonparanormal family. Secondly, with its improved modeling flexibility,\n[7] has no scale-variant counterpart, since it is hard to quantify the transformation functions. 
In contrast, by introducing the subgaussian transformation function family, the current paper\nprovides sufficient conditions for Copula PCA to achieve parametric rates. Thirdly, the method in\n[7] cannot explicitly conduct data visualization, because the latent elliptical distribution\nis unspecified and accordingly the marginal transformations cannot be accurately estimated. For\nCopula PCA, we are able to provide the projection visualization such as in the experiment part of\nthis paper. Moreover, via quantifying a sharp convergence rate in estimating the marginal transformations,\nwe can provide the convergence rates in estimating the principal components. Due to space\nlimits, we refer to the longer version of this paper [6] for more details. Finally, we recommend using\nSpearman\u2019s rho instead of Kendall\u2019s tau in estimating the correlation coefficients provided\nthat the Nonparanormal model holds. This is because Spearman\u2019s rho is statistically more efficient\nthan Kendall\u2019s tau within the Nonparanormal family. 
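As an illustration of the rank-based estimation discussed above, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard Nonparanormal relation between Spearman's rho and the latent correlation, R = 2 sin(pi * rho / 6), and uses a simplified truncated power iteration in the spirit of [23]. The function names, the all-ones initialization and the fixed iteration count are assumptions made for this example; the parameter k is a cardinality bound on the eigenvector, which we take to correspond to R0 in Section 5.2.

```python
import numpy as np


def spearman_latent_correlation(X):
    """Estimate the latent correlation matrix from Spearman's rho.

    X is an (n, d) data matrix with continuous columns (no ties assumed).
    """
    # Column-wise 0-based ranks; a double argsort is valid when there are no ties.
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    # Spearman's rho is the Pearson correlation of the ranks.
    rho = np.corrcoef(ranks, rowvar=False)
    # Assumed Nonparanormal relation between rho and the latent correlation.
    R = 2.0 * np.sin(np.pi * rho / 6.0)
    np.fill_diagonal(R, 1.0)
    return R


def truncated_power(A, k, n_iter=100):
    """Simplified truncated power iteration for a k-sparse leading eigenvector.

    Hypothetical helper: all-ones start and a fixed number of iterations.
    """
    d = A.shape[0]
    x = np.ones(d) / np.sqrt(d)
    for _ in range(n_iter):
        y = A @ x
        # Keep the k entries of largest magnitude, zero out the rest.
        keep = np.argsort(np.abs(y))[-k:]
        x = np.zeros(d)
        x[keep] = y[keep]
        x /= np.linalg.norm(x)
    return x
```

Because ranks are unchanged by strictly increasing marginal transformations, the estimate returned by `spearman_latent_correlation` is the same whether it is computed on the observed data or on the latent Gaussian data. The analogous relation for Kendall's tau, as used in [7], is sin(pi * tau / 2).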
This research was supported by NSF award\nIIS-1116730.\n\nReferences\n[1] A.A. Amini and M.J. Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. In Information Theory, 2008. ISIT 2008. IEEE International Symposium on, pages 2454\u20132458. IEEE, 2008.\n[2] T.W. Anderson. An introduction to multivariate statistical analysis, volume 2. Wiley New York, 1958.\n[3] A. d\u2019Aspremont, F. Bach, and L.E. Ghaoui. Optimal solutions for sparse principal component analysis. The Journal of Machine Learning Research, 9:1269\u20131294, 2008.\n[4] A. d\u2019Aspremont, L. El Ghaoui, M.I. Jordan, and G.R.G. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. Computer Science Division, University of California, 2004.\n[5] B. Flury. A first course in multivariate statistics. Springer Verlag, 1997.\n[6] F. Han and H. Liu. High dimensional semiparametric scale-invariant principal component analysis. Technical Report, 2012.\n[7] F. Han and H. Liu. TCA: Transelliptical principal component analysis for high dimensional non-Gaussian data. Technical Report, 2012.\n[8] T. Hastie and W. Stuetzle. Principal curves. Journal of the American Statistical Association, pages 502\u2013516, 1989.\n[9] I.M. Johnstone and A.Y. Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 104(486):682\u2013693, 2009.\n[10] I.T. Jolliffe. Principal component analysis, volume 2. Wiley Online Library, 2002.\n[11] I.T. Jolliffe, N.T. Trendafilov, and M. Uddin. A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12(3):531\u2013547, 2003.\n[12] M. Journ\u00e9e, Y. Nesterov, P. Richt\u00e1rik, and R. Sepulchre. 
Generalized power method for sparse principal component analysis. The Journal of Machine Learning Research, 11:517\u2013553, 2010.\n[13] J.T. Leek and J.D. Storey. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics, 3(9):e161, 2007.\n[14] H. Liu, F. Han, M. Yuan, J. Lafferty, and L. Wasserman. High dimensional semiparametric Gaussian copula graphical models. Annals of Statistics, 2012.\n[15] H. Liu, J. Lafferty, and L. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10:2295\u20132328, 2009.\n[16] Z. Ma. Sparse principal component analysis and iterative thresholding. arXiv preprint arXiv:1112.2432, 2011.\n[17] M. McCall, B. Bolstad, and R. Irizarry. Frozen robust multiarray analysis (fRMA). Biostatistics, 11:242\u2013253, 2010.\n[18] D. Paul and I.M. Johnstone. Augmented sparse principal component analysis for high dimensional data. arXiv preprint arXiv:1202.1242, 2012.\n[19] H. Shen and J.Z. Huang. Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis, 99(6):1015\u20131034, 2008.\n[20] V.Q. Vu and J. Lei. Minimax rates of estimation for sparse PCA in high dimensions. arXiv preprint arXiv:1202.0786, 2012.\n[21] D.M. Witten, R. Tibshirani, and T. Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515\u2013534, 2009.\n[22] L. Xue and H. Zou. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. Annals of Statistics, 2012.\n[23] X.T. Yuan and T. Zhang. Truncated power method for sparse eigenvalue problems. arXiv preprint arXiv:1112.2679, 2011.\n[24] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. 
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301\u2013320, 2005.\n[25] H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265\u2013286, 2006.\n", "award": [], "sourceid": 4809, "authors": [{"given_name": "Fang", "family_name": "Han", "institution": null}, {"given_name": "Han", "family_name": "Liu", "institution": null}]}