Part of Advances in Neural Information Processing Systems 25 (NIPS 2012)
Fang Han, Han Liu
We propose a high-dimensional semiparametric scale-invariant principal component analysis, named TCA, by utilizing the natural connection between the elliptical distribution family and principal component analysis. The elliptical distribution family includes many well-known multivariate distributions, such as the multivariate Gaussian, t and logistic distributions, and it was extended to the meta-elliptical family by Fang et al. (2002) using copula techniques. In this paper we extend the meta-elliptical distribution family to an even larger family, called transelliptical. We prove that
TCA can obtain a near-optimal $s\sqrt{\log d/n}$ estimation consistency rate (where $s$ is the number of nonzero entries of the leading eigenvector, $d$ the dimension and $n$ the sample size) in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities, and possess arbitrary continuous marginal distributions. A feature selection result with an explicit rate is also provided. TCA is further evaluated on both numerical simulations and large-scale stock data to illustrate its empirical usefulness. Both theory and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost.
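The abstract does not spell out the estimator, so the following is only a minimal Python sketch of the two steps it alludes to, under stated assumptions. Step (i) estimates the latent correlation matrix from ranks, assuming the standard elliptical-copula identity $\tau_{jk} = \frac{2}{\pi}\arcsin(\Sigma^0_{jk})$ with Kendall's tau. Step (ii) extracts a sparse leading eigenvector with a truncated power iteration, a generic solver swapped in for illustration and not necessarily the one used in the paper. The helper names `kendall_tau_matrix`, `latent_correlation`, `truncated_power` and `demo`, and all simulation settings, are hypothetical.

```python
# Illustrative sketch (assumptions): rank-based latent correlation + truncated
# power iteration. Hypothetical helpers, not the authors' implementation.
import numpy as np


def kendall_tau_matrix(X: np.ndarray) -> np.ndarray:
    """Kendall's tau-a between every pair of columns of X (n samples x d variables).

    Brute force over all sample pairs, O(n^2 d^2); assumes continuous data (no ties).
    """
    n = X.shape[0]
    # signs[a, b, j] = sign(X[a, j] - X[b, j]) for every ordered pair of samples (a, b)
    signs = np.sign(X[:, None, :] - X[None, :, :])
    # Each unordered pair is counted twice and a == b contributes 0, hence n * (n - 1)
    return np.einsum("abj,abk->jk", signs, signs) / (n * (n - 1))


def latent_correlation(X: np.ndarray) -> np.ndarray:
    """Rank-based estimate of the latent correlation matrix.

    Assumes the elliptical-copula identity tau_jk = (2 / pi) * arcsin(Sigma0_jk),
    which is unchanged by strictly increasing transforms of each margin.
    """
    return np.sin(np.pi / 2 * kendall_tau_matrix(X))


def truncated_power(R: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """k-sparse leading eigenvector of R by truncated power iteration (assumed solver)."""
    d = R.shape[0]
    # Start from the dense leading eigenvector (eigh returns eigenvalues in ascending order)
    v = np.linalg.eigh(R)[1][:, -1]
    for _ in range(n_iter):
        v = R @ v
        # Keep the k largest-magnitude coordinates and zero out the rest
        keep = np.argsort(np.abs(v))[-k:]
        sparse = np.zeros(d)
        sparse[keep] = v[keep]
        v = sparse / np.linalg.norm(sparse)
    return v


def demo(seed: int = 0):
    """Simulated check: heavy-tailed data with a 5-sparse latent leading eigenvector."""
    rng = np.random.default_rng(seed)
    n, d, s, theta = 200, 30, 5, 10.0
    u = np.zeros(d)
    u[:s] = 1 / np.sqrt(s)
    sigma = np.eye(d) + theta * np.outer(u, u)  # spiked latent scatter matrix
    # Multivariate Cauchy (t with 1 degree of freedom): infinite second moments
    g = rng.multivariate_normal(np.zeros(d), sigma, size=n)
    z = g / np.sqrt(rng.chisquare(1, size=n))[:, None]
    x = z ** 3  # strictly increasing marginal transform
    v = truncated_power(latent_correlation(x), k=s)
    return np.flatnonzero(v), abs(v @ u)


if __name__ == "__main__":
    support, overlap = demo()
    print("estimated support:", support)
    print("|<v_hat, u>| =", round(float(overlap), 3))
```

Because Kendall's tau depends only on ranks, replacing `x` by any strictly increasing transform of each column leaves the estimate unchanged, which is the intuition behind the scale-invariance and robustness to heavy tails described above.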