Wu-jun Li, Dit-Yan Yeung, Zhihua Zhang
Both principal component analysis (PCA) and probabilistic PCA (PPCA) rest on the crucial assumption that the instances are independent and identically distributed (i.i.d.). However, this common assumption is unreasonable for relational data. In this paper, we propose a novel probabilistic dimensionality reduction method for relational data analysis, called probabilistic relational PCA (PRPCA), which explicitly models the covariance between instances as derived from the relational information. Although PRPCA no longer adopts the i.i.d. assumption, its learning algorithms can still be devised as easily as those for PPCA, which makes explicit use of that assumption. Experiments on real-world data sets show that PRPCA can effectively utilize the relational information to dramatically outperform PCA and achieve state-of-the-art performance.