Efficient Kernel Discriminant Analysis via QR Decomposition

Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)

Bibtex Metadata Paper


Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky


Linear Discriminant Analysis (LDA) is a well-known method for fea- ture extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algo- rithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlin- ear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a gen- eral kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs.

1 Introduction

Linear Discriminant Analysis [3] is a wellknown method for dimension reduction. It has been used widely in many applications such as face recognition [2]. Classical LDA aims to find optimal transformation by minimizing the within-class distance and maximizing the between-class distance simultaneously, thus achieving maximum discrimination. The optimal transformation can be readily computed by computing the eigen-decomposition on the scatter matrices.

Although LDA works well for linear problems, it may be less effective when severe non- linearity is involved. To deal with such a limitation, nonlinear extensions through kernel functions have been proposed. The main idea of kernel-based methods is to map the input data to a feature space through a nonlinear mapping, where the inner products in the feature

space can be computed by a kernel function without knowing the nonlinear mapping explic- itly [9]. Kernel Principal Component Analysis (KPCA) [10], Kernel Fisher Discriminant Analysis (KFDA) [7] and Generalized Discriminant Analysis (GDA) [1] are, respectively, kernel-based nonlinear extensions of the well known PCA, FDA and LDA methods.

To our knowledge, there are few efficient algorithms for general kernel based discriminant algorithms -- most known algorithms effectively scale as O(n3) where n is the sample size. In [6, 8], S. Mika et al. made a first attempt to speed up KFDA through a greedy approximation technique. However the algorithm was developed to handle the binary clas- sification problem. For multi-class problem, the authors suggested the one against the rest scheme by considering all two-class problems.

Recently, an efficient variant of LDA, namely LDA/QR, was proposed in [11, 12]. The essence of LDA/QR is the utilization of QR-decomposition on a small size matrix. The time complexity of LDA/QR is linear in the size of the training data, as well as the number of dimensions of the data. Moreover, experiments in [11, 12] show that the classification accuracy of LDA/QR is competitive with other LDA algorithms.

In this paper, we first propose an algorithm, namely KDA/QR1, which is a nonlinear exten- sion of LDA/QR. Since KDA/QR involves the whole kernel matrix, which is not scalable for large datasets, we also propose an approximation of KDA/QR, namely AKDA/QR. A distinct property of AKDA/QR is that it scales as O(ndc), where n is the size of the data, d is the dimension of the data, and c is the number of classes.

We apply the proposed algorithms on face image datasets and compare them with LDA/QR, and Generalized Discriminant Analysis (GDA) [1], a general method for kernel discrim- inant analysis. Experiments show that: (1) AKDA/QR is competitive with KDA/QR and GDA in classification; (2) both KDA/QR and AKDA/QR outperform LDA/QR in classifi- cation; and (3) AKDA/QR has much lower costs in time and space than GDA.